Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI

2018-06-15 Thread Ricardo Neri
On Fri, Jun 15, 2018 at 11:19:09AM +0200, Thomas Gleixner wrote:
> On Thu, 14 Jun 2018, Ricardo Neri wrote:
> > On Wed, Jun 13, 2018 at 11:40:00AM +0200, Thomas Gleixner wrote:
> > > On Tue, 12 Jun 2018, Ricardo Neri wrote:
> > > > @@ -183,6 +184,8 @@ static irqreturn_t hardlockup_detector_irq_handler(int irq, void *data)
> > > > if (!(hdata->flags & HPET_DEV_PERI_CAP))
> > > > kick_timer(hdata);
> > > >  
> > > > +   pr_err("This interrupt should not have happened. Ensure delivery mode is NMI.\n");
> > > 
> > > Eeew.
> > 
> > If you don't mind me asking. What is the problem with this error message?
> 
> The problem is not the error message. The problem is the abuse of
> request_irq() and the fact that this irq handler function exists in the
> first place for something which is NMI based.

I wanted to add this handler in case the interrupt was not configured correctly
to be delivered as NMI (e.g., not supported by the hardware). I see your point.
Perhaps this is not needed. There is code in place to complain when an interrupt
that nobody was expecting happens.

> 
> > > And in case that the HPET does not support periodic mode this reprograms
> > > the timer on every NMI which means that while perf is running the watchdog
> > > will never ever detect anything.
> > 
> > Yes. I see that this is wrong. With MSI interrupts, as far as I can
> > see, there is no way to make sure that the HPET timer caused the NMI.
> > Perhaps the only option is to use an IO APIC interrupt and read the
> > interrupt status register.
> > 
> > > Aside of that, reading TWO HPET registers for every NMI is insane. HPET
> > > access is horribly slow, so any high frequency perf monitoring will take a
> > > massive performance hit.
> > 
> > If an IO APIC interrupt is used, only one HPET register (the status
> > register) would need to be read for every NMI. Would that be more
> > acceptable? Otherwise, there is no way to determine if the HPET caused
> > the NMI.
> 
> You need level trigger for the HPET status register to be useful at all
> because in edge mode the interrupt status bits read always 0.

Indeed.

> 
> That means you have to fiddle with the IOAPIC acknowledge magic from NMI
> context. Brilliant idea. If the NMI hits in the middle of a regular
> io_apic_read() then the interrupted code will end up with the wrong index
> register. Not to talk about the fun which the affinity rotation from NMI
> context would bring.
> 
> Do not even think about using IOAPIC and level for this.

OK. I will stay away from it and focus on MSI.
> 
> > Alternatively, there could be a counter that skips reading the HPET status
> > register (and the detection of hardlockups) for every X NMIs. This would
> > reduce the overall frequency of HPET register reads.
> 
> Great plan. So if the watchdog is the only NMI (because perf is off) then
> you delay the watchdog detection by that count.

OK. This was a bad idea. Then, is it acceptable to have a read of an HPET
register per NMI just to check in the status register if the HPET timer
caused the NMI?

Thanks and BR,
Ricardo
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 03/23] genirq: Introduce IRQF_DELIVER_AS_NMI

2018-06-15 Thread Ricardo Neri
On Fri, Jun 15, 2018 at 09:01:02AM +0100, Julien Thierry wrote:
> Hi Ricardo,
> 
> On 15/06/18 03:12, Ricardo Neri wrote:
> >On Wed, Jun 13, 2018 at 11:06:25AM +0100, Marc Zyngier wrote:
> >>On 13/06/18 10:20, Thomas Gleixner wrote:
> >>>On Wed, 13 Jun 2018, Julien Thierry wrote:
> >>>>On 13/06/18 09:34, Peter Zijlstra wrote:
> >>>>>On Tue, Jun 12, 2018 at 05:57:23PM -0700, Ricardo Neri wrote:
> >>>>>>diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> >>>>>>index 5426627..dbc5e02 100644
> >>>>>>--- a/include/linux/interrupt.h
> >>>>>>+++ b/include/linux/interrupt.h
> >>>>>>@@ -61,6 +61,8 @@
> >>>>>>  *                interrupt handler after suspending interrupts. For system
> >>>>>>  *                wakeup devices users need to implement wakeup detection in
> >>>>>>  *                their interrupt handlers.
> >>>>>>+ * IRQF_DELIVER_AS_NMI - Configure interrupt to be delivered as non-maskable, if
> >>>>>>+ *                supported by the chip.
> >>>>>>  */
> >>>>>
> >>>>>NAK on the first 6 patches. You really _REALLY_ don't want to expose
> >>>>>NMIs to this level.
> >>>>>
> >>>>
> >>>>I've been working on something similar on arm64 side, and effectively the
> >>>>one thing that might be common to arm64 and intel is the interface to set
> >>>>an interrupt as NMI. So I guess it would be nice to agree on the right
> >>>>approach for this.
> >>>>
> >>>>The way I did it was by introducing a new irq_state and let the irqchip
> >>>>driver handle most of the work (if it supports that state):
> >>>>
> >>>>https://lkml.org/lkml/2018/5/25/181
> >>>>
> >>>>This has not been ACKed nor NAKed. So I am just asking whether this is a
> >>>>more suitable approach, and if not, are there any suggestions on how to do
> >>>>this?
> >>>
> >>>I really didn't pay attention to that as it's buried in the GIC/ARM series
> >>>which is usually Marc's playground.
> >>
> >>I'm working my way through it ATM now that I have some brain cycles back.
> >>
> >>>Adding NMI delivery support at low level architecture irq chip level is
> >>>perfectly fine, but the exposure of that needs to be restricted very
> >>>much. Adding it to the generic interrupt control interfaces is not going to
> >>>happen. That's doomed to begin with and a complete abuse of the interface
> >>>as the handler can not ever be used for that.
> >>
> >>I can only agree with that. Allowing random driver to use request_irq()
> >>to make anything an NMI ultimately turns it into a complete mess ("hey,
> >>NMI is *faster*, let's use that"), and a potential source of horrible
> >>deadlocks.
> >>
> >>What I'd find more palatable is a way for an irqchip to be able to
> >>prioritize some interrupts based on a set of architecturally-defined
> >>requirements, and a separate NMI requesting/handling framework that is
> >>separate from the IRQ API, as the overall requirements are likely to be
> >>completely different.
> >>
> >>It shouldn't have to be nearly as complex as the IRQ API, and require
> >>much stricter requirements in terms of what you can do there (flow
> >>handling should definitely be different).
> >
> >Marc, Julien, do you plan to actively work on this? Would you mind keeping
> >me in the loop? I also need this work for this watchdog. In the meantime,
> >I will go through Julien's patches and try to adapt it to my work.
> 
> We are going to work on this and of course your input is most welcome to
> make sure we have an interface usable across different architectures.

Great! Thanks! I will keep an eye on future versions of your "arm64: provide
pseudo NMI with GICv3" series.
> 
> In my patches, I'm not sure there is much to adapt to your work as most of
> it is arch specific (although I won't say no to another pair of eyes looking
> at them). From what I've seen of your patches, the point where we converge
> is the need for some code to be able to tell the irqchip "I want that
> particular interrupt line to be treated/setup as an NMI".

Indeed, there has to be a generic way for the irqchip to announce that it
supports configuring an interrupt as NMI... and a way to actually configure
it.
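
To make the two pieces concrete, here is a rough user-space model of what I
have in mind. Every name in it (model_irqchip, supports_nmi, nmi_setup,
model_request_nmi) is hypothetical and made up for illustration; none of it
is an existing kernel interface, and the real design is still to be agreed on.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an interrupt chip; NOT the kernel's struct irq_chip. */
struct model_irqchip {
	bool supports_nmi;                   /* chip announces NMI capability   */
	int (*nmi_setup)(unsigned int line); /* chip-specific NMI configuration */
};

/* Example chip callback that accepts any line (illustration only). */
int model_nmi_setup_ok(unsigned int line)
{
	(void)line;
	return 0;
}

/*
 * Hypothetical entry point, kept separate from the regular IRQ request
 * path: it refuses chips that do not announce NMI support and otherwise
 * lets the chip do the configuration. Returns 0 on success, -1 on error.
 */
int model_request_nmi(const struct model_irqchip *chip, unsigned int line)
{
	if (chip == NULL || !chip->supports_nmi || chip->nmi_setup == NULL)
		return -1;

	return chip->nmi_setup(line);
}
```

The point of the sketch is only that capability discovery and configuration
live in the chip, while the requesting side stays small and restricted.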

> 
> We'll make sure to keep you in the loop for discussions/suggestions on this.

Thank you!

Thanks and BR,
Ricardo


Re: [RFC PATCH 20/23] watchdog/hardlockup/hpet: Rotate interrupt among all monitored CPUs

2018-06-15 Thread Ricardo Neri
On Fri, Jun 15, 2018 at 12:29:06PM +0200, Thomas Gleixner wrote:
> On Thu, 14 Jun 2018, Ricardo Neri wrote:
> > On Wed, Jun 13, 2018 at 11:48:09AM +0200, Thomas Gleixner wrote:
> > > On Tue, 12 Jun 2018, Ricardo Neri wrote:
> > > > +   /* There are no CPUs to monitor. */
> > > > +   if (!cpumask_weight(&hdata->monitored_mask))
> > > > +   return NMI_HANDLED;
> > > > +
> > > > inspect_for_hardlockups(regs);
> > > >  
> > > > +   /*
> > > > +    * Target a new CPU. Keep trying until we find a monitored CPU. CPUs
> > > > +    * are added and removed to this mask at cpu_up() and cpu_down(),
> > > > +    * respectively. Thus, the interrupt should be able to be moved to
> > > > +    * the next monitored CPU.
> > > > +    */
> > > > +   spin_lock(&hld_data->lock);
> > > 
> > > Yuck. Taking a spinlock from NMI ...
> > 
> > I am sorry. I will look into other options for locking. Do you think
> > rcu_lock would help in this case? I need this locking because the set of
> > CPUs being monitored changes as CPUs come online and offline.
> 
> Sure, but you _cannot_ take any locks in NMI context which are also taken
> in !NMI context. And RCU will not help either. How so? The NMI can hit
> exactly before the CPU bit is cleared and then the CPU goes down. So RCU
> _cannot_ protect anything.
> 
> All you can do there is make sure that the TIMn_CONF is only ever accessed
> in !NMI code. Then you can stop the timer _before_ a CPU goes down and make
> sure that any NMI still in flight has finished. After that you can
> fiddle with the CPU mask and restart the timer. Be aware that this is going
> to be more corner case handling than actual functionality.

Thanks for the suggestion. It makes sense to stop the timer when updating the
CPU mask. In this manner the timer will not cause any NMI.
> 
> > > > +   for_each_cpu_wrap(cpu, &hdata->monitored_mask, smp_processor_id() + 1) {
> > > > +   if (!irq_set_affinity(hld_data->irq, cpumask_of(cpu)))
> > > > +   break;
> > > 
> > > ... and then calling into generic interrupt code which will take even more
> > > locks is completely broken.
> > 
> > I will look into reworking how the destination of the interrupt is set.
> 
> You have to consider two cases:
> 
>  1) !remapped mode:
> 
> That's reasonably simple because you just have to deal with the HPET
> TIMERn_PROCMSG_ROUT register. But then you need to do this directly and
> not through any of the existing interrupt facilities.

Indeed, there is no need to use the generic interrupt facilities to set affinity;
I am dealing with an NMI anyways.
> 
>  2) remapped mode:
> 
> That's way more complex as you _cannot_ ever do anything which touches
> the IOMMU and the related tables.
> 
> So you'd need to reserve an IOMMU remapping entry for each CPU upfront,
> store the resulting value for the HPET TIMERn_PROCMSG_ROUT register in
> per cpu storage and just modify that one from NMI.
> 
> Though there might be subtle side effects involved, which are related to
> the acknowledge part. You need to talk to the IOMMU wizards first.

I see. I will look into the code and prototype something that makes sense for
the IOMMU maintainers.
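
As a starting point, the lock-free part could look roughly like the sketch
below. It is a plain user-space model under stated assumptions: the CPU mask
fits in 64 bits, route_value[] is a hypothetical per-CPU table filled in from
regular (non-NMI) context while the timer is stopped, and all function names
are made up for illustration. Nothing here touches real HPET or IOMMU state.

```c
#include <stdint.h>

#define MAX_CPUS 64

/*
 * Hypothetical per-CPU table of precomputed route register values,
 * written only from non-NMI context before the timer is (re)started.
 */
static uint64_t route_value[MAX_CPUS];

/* Record the value to program when the NMI should target @cpu. */
void precompute_route(unsigned int cpu, uint64_t value)
{
	if (cpu < MAX_CPUS)
		route_value[cpu] = value;
}

/*
 * Return the next CPU set in @mask after @cur, wrapping around.
 * If @cur is the only CPU in the mask, @cur itself is returned.
 * Returns -1 if the mask is empty. No locks and no allocation.
 */
int next_monitored_cpu(uint64_t mask, unsigned int cur)
{
	for (unsigned int i = 1; i <= MAX_CPUS; i++) {
		unsigned int cpu = (cur + i) % MAX_CPUS;

		if (mask & (1ULL << cpu))
			return (int)cpu;
	}

	return -1;
}

/* Look up the precomputed value for the next target; 0 if there is none. */
uint64_t next_route_value(uint64_t mask, unsigned int cur)
{
	int cpu = next_monitored_cpu(mask, cur);

	return cpu < 0 ? 0 : route_value[cpu];
}
```

The handler would then only read the mask and the table, and write a single
register; all updates to both happen with the timer stopped.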

> 
> All in all, the idea itself is interesting, but the envisioned approach of
> round robin and no fast accessible NMI reason detection is going to create
> more problems than it solves.

I see it more clearly now.

Thanks and BR,
Ricardo


[RFC PATCH 09/23] x86/hpet: Reserve timer for the HPET hardlockup detector

2018-06-12 Thread Ricardo Neri
HPET timer 2 will be used to drive the HPET-based hardlockup detector.
Reserve such timer to ensure it cannot be used by user space programs or
clock events.

When looking for MSI-capable timers for clock events, skip timer 2 if
the HPET hardlockup detector is selected.

Also, do not assign an IO APIC pin to timer 2 of the HPET. A subsequent
changeset will handle the interrupt setup of the timer used for the
hardlockup detector.
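
The selection rule described above can be sketched in isolation as follows.
This is a user-space illustration, not the code in the patch: the function
name pick_msi_timers() and the wd_enabled parameter are made up, and the
value 0x8000 for the MSI/FSB capability bit is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define TN_FSB_CAP  0x8000u /* assumed "MSI/FSB capable" bit in the config */
#define WD_TIMER_NR 2       /* timer reserved for the hardlockup detector  */

/*
 * Collect the indices of MSI-capable timers into @out, skipping the
 * watchdog timer when the detector is enabled. Returns how many
 * timers were selected. @out must have room for @nr_timers entries.
 */
size_t pick_msi_timers(const uint32_t *cfg, size_t nr_timers,
		       int wd_enabled, unsigned int *out)
{
	size_t used = 0;

	for (size_t i = 0; i < nr_timers; i++) {
		/* Do not hand out the timer reserved for the watchdog. */
		if (wd_enabled && i == WD_TIMER_NR)
			continue;

		/* Only consider timers with MSI support. */
		if (!(cfg[i] & TN_FSB_CAP))
			continue;

		out[used++] = (unsigned int)i;
	}

	return used;
}
```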

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Clemens Ladisch 
Cc: Arnd Bergmann 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h |  3 +++
 arch/x86/kernel/hpet.c  | 19 ---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 9e0afde..3266796 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -61,6 +61,9 @@
  */
 #define HPET_MIN_PERIOD		100000UL
 
+/* Timer used for the hardlockup detector */
+#define HPET_WD_TIMER_NR 2
+
 /* hpet memory map physical address */
 extern unsigned long hpet_address;
 extern unsigned long force_hpet_address;
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 3fa1d3f..b03faee 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -185,7 +185,8 @@ do { \
 
 /*
  * When the hpet driver (/dev/hpet) is enabled, we need to reserve
- * timer 0 and timer 1 in case of RTC emulation.
+ * timer 0 and timer 1 in case of RTC emulation. Timer 2 is reserved in case
+ * the HPET-based hardlockup detector is used.
  */
 #ifdef CONFIG_HPET
 
@@ -195,7 +196,7 @@ static void hpet_reserve_platform_timers(unsigned int id)
 {
struct hpet __iomem *hpet = hpet_virt_address;
struct hpet_timer __iomem *timer = &hpet->hpet_timers[2];
-   unsigned int nrtimers, i;
+   unsigned int nrtimers, i, start_timer;
struct hpet_data hd;
 
nrtimers = ((id & HPET_ID_NUMBER) >> HPET_ID_NUMBER_SHIFT) + 1;
@@ -210,6 +211,13 @@ static void hpet_reserve_platform_timers(unsigned int id)
hpet_reserve_timer(&hd, 1);
 #endif
 
+   if (IS_ENABLED(CONFIG_HARDLOCKUP_DETECTOR_HPET)) {
+   hpet_reserve_timer(&hd, HPET_WD_TIMER_NR);
+   start_timer = HPET_WD_TIMER_NR + 1;
+   } else {
+   start_timer = HPET_WD_TIMER_NR;
+   }
+
/*
 * NOTE that hd_irq[] reflects IOAPIC input pins (LEGACY_8254
 * is wrong for i8259!) not the output IRQ.  Many BIOS writers
@@ -218,7 +226,7 @@ static void hpet_reserve_platform_timers(unsigned int id)
hd.hd_irq[0] = HPET_LEGACY_8254;
hd.hd_irq[1] = HPET_LEGACY_RTC;
 
-   for (i = 2; i < nrtimers; timer++, i++) {
+   for (i = start_timer; i < nrtimers; timer++, i++) {
hd.hd_irq[i] = (readl(&timer->hpet_config) &
Tn_INT_ROUTE_CNF_MASK) >> Tn_INT_ROUTE_CNF_SHIFT;
}
@@ -630,6 +638,11 @@ static void hpet_msi_capability_lookup(unsigned int start_timer)
struct hpet_dev *hdev = &hpet_devs[num_timers_used];
unsigned int cfg = hpet_readl(HPET_Tn_CFG(i));
 
+   /* Do not use timer reserved for the HPET watchdog. */
+   if (IS_ENABLED(CONFIG_HARDLOCKUP_DETECTOR_HPET) &&
+   i == HPET_WD_TIMER_NR)
+   continue;
+
/* Only consider HPET timer with MSI support */
if (!(cfg & HPET_TN_FSB_CAP))
continue;
-- 
2.7.4



[RFC PATCH 08/23] x86/hpet: Calculate ticks-per-second in a separate function

2018-06-12 Thread Ricardo Neri
It is easier to compute the expiration times of an HPET timer by using
its frequency (i.e., the number of times it ticks in a second) than its
period, as given in the capabilities register.

In addition to the HPET char driver, the HPET-based hardlockup detector
will also need to know the timer's frequency. Thus, create a common
function that both can use.
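
For reference, the conversion amounts to a rounded division of 10^15 by the
period. The stand-alone sketch below shows the arithmetic in user-space C;
the name ticks_per_sec_from_period() and the zero-period guard are
assumptions of the sketch and are not part of the patch.

```c
#include <stdint.h>

#define FS_PER_SEC 1000000000000000ULL /* 10^15 femtoseconds per second */

/*
 * Convert an HPET period (femtoseconds per tick) into a frequency
 * (ticks per second), rounded to the nearest integer. A zero period
 * is treated as invalid and yields 0 instead of dividing by zero.
 */
uint64_t ticks_per_sec_from_period(uint64_t period_fs)
{
	if (period_fs == 0)
		return 0;

	/* Adding half the divisor before dividing rounds to nearest. */
	return (FS_PER_SEC + period_fs / 2) / period_fs;
}
```

As a worked case, a period of 69841279 fs gives 14318180 ticks per second,
i.e. a 14.31818 MHz counter; without the rounding term the result would be
truncated to 14318179.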

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Clemens Ladisch 
Cc: Arnd Bergmann 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 drivers/char/hpet.c  | 31 +--
 include/linux/hpet.h |  1 +
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c
index be426eb..1c9584a 100644
--- a/drivers/char/hpet.c
+++ b/drivers/char/hpet.c
@@ -838,6 +838,29 @@ static unsigned long hpet_calibrate(struct hpets *hpetp)
return ret;
 }
 
+u64 hpet_get_ticks_per_sec(u64 hpet_caps)
+{
+   u64 ticks_per_sec, period;
+
+   period = (hpet_caps & HPET_COUNTER_CLK_PERIOD_MASK) >>
+HPET_COUNTER_CLK_PERIOD_SHIFT; /* fs, 10^-15 */
+
+   /*
+    * The frequency is the reciprocal of the period. The period is given
+    * in femtoseconds per tick. Thus, prepare a dividend to obtain the
+    * frequency in ticks per second.
+    */
+
+   /* 10^15 femtoseconds per second */
+   ticks_per_sec = 1000000000000000uLL;
+   ticks_per_sec += period >> 1; /* round */
+
+   /* The quotient is put in the dividend. We drop the remainder. */
+   do_div(ticks_per_sec, period);
+
+   return ticks_per_sec;
+}
+
 int hpet_alloc(struct hpet_data *hdp)
 {
u64 cap, mcfg;
@@ -847,7 +870,6 @@ int hpet_alloc(struct hpet_data *hdp)
size_t siz;
struct hpet __iomem *hpet;
static struct hpets *last;
-   unsigned long period;
unsigned long long temp;
u32 remainder;
 
@@ -883,6 +905,8 @@ int hpet_alloc(struct hpet_data *hdp)
 
cap = readq(&hpet->hpet_cap);
 
+   temp = hpet_get_ticks_per_sec(cap);
+
ntimer = ((cap & HPET_NUM_TIM_CAP_MASK) >> HPET_NUM_TIM_CAP_SHIFT) + 1;
 
if (hpetp->hp_ntimer != ntimer) {
@@ -899,11 +923,6 @@ int hpet_alloc(struct hpet_data *hdp)
 
last = hpetp;
 
-   period = (cap & HPET_COUNTER_CLK_PERIOD_MASK) >>
-   HPET_COUNTER_CLK_PERIOD_SHIFT; /* fs, 10^-15 */
-   temp = 1000000000000000uLL; /* 10^15 femtoseconds per second */
-   temp += period >> 1; /* round */
-   do_div(temp, period);
hpetp->hp_tick_freq = temp; /* ticks per second */
 
printk(KERN_INFO "hpet%d: at MMIO 0x%lx, IRQ%s",
diff --git a/include/linux/hpet.h b/include/linux/hpet.h
index 8604564..e7b36bcf4 100644
--- a/include/linux/hpet.h
+++ b/include/linux/hpet.h
@@ -107,5 +107,6 @@ static inline void hpet_reserve_timer(struct hpet_data *hd, int timer)
 }
 
 int hpet_alloc(struct hpet_data *);
+u64 hpet_get_ticks_per_sec(u64 hpet_caps);
 
 #endif /* !__HPET__ */
-- 
2.7.4



[RFC PATCH 16/23] watchdog/hardlockup: Add an HPET-based hardlockup detector

2018-06-12 Thread Ricardo Neri
This is the initial implementation of a hardlockup detector driven by an
HPET timer. This initial implementation includes functions to control
the timer via its registers. It also requests such timer, installs
a minimal interrupt handler and performs the initial configuration of
the timer.

The detector is not functional at this stage. Subsequent changesets will
populate the NMI watchdog operations and register it with the lockup
detector.

This detector depends on HPET_TIMER since platform code performs the
initialization of the timer and maps its registers to memory. It depends
on HPET to compute the ticks per second of the timer.
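
The core of the timer control is the computation of the next expiration.
A minimal user-space sketch of that arithmetic is shown below; the function
name next_comparator() is made up for illustration and a 64-bit main counter
is assumed.

```c
#include <stdint.h>

/*
 * Compute the comparator value that makes the timer expire
 * @thresh_sec seconds after the current main counter value @count.
 * Unsigned arithmetic wraps modulo 2^64, which matches a 64-bit
 * counter that simply rolls over.
 */
uint64_t next_comparator(uint64_t count, uint64_t thresh_sec,
			 uint64_t ticks_per_sec)
{
	return count + thresh_sec * ticks_per_sec;
}
```

For example, with a counter at 1000, a 10 second threshold and 14318180
ticks per second, the comparator would be set to 143182800.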

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: "Rafael J. Wysocki" 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Alexei Starovoitov 
Cc: Babu Moger 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: Byungchul Park 
Cc: "Paul E. McKenney" 
Cc: "Luis R. Rodriguez" 
Cc: Waiman Long 
Cc: Josh Poimboeuf 
Cc: Randy Dunlap 
Cc: Davidlohr Bueso 
Cc: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Kai-Heng Feng 
Cc: Konrad Rzeszutek Wilk 
Cc: David Rientjes 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 kernel/Makefile|   1 +
 kernel/watchdog_hld_hpet.c | 334 +
 lib/Kconfig.debug  |  10 ++
 3 files changed, 345 insertions(+)
 create mode 100644 kernel/watchdog_hld_hpet.c

diff --git a/kernel/Makefile b/kernel/Makefile
index 0a0d86d..73c79b2 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -86,6 +86,7 @@ obj-$(CONFIG_KGDB) += debug/
 obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
 obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
 obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o watchdog_hld_perf.o
+obj-$(CONFIG_HARDLOCKUP_DETECTOR_HPET) += watchdog_hld.o watchdog_hld_hpet.o
 obj-$(CONFIG_SECCOMP) += seccomp.o
 obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
diff --git a/kernel/watchdog_hld_hpet.c b/kernel/watchdog_hld_hpet.c
new file mode 100644
index 0000000..8fa4e55
--- /dev/null
+++ b/kernel/watchdog_hld_hpet.c
@@ -0,0 +1,334 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * A hardlockup detector driven by an HPET timer.
+ *
+ * Copyright (C) Intel Corporation 2018
+ */
+
+#define pr_fmt(fmt) "NMI hpet watchdog: " fmt
+
+#include 
+#include 
+#include 
+
+#undef pr_fmt
+#define pr_fmt(fmt) "NMI hpet watchdog: " fmt
+
+static struct hpet_hld_data *hld_data;
+
+/**
+ * get_count() - Get the current count of the HPET timer
+ *
+ * Returns:
+ *
+ * Value of the main counter of the HPET timer
+ */
+static inline unsigned long get_count(void)
+{
+   return hpet_readq(HPET_COUNTER);
+}
+
+/**
+ * set_comparator() - Update the comparator in an HPET timer instance
+ * @hdata: A data structure with the timer instance to update
+ * @cmp:   The value to write in the comparator register
+ *
+ * Returns:
+ *
+ * None
+ */
+static inline void set_comparator(struct hpet_hld_data *hdata,
+ unsigned long cmp)
+{
+   hpet_writeq(cmp, HPET_Tn_CMP(hdata->num));
+}
+
+/**
+ * kick_timer() - Reprogram timer to expire in the future
+ * @hdata: A data structure with the timer instance to update
+ *
+ * Reprogram the timer to expire within watchdog_thresh seconds in the future.
+ *
+ * Returns:
+ *
+ * None
+ */
+static void kick_timer(struct hpet_hld_data *hdata)
+{
+   unsigned long new_compare, count;
+
+   /*
+    * Update the comparator in increments of watchdog_thresh seconds
+    * relative to the current count. Since watchdog_thresh is given in
+    * seconds, we are able to update the comparator before the counter
+    * reaches such new value.
+    *
+    * Let it wrap around if needed.
+    */
+   count = get_count();
+
+   new_compare = count + watchdog_thresh * hdata->ticks_per_second;
+
+   set_comparator(hdata, new_compare);
+}
+
+/**
+ * disable() - Disable an HPET timer instance
+ * @hdata: A data structure with the timer instance to disable
+ *
+ * Returns:
+ *
+ * None
+ */
+static void disable(struct hpet_hld_data *hdata)
+{
+   unsigned int v;
+
+   v = hpet_readl(HPET_Tn_CFG(hdata->num));
+   v &= ~HPET_TN_ENABLE;
+   hpet_writel(v, HPET_Tn_CFG(hdata->num));
+}
+
+/**
+ * enable() - Enable an HPET timer instance
+ * @hdata: A data structure with the timer instance to enable
+ *
+ * Returns:
+ *
+ * None
+ */
+static void enable(struct hpet_hld_data *hdata)
+{
+   unsigned long v;
+
+   /* Clear any previously active interrupt. */
+   hpet_writel(BIT(hdata->num), HPET_STATUS);
+
+   v = hpet_readl(HPET_Tn_CFG(hdata->num));
+   v |= HPET_TN_ENABLE;
+

[RFC PATCH 14/23] watchdog/hardlockup: Decouple the hardlockup detector from perf

2018-06-12 Thread Ricardo Neri
The current default implementation of the hardlockup detector assumes that
it is implemented using perf events. However, the hardlockup detector can
be driven by other sources of non-maskable interrupts (e.g., a properly
configured timer).

Put in a separate file all the code that is specific to perf: create and
manage events, stop and start the detector. This perf-specific code is put
in the new file watchdog_hld_perf.c

The generic code used to monitor the timers' thresholds, check
timestamps and detect hardlockups remains in watchdog_hld.c

Functions and variables are simply relocated to a new file. No functional
changes were made.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Babu Moger 
Cc: "David S. Miller" 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: "Luis R. Rodriguez" 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 kernel/Makefile|   2 +-
 kernel/watchdog_hld.c  | 162 
 kernel/watchdog_hld_perf.c | 182 +
 3 files changed, 183 insertions(+), 163 deletions(-)
 create mode 100644 kernel/watchdog_hld_perf.c

diff --git a/kernel/Makefile b/kernel/Makefile
index f85ae5d..0a0d86d 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -85,7 +85,7 @@ obj-$(CONFIG_FAIL_FUNCTION) += fail_function.o
 obj-$(CONFIG_KGDB) += debug/
 obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
 obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
-obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
+obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o watchdog_hld_perf.o
 obj-$(CONFIG_SECCOMP) += seccomp.o
 obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 28a00c3..96615a2 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -22,12 +22,8 @@
 
 static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
-static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
-static DEFINE_PER_CPU(struct perf_event *, dead_event);
-static struct cpumask dead_events_mask;
 
 static unsigned long hardlockup_allcpu_dumped;
-static atomic_t watchdog_cpus = ATOMIC_INIT(0);
 
 void arch_touch_nmi_watchdog(void)
 {
@@ -98,14 +94,6 @@ static inline bool watchdog_check_timestamp(void)
 }
 #endif
 
-static struct perf_event_attr wd_hw_attr = {
-   .type   = PERF_TYPE_HARDWARE,
-   .config = PERF_COUNT_HW_CPU_CYCLES,
-   .size   = sizeof(struct perf_event_attr),
-   .pinned = 1,
-   .disabled   = 1,
-};
-
 void inspect_for_hardlockups(struct pt_regs *regs)
 {
if (__this_cpu_read(watchdog_nmi_touch) == true) {
@@ -155,153 +143,3 @@ void inspect_for_hardlockups(struct pt_regs *regs)
__this_cpu_write(hard_watchdog_warn, false);
return;
 }
-
-/* Callback function for perf event subsystem */
-static void watchdog_overflow_callback(struct perf_event *event,
-  struct perf_sample_data *data,
-  struct pt_regs *regs)
-{
-   /* Ensure the watchdog never gets throttled */
-   event->hw.interrupts = 0;
-   inspect_for_hardlockups(regs);
-}
-
-static int hardlockup_detector_event_create(void)
-{
-   unsigned int cpu = smp_processor_id();
-   struct perf_event_attr *wd_attr;
-   struct perf_event *evt;
-
-   wd_attr = &wd_hw_attr;
-   wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
-
-   /* Try to register using hardware perf events */
-   evt = perf_event_create_kernel_counter(wd_attr, cpu, NULL,
-  watchdog_overflow_callback, NULL);
-   if (IS_ERR(evt)) {
-   pr_info("Perf event create on CPU %d failed with %ld\n", cpu,
-   PTR_ERR(evt));
-   return PTR_ERR(evt);
-   }
-   this_cpu_write(watchdog_ev, evt);
-   return 0;
-}
-
-/**
- * hardlockup_detector_perf_enable - Enable the local event
- */
-static void hardlockup_detector_perf_enable(void)
-{
-   if (hardlockup_detector_event_create())
-   return;
-
-   /* use original value for check */
-   if (!atomic_fetch_inc(&watchdog_cpus))
-   pr_info("Enabled. Permanently consumes one hw-PMU counter.\n");
-
-   perf_event_enable(this_cpu_read(watchdog_ev));
-}
-
-/**
- * hardlockup_detector_perf_disable - Disable the local event
- */
-static void hardlockup_detector_perf_disable(void)
-{
-   

[RFC PATCH 18/23] watchdog/hardlockup/hpet: Add the NMI watchdog operations

2018-06-12 Thread Ricardo Neri
Implement the start, stop and disable operations of the HPET-based NMI
watchdog. Given that a single timer is used to monitor all the CPUs in
the system, it is necessary to define a cpumask that keeps track of the
CPUs that can be monitored. This cpumask is protected with a spin lock.

As individual CPUs are put online and offline, this cpumask is updated.
CPUs are unconditionally cleared from the mask when going offline. When
going online, the CPU is set in the mask only if it is one of the CPUs allowed
to be monitored by the watchdog.

It is not necessary to implement a start function. The NMI watchdog will
be enabled when there is at least one CPU to monitor.

The disable function clears the CPU mask and disables the timer.
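
The bookkeeping described above can be modelled in a few lines of user-space
C. This is only a sketch under stated assumptions: the mask is a 64-bit
bitmap, the struct and function names (wd_model, wd_cpu_online,
wd_cpu_offline) are hypothetical, and locking is deliberately left out
because it is the part of the design that is still open.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the detector state; not the kernel structure. */
struct wd_model {
	uint64_t monitored; /* one bit per monitored CPU          */
	bool timer_enabled; /* mirrors the timer's enable setting */
};

/* A CPU comes online: monitor it only if it is in the allowed mask. */
void wd_cpu_online(struct wd_model *wd, unsigned int cpu, uint64_t allowed)
{
	if (cpu >= 64 || !(allowed & (1ULL << cpu)))
		return;

	wd->monitored |= 1ULL << cpu;

	/* First monitored CPU: this is where the timer gets started. */
	if (wd->monitored == (1ULL << cpu))
		wd->timer_enabled = true;
}

/* A CPU goes offline: it is always cleared from the mask. */
void wd_cpu_offline(struct wd_model *wd, unsigned int cpu)
{
	if (cpu >= 64)
		return;

	wd->monitored &= ~(1ULL << cpu);

	/* Only disable the timer if there are no more CPUs to monitor. */
	if (!wd->monitored)
		wd->timer_enabled = false;
}
```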

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: "Rafael J. Wysocki" 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Alexei Starovoitov 
Cc: Babu Moger 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: Byungchul Park 
Cc: "Paul E. McKenney" 
Cc: "Luis R. Rodriguez" 
Cc: Waiman Long 
Cc: Josh Poimboeuf 
Cc: Randy Dunlap 
Cc: Davidlohr Bueso 
Cc: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Kai-Heng Feng 
Cc: Konrad Rzeszutek Wilk 
Cc: David Rientjes 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h |  2 +
 include/linux/nmi.h |  1 +
 kernel/watchdog_hld_hpet.c  | 98 +
 3 files changed, 101 insertions(+)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 33309b7..6ace2d1 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -124,6 +124,8 @@ struct hpet_hld_data {
u32 irq;
u32 flags;
u64 ticks_per_second;
+   struct cpumask  monitored_mask;
+   spinlock_t  lock; /* serialized access to monitored_mask */
 };
 
 extern struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void);
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index e608762..23e20d2 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -129,6 +129,7 @@ struct nmi_watchdog_ops {
 };
 
 extern struct nmi_watchdog_ops hardlockup_detector_perf_ops;
+extern struct nmi_watchdog_ops hardlockup_detector_hpet_ops;
 
 void watchdog_nmi_stop(void);
 void watchdog_nmi_start(void);
diff --git a/kernel/watchdog_hld_hpet.c b/kernel/watchdog_hld_hpet.c
index 3bedffa..857e051 100644
--- a/kernel/watchdog_hld_hpet.c
+++ b/kernel/watchdog_hld_hpet.c
@@ -345,6 +345,91 @@ static int setup_hpet_irq(struct hpet_hld_data *hdata)
 }
 
 /**
+ * hardlockup_detector_hpet_enable() - Enable the hardlockup detector
+ *
+ * The hardlockup detector is enabled for the CPU that executes the
+ * function. It is only enabled if such CPU is allowed to be monitored
+ * by the lockup detector.
+ *
+ * Returns:
+ *
+ * None
+ *
+ */
+static void hardlockup_detector_hpet_enable(void)
+{
+   struct cpumask *allowed = watchdog_get_allowed_cpumask();
+   unsigned int cpu = smp_processor_id();
+
+   if (!hld_data)
+   return;
+
+   if (!cpumask_test_cpu(cpu, allowed))
+   return;
+
+   spin_lock(&hld_data->lock);
+
+   cpumask_set_cpu(cpu, &hld_data->monitored_mask);
+
+   /*
+* If this is the first CPU to be monitored, set everything in motion:
+* move the interrupt to this CPU, kick and enable the timer.
+*/
+   if (cpumask_weight(&hld_data->monitored_mask) == 1) {
+   if (irq_set_affinity(hld_data->irq, cpumask_of(cpu))) {
+   spin_unlock(&hld_data->lock);
+   pr_err("Unable to enable on CPU %d!\n", cpu);
+   return;
+   }
+
+   kick_timer(hld_data);
+   enable(hld_data);
+   }
+
+   spin_unlock(&hld_data->lock);
+}
+
+/**
+ * hardlockup_detector_hpet_disable() - Disable the hardlockup detector
+ *
+ * The hardlockup detector is disabled for the CPU that executes the
+ * function.
+ *
+ * None
+ */
+static void hardlockup_detector_hpet_disable(void)
+{
+   if (!hld_data)
+   return;
+
+   spin_lock(&hld_data->lock);
+
+   cpumask_clear_cpu(smp_processor_id(), &hld_data->monitored_mask);
+
+   /* Only disable the timer if there are no more CPUs to monitor. */
+   if (!cpumask_weight(&hld_data->monitored_mask))
+   disable(hld_data);
+
+   spin_unlock(&hld_data->lock);
+}
+
+/**
+ * hardlockup_detector_hpet_stop() - Stop the NMI watchdog on all CPUs
+ *
+ * Returns:
+ *
+ * None
+ */
+static void hardlockup_detector_hpet_stop(void)
+{
+   disable(hld_data);
+
+   spin_lock(&hld_data->lock);
+   cpumask_clear(&hld_data->monitore

[RFC PATCH 19/23] watchdog/hardlockup: Make arch_touch_nmi_watchdog() to hpet-based implementation

2018-06-12 Thread Ricardo Neri
CPU architectures that have an NMI watchdog use arch_touch_nmi_watchdog()
to briefly ignore the hardlockup detector. If the architecture does not
have an NMI watchdog, one can be constructed using a source of non-
maskable interrupts. In this case, arch_touch_nmi_watchdog() is common
to any underlying hardware resource used to drive the detector and needs
to be available to other kernel subsystems if hardware different from perf
drives the detector.

There exist perf-based and HPET-based implementations. Make
arch_touch_nmi_watchdog() available to the latter.

For clarity, wrap this function in a separate preprocessor conditional
from functions which are truly specific to the perf-based implementation.
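
As an illustration only, here is a minimal user-space sketch of what
"touching" the watchdog means: a pending touch makes the detector skip one
check. The names touch_nmi_watchdog_sketch() and detector_check_sketch() and
the single global flag are hypothetical stand-ins (the kernel is assumed to
keep this state per CPU); none of this is code from the patch.

```c
#include <stdbool.h>

/* Hypothetical single-CPU stand-in for the detector's "touched" state. */
static bool nmi_touched;

/* Ask the detector to ignore its next check. */
static void touch_nmi_watchdog_sketch(void)
{
	nmi_touched = true;
}

/*
 * Model of one detector check. Returns true if a hardlockup would be
 * reported. A pending touch suppresses exactly one check and is then
 * cleared.
 */
static bool detector_check_sketch(bool cpu_looks_stuck)
{
	if (nmi_touched) {
		nmi_touched = false;
		return false;
	}

	return cpu_looks_stuck;
}
```

In this model the mechanism does not depend on which hardware raises the
NMI, which is why the same function can serve both implementations.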

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: "Rafael J. Wysocki" 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Alexei Starovoitov 
Cc: Babu Moger 
Cc: "David S. Miller" 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: Byungchul Park 
Cc: "Paul E. McKenney" 
Cc: "Luis R. Rodriguez" 
Cc: Waiman Long 
Cc: Josh Poimboeuf 
Cc: Randy Dunlap 
Cc: Davidlohr Bueso 
Cc: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Kai-Heng Feng 
Cc: Konrad Rzeszutek Wilk 
Cc: David Rientjes 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 include/linux/nmi.h | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 23e20d2..8b6b814 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -89,16 +89,22 @@ static inline void hardlockup_detector_disable(void) {}
 # define NMI_WATCHDOG_SYSCTL_PERM  0444
 #endif
 
-#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) || \
+defined(CONFIG_HARDLOCKUP_DETECTOR_HPET)
 extern void arch_touch_nmi_watchdog(void);
+#else
+# if !defined(CONFIG_HAVE_NMI_WATCHDOG)
+static inline void arch_touch_nmi_watchdog(void) {}
+# endif
+#endif
+
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
 extern void hardlockup_detector_perf_stop(void);
 extern void hardlockup_detector_perf_restart(void);
 #else
 static inline void hardlockup_detector_perf_stop(void) { }
 static inline void hardlockup_detector_perf_restart(void) { }
-# if !defined(CONFIG_HAVE_NMI_WATCHDOG)
-static inline void arch_touch_nmi_watchdog(void) {}
-# endif
+
 #endif
 
 /**
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 03/23] genirq: Introduce IRQF_DELIVER_AS_NMI

2018-06-12 Thread Ricardo Neri
Certain interrupt controllers (such as APIC) are capable of delivering
interrupts as non-maskable. Likewise, drivers or subsystems (e.g., the
hardlockup detector) might be interested in requesting a non-maskable
interrupt. The new flag IRQF_DELIVER_AS_NMI serves this purpose.

When setting up an interrupt, non-maskable delivery will be set in the
interrupt state data only if supported by the underlying interrupt
controller chips.

Interrupt controller chips can declare that they support non-maskable
delivery by using the new flag IRQCHIP_CAN_DELIVER_AS_NMI.
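
The check described above can be sketched in user space as follows. This is
a simplified model under stated assumptions: the struct names, the function
setup_irq_sketch() and the value chosen for the request flag are
hypothetical, and only the decision itself (request flag versus chip
capability) mirrors the patch.

```c
#include <errno.h>

/* Hypothetical flag values, for illustration only. */
#define IRQF_DELIVER_AS_NMI_SKETCH		(1u << 0)
#define IRQCHIP_CAN_DELIVER_AS_NMI_SKETCH	(1u << 7)

struct chip_sketch {
	unsigned int flags;		/* capabilities declared by the chip */
};

struct desc_sketch {
	const struct chip_sketch *chip;
	int deliver_as_nmi;		/* stands in for the irq state flag */
};

/*
 * Record non-maskable delivery only if the caller asked for it and the
 * chip declares the capability. A request that the chip cannot honor is
 * rejected with -EINVAL.
 */
static int setup_irq_sketch(struct desc_sketch *desc, unsigned int req_flags)
{
	if (!(req_flags & IRQF_DELIVER_AS_NMI_SKETCH))
		return 0;

	if (!(desc->chip->flags & IRQCHIP_CAN_DELIVER_AS_NMI_SKETCH))
		return -EINVAL;

	desc->deliver_as_nmi = 1;
	return 0;
}
```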

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Daniel Lezcano 
Cc: Andrew Morton 
Cc: "Levin, Alexander (Sasha Levin)" 
Cc: Randy Dunlap 
Cc: Masami Hiramatsu 
Cc: Marc Zyngier 
Cc: Bartosz Golaszewski 
Cc: Doug Berger 
Cc: Palmer Dabbelt 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 include/linux/interrupt.h |  3 +++
 include/linux/irq.h   |  3 +++
 kernel/irq/manage.c   | 22 +-
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 5426627..dbc5e02 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -61,6 +61,8 @@
  *interrupt handler after suspending interrupts. For system
  *wakeup devices users need to implement wakeup detection in
  *their interrupt handlers.
+ * IRQF_DELIVER_AS_NMI - Configure interrupt to be delivered as non-maskable, if
+ *supported by the chip.
  */
 #define IRQF_SHARED            0x00000080
 #define IRQF_PROBE_SHARED      0x00000100
@@ -74,6 +76,7 @@
 #define IRQF_NO_THREAD         0x00010000
 #define IRQF_EARLY_RESUME      0x00020000
 #define IRQF_COND_SUSPEND      0x00040000
+#define IRQF_DELIVER_AS_NMI    0x00080000
 
 #define IRQF_TIMER (__IRQF_TIMER | IRQF_NO_SUSPEND | IRQF_NO_THREAD)
 
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 7271a2c..d2520ae 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -515,6 +515,8 @@ struct irq_chip {
  * IRQCHIP_SKIP_SET_WAKE:  Skip chip.irq_set_wake(), for this irq chip
  * IRQCHIP_ONESHOT_SAFE:   One shot does not require mask/unmask
  * IRQCHIP_EOI_THREADED:   Chip requires eoi() on unmask in threaded mode
+ * IRQCHIP_CAN_DELIVER_AS_NMI  Chip can deliver interrupts it receives as non-
+ * maskable.
  */
 enum {
IRQCHIP_SET_TYPE_MASKED = (1 <<  0),
@@ -524,6 +526,7 @@ enum {
IRQCHIP_SKIP_SET_WAKE   = (1 <<  4),
IRQCHIP_ONESHOT_SAFE= (1 <<  5),
IRQCHIP_EOI_THREADED= (1 <<  6),
+   IRQCHIP_CAN_DELIVER_AS_NMI  = (1 <<  7),
 };
 
 #include 
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index e3336d9..d058aa8 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1137,7 +1137,7 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 {
struct irqaction *old, **old_ptr;
unsigned long flags, thread_mask = 0;
-   int ret, nested, shared = 0;
+   int ret, nested, shared = 0, deliver_as_nmi = 0;
 
if (!desc)
return -EINVAL;
@@ -1156,6 +1156,16 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
if (!(new->flags & IRQF_TRIGGER_MASK))
    new->flags |= irqd_get_trigger_type(&desc->irq_data);
 
+   /* Only deliver as non-maskable interrupt if supported by chip. */
+   if (new->flags & IRQF_DELIVER_AS_NMI) {
+   if (desc->irq_data.chip->flags & IRQCHIP_CAN_DELIVER_AS_NMI) {
+   irqd_set_deliver_as_nmi(&desc->irq_data);
+   deliver_as_nmi = 1;
+   } else {
+   return -EINVAL;
+   }
+   }
+
/*
 * Check whether the interrupt nests into another interrupt
 * thread.
@@ -1166,6 +1176,13 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
ret = -EINVAL;
goto out_mput;
}
+
+   /* Don't allow nesting if interrupt will be delivered as NMI. */
+   if (deliver_as_nmi) {
+   ret = -EINVAL;
+   goto out_mput;
+   }
+
/*
 * Replace the primary handler which was provided from
 * the driver for non nested interrupt handling by the
@@ -1186,6 +1203,9 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 * thread.
 */
if (new->thread_fn && !nested) {
+   if (deliver_as_nmi)
+   goto out_mput;
+
ret = setup_irq_thre

[RFC PATCH 07/23] x86/hpet: Expose more functions to read and write registers

2018-06-12 Thread Ricardo Neri
Some of the registers in the HPET hardware have a width of 64 bits. 64-bit
access functions are needed mostly to read the counter and write the
comparator in a single read or write. Also, 64-bit accesses can be used to
read parameters located in the higher bits of some registers (such as
the timer period and the IO APIC pins that can be asserted by the timer)
without the need of masking and shifting the register values.

64-bit read and write functions are added. These functions, along with the
existing hpet_writel(), are exposed via the HPET header to be used by other
kernel subsystems.

Thus far, the only consumer of these functions will be the HPET-based
hardlockup detector, which will only be available in 64-bit builds. Thus,
the 64-bit access functions are wrapped in CONFIG_X86_64.
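
To illustrate the difference between the two access widths, below is a
user-space sketch that reads the counter period, which the HPET
specification places in bits 63:32 of the General Capabilities and ID
register at offset 0x000. The array fake_hpet[] and all *_sketch names are
hypothetical stand-ins for the memory-mapped registers and for
hpet_readl()/hpet_readq(); real accesses go through readl()/readq() on the
mapped address.

```c
#include <stdint.h>

#define HPET_ID_SKETCH	0x000	/* General Capabilities and ID register */

/* Hypothetical register file: one 64-bit entry per 8 bytes of offset. */
static uint64_t fake_hpet[32];

/* 32-bit read: returns the lower or the upper half of a 64-bit register. */
static uint32_t readl_sketch(unsigned int a)
{
	return (uint32_t)(fake_hpet[a / 8] >> ((a % 8) * 8));
}

/* 64-bit read: returns the whole register in a single access. */
static uint64_t readq_sketch(unsigned int a)
{
	return fake_hpet[a / 8];
}

/* With 32-bit accesses, the upper half is addressed separately. */
static uint32_t period_via_readl(void)
{
	return readl_sketch(HPET_ID_SKETCH + 4);
}

/* With a 64-bit access, the upper half comes along with the lower one. */
static uint32_t period_via_readq(void)
{
	return (uint32_t)(readq_sketch(HPET_ID_SKETCH) >> 32);
}
```

Both helpers return the same value. The single 64-bit access matters most
for the counter and the comparator, where two separate 32-bit accesses would
not be atomic.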

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h | 10 ++
 arch/x86/kernel/hpet.c  | 12 +++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 67385d5..9e0afde 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -72,6 +72,11 @@ extern int is_hpet_enabled(void);
 extern int hpet_enable(void);
 extern void hpet_disable(void);
 extern unsigned int hpet_readl(unsigned int a);
+extern void hpet_writel(unsigned int d, unsigned int a);
+#ifdef CONFIG_X86_64
+extern unsigned long hpet_readq(unsigned int a);
+extern void hpet_writeq(unsigned long d, unsigned int a);
+#endif
 extern void force_hpet_resume(void);
 
 struct irq_data;
@@ -109,6 +114,11 @@ extern void hpet_unregister_irq_handler(rtc_irq_handler handler);
 static inline int hpet_enable(void) { return 0; }
 static inline int is_hpet_enabled(void) { return 0; }
 #define hpet_readl(a) 0
+#define hpet_writel(d, a)
+#ifdef CONFIG_X86_64
+#define hpet_readq(a) 0
+#define hpet_writeq(d, a)
+#endif
 #define default_setup_hpet_msi NULL
 
 #endif
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 8ce4212..3fa1d3f 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -64,12 +64,22 @@ inline unsigned int hpet_readl(unsigned int a)
return readl(hpet_virt_address + a);
 }
 
-static inline void hpet_writel(unsigned int d, unsigned int a)
+inline void hpet_writel(unsigned int d, unsigned int a)
 {
writel(d, hpet_virt_address + a);
 }
 
 #ifdef CONFIG_X86_64
+inline unsigned long hpet_readq(unsigned int a)
+{
+   return readq(hpet_virt_address + a);
+}
+
+inline void hpet_writeq(unsigned long d, unsigned int a)
+{
+   writeq(d, hpet_virt_address + a);
+}
+
 #include 
 #endif
 
-- 
2.7.4



[RFC PATCH 06/23] x86/ioapic: Add support for IRQCHIP_CAN_DELIVER_AS_NMI with interrupt remapping

2018-06-12 Thread Ricardo Neri
Even though there is a delivery mode field in the entries of an IO APIC's
redirection table, the documentation of the majority of the IO APICs
explicitly states that interrupt delivery as non-maskable is not supported.

However, when using an IO APIC in combination with the Intel VT-d interrupt
remapping functionality, the delivery of the interrupt to the CPU is
handled by the remapping hardware. In such a case, the interrupt can be
delivered as non-maskable.

Thus, add the IRQCHIP_CAN_DELIVER_AS_NMI flag only when used in combination
with interrupt remapping.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Juergen Gross 
Cc: Baoquan He 
Cc: "Eric W. Biederman" 
Cc: Dou Liyang 
Cc: Jan Kiszka 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/kernel/apic/io_apic.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 10a20f8..39de91b 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1911,7 +1911,8 @@ static struct irq_chip ioapic_ir_chip __read_mostly = {
.irq_eoi= ioapic_ir_ack_level,
.irq_set_affinity   = ioapic_set_affinity,
.irq_retrigger  = irq_chip_retrigger_hierarchy,
-   .flags  = IRQCHIP_SKIP_SET_WAKE,
+   .flags  = IRQCHIP_SKIP_SET_WAKE |
+ IRQCHIP_CAN_DELIVER_AS_NMI,
 };
 
 static inline void init_IO_APIC_traps(void)
-- 
2.7.4



[RFC PATCH 21/23] watchdog/hardlockup/hpet: Adjust timer expiration on the number of monitored CPUs

2018-06-12 Thread Ricardo Neri
Each CPU should be monitored for hardlockups every watchdog_thresh seconds.
Since all the CPUs in the system are monitored by the same timer and the
timer interrupt is rotated among the monitored CPUs, the timer must expire
every watchdog_thresh/N seconds; where N is the number of monitored CPUs.

A new member, ticks_per_cpu, is added to struct hpet_hld_data to hold the
per-CPU ticks per second. This quantity is used to program the comparator
of the timer.

The ticks-per-CPU quantity is updated every time the number of
monitored CPUs changes: when the watchdog is enabled or disabled for
a specific CPU.
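
As a worked example of the arithmetic, the sketch below mirrors the
computation in user space. The function names are hypothetical and the
14.31818 MHz tick rate used in the usage note is only an example value. With
watchdog_thresh = 10 and four monitored CPUs, the timer expires every 2.5
seconds, so each CPU is still checked every 10 seconds.

```c
#include <stdint.h>

/*
 * Ticks per second divided by the number of monitored CPUs. The previous
 * value is kept when no CPU is monitored, as the patch only updates the
 * quantity if there are monitored CPUs.
 */
static uint64_t ticks_per_cpu_sketch(uint64_t ticks_per_second,
				     unsigned int num_cpus, uint64_t prev)
{
	if (!num_cpus)
		return prev;

	return ticks_per_second / num_cpus;
}

/*
 * Next comparator value: watchdog_thresh / N seconds from now, expressed
 * in timer ticks. Unsigned arithmetic wraps around if needed.
 */
static uint64_t next_compare_sketch(uint64_t count,
				    unsigned int watchdog_thresh,
				    uint64_t ticks_per_cpu)
{
	return count + watchdog_thresh * ticks_per_cpu;
}
```

For instance, ticks_per_cpu_sketch(14318180, 4, 0) gives 3579545, and with
watchdog_thresh = 10 the comparator is set 35795450 ticks ahead of the
current count, which corresponds to 2.5 seconds at that tick rate.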

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: "Rafael J. Wysocki" 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Alexei Starovoitov 
Cc: Babu Moger 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: Byungchul Park 
Cc: "Paul E. McKenney" 
Cc: "Luis R. Rodriguez" 
Cc: Waiman Long 
Cc: Josh Poimboeuf 
Cc: Randy Dunlap 
Cc: Davidlohr Bueso 
Cc: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Kai-Heng Feng 
Cc: Konrad Rzeszutek Wilk 
Cc: David Rientjes 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h |  1 +
 kernel/watchdog_hld_hpet.c  | 41 -
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 6ace2d1..e67818d 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -124,6 +124,7 @@ struct hpet_hld_data {
u32 irq;
u32 flags;
u64 ticks_per_second;
+   u64 ticks_per_cpu;
struct cpumask  monitored_mask;
spinlock_t  lock; /* serialized access to monitored_mask */
 };
diff --git a/kernel/watchdog_hld_hpet.c b/kernel/watchdog_hld_hpet.c
index c40acfd..ebb820d 100644
--- a/kernel/watchdog_hld_hpet.c
+++ b/kernel/watchdog_hld_hpet.c
@@ -65,11 +65,21 @@ static void kick_timer(struct hpet_hld_data *hdata)
 * are able to update the comparator before the counter reaches such new
 * value.
 *
+* The timer must monitor each CPU every watchdog_thresh seconds. Since
+* the interrupt is rotated among the monitored CPUs, the timer must
+* expire every:
+*
+*watchdog_thresh/N
+*
+* seconds, where N is the number of monitored CPUs. ticks_per_cpu is
+* ticks_per_second/N, so watchdog_thresh * ticks_per_cpu gives the
+* number of ticks that meets the condition above.
+*
 * Let it wrap around if needed.
 */
count = get_count();
 
-   new_compare = count + watchdog_thresh * hdata->ticks_per_second;
+   new_compare = count + watchdog_thresh * hdata->ticks_per_cpu;
 
set_comparator(hdata, new_compare);
 }
@@ -160,6 +170,33 @@ static bool is_hpet_wdt_interrupt(struct hpet_hld_data *hdata)
 }
 
 /**
+ * update_ticks_per_cpu() - Update the number of HPET ticks per CPU
+ * @hdata: struct with the timer's ticks-per-second and CPU mask
+ *
+ * From the overall ticks-per-second of the timer, compute the number of ticks
+ * after which the timer should expire to monitor each CPU every watchdog_thresh
+ * seconds. The ticks-per-cpu quantity is computed using the number of CPUs that
+ * the watchdog currently monitors.
+ *
+ * Returns:
+ *
+ * None
+ *
+ */
+static void update_ticks_per_cpu(struct hpet_hld_data *hdata)
+{
+   unsigned int num_cpus = cpumask_weight(&hdata->monitored_mask);
+   unsigned long long temp = hdata->ticks_per_second;
+
+   /* Only update if there are monitored CPUs. */
+   if (!num_cpus)
+   return;
+
+   do_div(temp, num_cpus);
+   hdata->ticks_per_cpu = temp;
+}
+
+/**
  * hardlockup_detector_irq_handler() - Interrupt handler
  * @irq:   Interrupt number
  * @data:  Data associated with the interrupt
@@ -390,6 +427,7 @@ static void hardlockup_detector_hpet_enable(void)
    spin_lock(&hld_data->lock);
 
    cpumask_set_cpu(cpu, &hld_data->monitored_mask);
+   update_ticks_per_cpu(hld_data);
 
/*
 * If this is the first CPU to be monitored, set everything in motion:
@@ -425,6 +463,7 @@ static void hardlockup_detector_hpet_disable(void)
    spin_lock(&hld_data->lock);
 
    cpumask_clear_cpu(smp_processor_id(), &hld_data->monitored_mask);
+   update_ticks_per_cpu(hld_data);
 
/* Only disable the timer if there are no more CPUs to monitor. */
    if (!cpumask_weight(&hld_data->monitored_mask))
-- 
2.7.4



[RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI

2018-06-12 Thread Ricardo Neri
In order to detect hardlockups, it is necessary to be able to receive
interrupts even when interrupts are disabled: a non-maskable interrupt is
required. Add the flag IRQF_DELIVER_AS_NMI to the arguments of
request_irq() for this purpose.

Note that the timer, when programmed to deliver interrupts via the IO APIC,
is programmed as level-triggered. This is to have an indication, in the
General Interrupt Status Register, that the NMI comes from the HPET timer.
However, NMIs are always edge-triggered; thus, an edge-triggered GSI
interrupt is now requested.

An NMI handler is also implemented. The handler looks for hardlockups and
kicks the timer.
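
The decision flow of the handler can be modeled in user space as shown
below. This is only a sketch: struct hld_sketch, its fields and the
*_SKETCH return codes are hypothetical stand-ins for the detector data, the
HPET status bit and the kernel's NMI_DONE/NMI_HANDLED.

```c
#include <stdbool.h>

enum { NMI_DONE_SKETCH = 0, NMI_HANDLED_SKETCH = 1 };

struct hld_sketch {
	bool use_fsb;		/* interrupt is delivered via the FSB */
	bool periodic;		/* timer supports periodic mode */
	bool status_bit;	/* timer's bit in the interrupt status register */
	unsigned int checks;	/* number of hardlockup inspections done */
	unsigned int kicks;	/* number of times the timer was re-armed */
};

static int nmi_handler_sketch(struct hld_sketch *h)
{
	/* Without FSB delivery, the status bit tells who raised the NMI. */
	if (!h->use_fsb && !h->status_bit)
		return NMI_DONE_SKETCH;

	h->checks++;			/* look for hardlockups */

	if (!h->periodic)
		h->kicks++;		/* re-arm a one-shot timer */

	if (!h->use_fsb)
		h->status_bit = false;	/* acknowledge level-triggered mode */

	return NMI_HANDLED_SKETCH;
}
```

Note that in FSB mode the model claims every NMI, since there is no status
bit to consult.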

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: "Rafael J. Wysocki" 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Alexei Starovoitov 
Cc: Babu Moger 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: Byungchul Park 
Cc: "Paul E. McKenney" 
Cc: "Luis R. Rodriguez" 
Cc: Waiman Long 
Cc: Josh Poimboeuf 
Cc: Randy Dunlap 
Cc: Davidlohr Bueso 
Cc: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Kai-Heng Feng 
Cc: Konrad Rzeszutek Wilk 
Cc: David Rientjes 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/kernel/hpet.c |  2 +-
 kernel/watchdog_hld_hpet.c | 55 +-
 2 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index fda6e19..5ca1953 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -205,7 +205,7 @@ int hpet_hardlockup_detector_assign_legacy_irq(struct hpet_hld_data *hdata)
break;
}
 
-   gsi = acpi_register_gsi(NULL, hwirq, ACPI_LEVEL_SENSITIVE,
+   gsi = acpi_register_gsi(NULL, hwirq, ACPI_EDGE_SENSITIVE,
ACPI_ACTIVE_LOW);
if (gsi > 0)
break;
diff --git a/kernel/watchdog_hld_hpet.c b/kernel/watchdog_hld_hpet.c
index 8fa4e55..3bedffa 100644
--- a/kernel/watchdog_hld_hpet.c
+++ b/kernel/watchdog_hld_hpet.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #undef pr_fmt
 #define pr_fmt(fmt) "NMI hpet watchdog: " fmt
@@ -183,6 +184,8 @@ static irqreturn_t hardlockup_detector_irq_handler(int irq, void *data)
if (!(hdata->flags & HPET_DEV_PERI_CAP))
kick_timer(hdata);
 
+   pr_err("This interrupt should not have happened. Ensure delivery mode is NMI.\n");
+
/* Acknowledge interrupt if in level-triggered mode */
if (!use_fsb)
hpet_writel(BIT(hdata->num), HPET_STATUS);
@@ -191,6 +194,47 @@ static irqreturn_t hardlockup_detector_irq_handler(int irq, void *data)
 }
 
 /**
+ * hardlockup_detector_nmi_handler() - NMI Interrupt handler
+ * @val:   Attribute associated with the NMI. Not used.
+ * @regs:  Register values as seen when the NMI was asserted
+ *
+ * When an NMI is issued, look for hardlockups. If the timer is not periodic,
+ * kick it. The interrupt is always handled if it is delivered via the
+ * Front-Side Bus.
+ *
+ * Returns:
+ *
+ * NMI_DONE if the HPET timer did not cause the interrupt. NMI_HANDLED
+ * otherwise.
+ */
+static int hardlockup_detector_nmi_handler(unsigned int val,
+  struct pt_regs *regs)
+{
+   struct hpet_hld_data *hdata = hld_data;
+   unsigned int use_fsb;
+
+   /*
+* If FSB delivery mode is used, the timer interrupt is programmed as
+* edge-triggered and there is no need to check the ISR register.
+*/
+   use_fsb = hdata->flags & HPET_DEV_FSB_CAP;
+
+   if (!use_fsb && !is_hpet_wdt_interrupt(hdata))
+   return NMI_DONE;
+
+   inspect_for_hardlockups(regs);
+
+   if (!(hdata->flags & HPET_DEV_PERI_CAP))
+   kick_timer(hdata);
+
+   /* Acknowledge interrupt if in level-triggered mode */
+   if (!use_fsb)
+   hpet_writel(BIT(hdata->num), HPET_STATUS);
+
+   return NMI_HANDLED;
+}
+
+/**
  * setup_irq_msi_mode() - Configure the timer to deliver an MSI interrupt
  * @data:  Data associated with the instance of the HPET timer to configure
  *
@@ -282,11 +326,20 @@ static int setup_hpet_irq(struct hpet_hld_data *hdata)
if (ret)
return ret;
 
+   /* Register the NMI handler, which will be the actual handler we use. */
+   ret = register_nmi_handler(NMI_LOCAL, hardlockup_detector_nmi_handler,
+  0, "hpet_hld");
+   if (ret)
+   return ret;
+
/*
 * Request an interrupt to activate the irq in all the needed domains.
 */
 

[RFC PATCH 01/23] x86/apic: Add a parameter for the APIC delivery mode

2018-06-12 Thread Ricardo Neri
Until now, the delivery mode of APIC interrupts is set to the default
mode set in the APIC driver. However, there are no restrictions in hardware
to configure each interrupt with a different delivery mode. Specifying the
delivery mode per interrupt is useful when one is interested in changing
the delivery mode of a particular interrupt. For instance, this can be used
to deliver an interrupt as non-maskable.

Add a new member, delivery_mode, to struct irq_cfg. Also, update the
configuration of the delivery mode in the IO APIC, the MSI APIC and the
Intel interrupt remapping driver to use this new per-interrupt member to
configure their respective interrupt tables.

In order to keep the current behavior, initialize the delivery mode of
each interrupt with the delivery mode of the APIC driver in use
when the interrupt data is allocated.
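
To show how a per-interrupt delivery mode ends up in an MSI message, here is
a small user-space sketch built around the MSI_DATA_DELIVERY_MODE() macro
this patch adds. The function msi_data_sketch() and the DEST_*_SKETCH
constants are hypothetical; the encodings 0 (fixed) and 4 (NMI) follow the
delivery mode field of the MSI data register, bits 10:8. The trigger and
level bits are left out for brevity.

```c
#include <stdint.h>

#define MSI_DATA_DELIVERY_MODE_SHIFT	8
#define MSI_DATA_DELIVERY_MODE_MASK	0x00000700
#define MSI_DATA_DELIVERY_MODE(dm)	(((dm) << MSI_DATA_DELIVERY_MODE_SHIFT) & \
					 MSI_DATA_DELIVERY_MODE_MASK)

/* Hypothetical names for the two delivery modes used in this sketch. */
enum { DEST_FIXED_SKETCH = 0, DEST_NMI_SKETCH = 4 };

/* Compose the delivery mode and vector fields of the MSI data word. */
static uint32_t msi_data_sketch(unsigned int delivery_mode,
				unsigned int vector)
{
	return MSI_DATA_DELIVERY_MODE(delivery_mode) | (vector & 0xff);
}
```

With the default fixed mode, msi_data_sketch(DEST_FIXED_SKETCH, 0x31) yields
0x031, which matches what MSI_DATA_DELIVERY_FIXED produced before. Selecting
DEST_NMI_SKETCH sets bit 10, giving 0x402 for vector 0x02.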

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Joerg Roedel 
Cc: Juergen Gross 
Cc: Bjorn Helgaas 
Cc: Wincy Van 
Cc: Kate Stewart 
Cc: Philippe Ombredanne 
Cc: "Eric W. Biederman" 
Cc: Baoquan He 
Cc: Dou Liyang 
Cc: Jan Kiszka 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org

Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hw_irq.h   |  5 +++--
 arch/x86/include/asm/msidef.h   |  3 +++
 arch/x86/kernel/apic/io_apic.c  |  2 +-
 arch/x86/kernel/apic/msi.c  |  2 +-
 arch/x86/kernel/apic/vector.c   |  8 
 arch/x86/platform/uv/uv_irq.c   |  2 +-
 drivers/iommu/intel_irq_remapping.c | 10 +-
 7 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 32e666e..c024e59 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -117,8 +117,9 @@ struct irq_alloc_info {
 };
 
 struct irq_cfg {
-   unsigned intdest_apicid;
-   unsigned intvector;
+   unsigned intdest_apicid;
+   unsigned intvector;
+   enum ioapic_irq_destination_types   delivery_mode;
 };
 
 extern struct irq_cfg *irq_cfg(unsigned int irq);
diff --git a/arch/x86/include/asm/msidef.h b/arch/x86/include/asm/msidef.h
index ee2f8cc..6aef434 100644
--- a/arch/x86/include/asm/msidef.h
+++ b/arch/x86/include/asm/msidef.h
@@ -16,6 +16,9 @@
 MSI_DATA_VECTOR_MASK)
 
 #define MSI_DATA_DELIVERY_MODE_SHIFT   8
+#define MSI_DATA_DELIVERY_MODE_MASK    0x00000700
+#define MSI_DATA_DELIVERY_MODE(dm) (((dm) << MSI_DATA_DELIVERY_MODE_SHIFT) & \
+                                    MSI_DATA_DELIVERY_MODE_MASK)
 #define  MSI_DATA_DELIVERY_FIXED   (0 << MSI_DATA_DELIVERY_MODE_SHIFT)
 #define  MSI_DATA_DELIVERY_LOWPRI  (1 << MSI_DATA_DELIVERY_MODE_SHIFT)
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 7553819..10a20f8 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2887,8 +2887,8 @@ static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
   struct IO_APIC_route_entry *entry)
 {
memset(entry, 0, sizeof(*entry));
-   entry->delivery_mode = apic->irq_delivery_mode;
entry->dest_mode = apic->irq_dest_mode;
+   entry->delivery_mode = cfg->delivery_mode;
entry->dest  = cfg->dest_apicid;
entry->vector= cfg->vector;
entry->trigger   = data->trigger;
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index ce503c9..12202ac 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -45,7 +45,7 @@ static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
msg->data =
MSI_DATA_TRIGGER_EDGE |
MSI_DATA_LEVEL_ASSERT |
-   MSI_DATA_DELIVERY_FIXED |
+   MSI_DATA_DELIVERY_MODE(cfg->delivery_mode) |
MSI_DATA_VECTOR(cfg->vector);
 }
 
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index bb6f7a2..dfe0a2a 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -547,6 +547,14 @@ static int x86_vector_alloc_irqs(struct irq_domain *domain, unsigned int virq,
irqd->chip_data = apicd;
irqd->hwirq = virq + i;
irqd_set_single_target(irqd);
+
+   /*
+* Initialize the delivery mode of this irq to match
+* the default delivery mode of the APIC. This could be
+* changed later when the interrupt is activated.
+*/
+apicd->hw_irq_cfg.delivery_mode = apic->irq_delivery_mode;
+
/*
 * Legacy vectors are already assigned when the IOAPIC
 * takes

[RFC PATCH 02/23] genirq: Introduce IRQD_DELIVER_AS_NMI

2018-06-12 Thread Ricardo Neri
Certain interrupt controllers (e.g., APIC) are capable of delivering
interrupts to the CPU as non-maskable. Add the new IRQD_DELIVER_AS_NMI
interrupt state flag. The purpose of this flag is to communicate to the
underlying irqchip whether the interrupt must be delivered in this manner.
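
A minimal user-space model of such a state flag and its accessors is shown
below. The struct and the *_sketch names are hypothetical; the kernel keeps
the state behind __irqd_to_state(), which is not reproduced here. Only the
bit position (27) is taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define IRQD_DELIVER_AS_NMI_SKETCH	(1u << 27)

/* Hypothetical stand-in for the per-interrupt state word. */
struct irq_data_sketch {
	uint32_t state;
};

/* Mark the interrupt for non-maskable delivery. */
static void irqd_set_deliver_as_nmi_sketch(struct irq_data_sketch *d)
{
	d->state |= IRQD_DELIVER_AS_NMI_SKETCH;
}

/* Query used by an irqchip when it programs the delivery mode. */
static bool irqd_deliver_as_nmi_sketch(const struct irq_data_sketch *d)
{
	return d->state & IRQD_DELIVER_AS_NMI_SKETCH;
}
```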

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Marc Zyngier 
Cc: Bartosz Golaszewski 
Cc: Doug Berger 
Cc: Palmer Dabbelt 
Cc: Randy Dunlap 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org

Signed-off-by: Ricardo Neri 
---
 include/linux/irq.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 65916a3..7271a2c 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -208,6 +208,7 @@ struct irq_data {
  * IRQD_SINGLE_TARGET  - IRQ allows only a single affinity target
  * IRQD_DEFAULT_TRIGGER_SET- Expected trigger already been set
  * IRQD_CAN_RESERVE- Can use reservation mode
+ * IRQD_DELIVER_AS_NMI - Deliver this interrupt as non-maskable
  */
 enum {
IRQD_TRIGGER_MASK   = 0xf,
@@ -230,6 +231,7 @@ enum {
IRQD_SINGLE_TARGET  = (1 << 24),
IRQD_DEFAULT_TRIGGER_SET= (1 << 25),
IRQD_CAN_RESERVE= (1 << 26),
+   IRQD_DELIVER_AS_NMI = (1 << 27),
 };
 
 #define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors)
@@ -389,6 +391,16 @@ static inline bool irqd_can_reserve(struct irq_data *d)
return __irqd_to_state(d) & IRQD_CAN_RESERVE;
 }
 
+static inline void irqd_set_deliver_as_nmi(struct irq_data *d)
+{
+   __irqd_to_state(d) |= IRQD_DELIVER_AS_NMI;
+}
+
+static inline bool irqd_deliver_as_nmi(struct irq_data *d)
+{
+   return __irqd_to_state(d) & IRQD_DELIVER_AS_NMI;
+}
+
 #undef __irqd_to_state
 
 static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
-- 
2.7.4



[RFC PATCH 15/23] kernel/watchdog: Add a function to obtain the watchdog_allowed_mask

2018-06-12 Thread Ricardo Neri
Implementations of NMI watchdogs that use a single piece of hardware to
monitor all the CPUs in the system (as opposed to per-CPU implementations
such as perf) need to know which CPUs the watchdog is allowed to monitor.
In this manner, non-maskable interrupts are directed only to the monitored
CPUs.
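
The accessor pattern can be sketched in user space as follows. A 64-bit
word stands in for struct cpumask, and every name ending in _sketch is
hypothetical. The point is only that the mask stays private to one file
while other code reaches it through a function.

```c
#include <stdbool.h>
#include <stdint.h>

/* File-local mask, playing the role of the now-static allowed mask. */
static uint64_t allowed_mask_sketch;

/* Accessor that hands out the mask to other parts of the code. */
static uint64_t *get_allowed_mask_sketch(void)
{
	return &allowed_mask_sketch;
}

/* A single-timer detector would check this before monitoring a CPU. */
static bool cpu_may_be_monitored_sketch(unsigned int cpu)
{
	return cpu < 64 && ((*get_allowed_mask_sketch() >> cpu) & 1);
}
```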

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: "Rafael J. Wysocki" 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Alexei Starovoitov 
Cc: Babu Moger 
Cc: Paul Mackerras 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: Byungchul Park 
Cc: "Paul E. McKenney" 
Cc: "Luis R. Rodriguez" 
Cc: Waiman Long 
Cc: Josh Poimboeuf 
Cc: Randy Dunlap 
Cc: Davidlohr Bueso 
Cc: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Kai-Heng Feng 
Cc: Konrad Rzeszutek Wilk 
Cc: David Rientjes 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: "David S. Miller" 
Cc: Benjamin Herrenschmidt 
Cc: iommu@lists.linux-foundation.org
Cc: sparcli...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Ricardo Neri 
---
 include/linux/nmi.h | 1 +
 kernel/watchdog.c   | 7 ++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index e61b441..e608762 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -77,6 +77,7 @@ static inline void reset_hung_task_detector(void) { }
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR)
 extern void hardlockup_detector_disable(void);
+extern struct cpumask *watchdog_get_allowed_cpumask(void);
 extern unsigned int hardlockup_panic;
 #else
 static inline void hardlockup_detector_disable(void) {}
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 5057376..b94bbe3 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -50,7 +50,7 @@ int __read_mostly nmi_watchdog_available;
 
 static struct nmi_watchdog_ops *nmi_wd_ops;
 
-struct cpumask watchdog_allowed_mask __read_mostly;
+static struct cpumask watchdog_allowed_mask __read_mostly;
 
 struct cpumask watchdog_cpumask __read_mostly;
 unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
@@ -98,6 +98,11 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
 }
 __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
 # endif /* CONFIG_SMP */
+
+struct cpumask *watchdog_get_allowed_cpumask(void)
+{
+   return &watchdog_allowed_mask;
+}
 #endif /* CONFIG_HARDLOCKUP_DETECTOR */
 
 /*
-- 
2.7.4



[RFC PATCH 05/23] x86/msi: Add support for IRQCHIP_CAN_DELIVER_AS_NMI

2018-06-12 Thread Ricardo Neri
As per the Intel 64 and IA-32 Architectures Software Developer's Manual
Volume 3 Section 10.11.2, the delivery mode field of the interrupt message
can be set to deliver the interrupt as non-maskable. Declare support for
non-maskable delivery by adding IRQCHIP_CAN_DELIVER_AS_NMI.

When composing the interrupt message, the delivery mode is obtained from
the configuration of the interrupt data.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Dou Liyang 
Cc: Juergen Gross 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/kernel/apic/msi.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 12202ac..68b6a04 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -29,6 +29,9 @@ static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
 {
struct irq_cfg *cfg = irqd_cfg(data);
 
+   if (irqd_deliver_as_nmi(data))
+   cfg->delivery_mode = dest_NMI;
+
msg->address_hi = MSI_ADDR_BASE_HI;
 
if (x2apic_enabled())
@@ -297,7 +300,7 @@ static struct irq_chip hpet_msi_controller __ro_after_init = {
.irq_retrigger = irq_chip_retrigger_hierarchy,
.irq_compose_msi_msg = irq_msi_compose_msg,
.irq_write_msi_msg = hpet_msi_write_msg,
-   .flags = IRQCHIP_SKIP_SET_WAKE,
+   .flags = IRQCHIP_SKIP_SET_WAKE | IRQCHIP_CAN_DELIVER_AS_NMI,
 };
 
 static irq_hw_number_t hpet_msi_get_hwirq(struct msi_domain_info *info,
-- 
2.7.4



[RFC PATCH 04/23] iommu/vt-d/irq_remapping: Add support for IRQCHIP_CAN_DELIVER_AS_NMI

2018-06-12 Thread Ricardo Neri
The Intel IOMMU is capable of delivering remapped interrupts as non-
maskable. Add the IRQCHIP_CAN_DELIVER_AS_NMI flag to its irq_chip
structure to declare this capability. The delivery mode of each interrupt
can be set separately.

By default, the delivery mode is taken from the configuration field of the
interrupt data. If non-maskable delivery is requested in the interrupt
state flags, the respective entry in the remapping table is updated.

When remapping an interrupt from an IO APIC, modify the delivery
field in the interrupt remapping table entry. When remapping an MSI
interrupt, simply update the delivery mode when composing the message.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Joerg Roedel 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 drivers/iommu/intel_irq_remapping.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 9f3a04d..b6cf7c4 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -1128,10 +1128,14 @@ static void intel_ir_reconfigure_irte(struct irq_data *irqd, bool force)
 	struct irte *irte = &ir_data->irte_entry;
 	struct irq_cfg *cfg = irqd_cfg(irqd);
 
+	if (irqd_deliver_as_nmi(irqd))
+		cfg->delivery_mode = dest_NMI;
+
 	/*
 	 * Atomically updates the IRTE with the new destination, vector
 	 * and flushes the interrupt entry cache.
 	 */
+	irte->dlvry_mode = cfg->delivery_mode;
 	irte->vector = cfg->vector;
 	irte->dest_id = IRTE_DEST(cfg->dest_apicid);
 
@@ -1182,6 +1186,9 @@ static void intel_ir_compose_msi_msg(struct irq_data *irq_data,
 {
 	struct intel_ir_data *ir_data = irq_data->chip_data;
 
+	if (irqd_deliver_as_nmi(irq_data))
+		ir_data->irte_entry.dlvry_mode = dest_NMI;
+
 	*msg = ir_data->msi_entry;
 }
 
@@ -1227,6 +1234,7 @@ static struct irq_chip intel_ir_chip = {
 	.irq_set_affinity	= intel_ir_set_affinity,
 	.irq_compose_msi_msg	= intel_ir_compose_msi_msg,
 	.irq_set_vcpu_affinity	= intel_ir_set_vcpu_affinity,
+	.flags			= IRQCHIP_CAN_DELIVER_AS_NMI,
 };
 
 static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
-- 
2.7.4



[RFC PATCH 12/23] kernel/watchdog: Introduce a struct for NMI watchdog operations

2018-06-12 Thread Ricardo Neri
Instead of exposing individual functions for the operations of the NMI
watchdog, define a common interface that can be used across multiple
implementations.

The struct nmi_watchdog_ops is defined for such operations. These initial
definitions include the enable, disable, start, stop, and cleanup
operations.

Only a single NMI watchdog can be used in the system. The operations of
this NMI watchdog are accessed via the new variable nmi_wd_ops. This
variable is set to point to the operations of the first NMI watchdog that
initializes successfully. At this moment, the only available NMI watchdog
is the perf-based hardlockup detector; more implementations can be added
in the future.

While introducing this new struct for the NMI watchdog operations, convert
the perf-based NMI watchdog to use these operations.

The functions hardlockup_detector_perf_restart() and
hardlockup_detector_perf_stop() are special. They are not regular watchdog
operations; they are used to work around hardware bugs. Thus, they are not
used for the start and stop operations. Furthermore, the perf-based NMI
watchdog does not need to implement such operations. They are intended to
globally start and stop the NMI watchdog; the perf-based NMI
watchdog is implemented on a per-CPU basis.

Currently, when the perf-based hardlockup detector is not selected at build
time, a dummy hardlockup_detector_perf_init() is used. The return value
of this function depends on CONFIG_HAVE_NMI_WATCHDOG. This behavior is
preserved by defining the set of dummy NMI watchdog operations
hardlockup_detector_noop. These dummy operations are used when no
hardlockup detector is selected or when the selected one fails to initialize.
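
The selection described above can be modeled outside the kernel roughly as
follows. This is only a sketch under stated assumptions: the struct is reduced
to three members, select_nmi_watchdog() and the two example_* instances are
hypothetical, and the no-op init() is assumed to return -ENODEV (in the patch
its return value depends on CONFIG_HAVE_NMI_WATCHDOG).

```c
#include <errno.h>
#include <stddef.h>

/* Reduced stand-in for the struct introduced by this patch. */
struct nmi_watchdog_ops {
	int  (*init)(void);
	void (*enable)(void);
	void (*disable)(void);
};

static int  noop_init(void)    { return -ENODEV; }	/* assumption, see above */
static void noop_enable(void)  { }
static void noop_disable(void) { }

/* Dummy operations used when no detector is available. */
static const struct nmi_watchdog_ops hardlockup_detector_noop = {
	.init = noop_init, .enable = noop_enable, .disable = noop_disable,
};

/* Hypothetical detectors: one whose init() fails, one whose init() succeeds. */
static int failing_init(void) { return -ENODEV; }
static int working_init(void) { return 0; }

static const struct nmi_watchdog_ops example_failing_ops = {
	.init = failing_init, .enable = noop_enable, .disable = noop_disable,
};
static const struct nmi_watchdog_ops example_working_ops = {
	.init = working_init, .enable = noop_enable, .disable = noop_disable,
};

/* Return the first candidate whose init() succeeds, else the no-op set. */
static const struct nmi_watchdog_ops *
select_nmi_watchdog(const struct nmi_watchdog_ops *const *candidates, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (candidates[i]->init && candidates[i]->init() == 0)
			return candidates[i];
	return &hardlockup_detector_noop;
}
```

Callers then go through the returned pointer only, so they never need to know
which implementation, if any, was picked.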

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Babu Moger 
Cc: "David S. Miller" 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: "Luis R. Rodriguez" 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 include/linux/nmi.h   | 39 +++--
 kernel/watchdog.c | 54 +--
 kernel/watchdog_hld.c | 16 +++
 3 files changed, 89 insertions(+), 20 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index b8d868d..d3f5d55f 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -92,24 +92,43 @@ static inline void hardlockup_detector_disable(void) {}
 extern void arch_touch_nmi_watchdog(void);
 extern void hardlockup_detector_perf_stop(void);
 extern void hardlockup_detector_perf_restart(void);
-extern void hardlockup_detector_perf_disable(void);
-extern void hardlockup_detector_perf_enable(void);
-extern void hardlockup_detector_perf_cleanup(void);
-extern int hardlockup_detector_perf_init(void);
 #else
 static inline void hardlockup_detector_perf_stop(void) { }
 static inline void hardlockup_detector_perf_restart(void) { }
-static inline void hardlockup_detector_perf_disable(void) { }
-static inline void hardlockup_detector_perf_enable(void) { }
-static inline void hardlockup_detector_perf_cleanup(void) { }
 # if !defined(CONFIG_HAVE_NMI_WATCHDOG)
-static inline int hardlockup_detector_perf_init(void) { return -ENODEV; }
 static inline void arch_touch_nmi_watchdog(void) {}
-# else
-static inline int hardlockup_detector_perf_init(void) { return 0; }
 # endif
 #endif
 
+/**
+ * struct nmi_watchdog_ops - Operations performed by NMI watchdogs
+ * @init:	Initialize and configure the hardware resources of the
+ *		NMI watchdog.
+ * @enable:	Enable (i.e., monitor for hardlockups) the NMI watchdog
+ *		in the CPU in which the function is executed.
+ * @disable:	Disable (i.e., do not monitor for hardlockups) the NMI
+ *		watchdog in the CPU in which the function is executed.
+ * @start:	Start the NMI watchdog in all CPUs. Used after the
+ *		parameters of the watchdog are updated. Optional if
+ *		such updates do not impact the operation of the NMI watchdog.
+ * @stop:	Stop the NMI watchdog in all CPUs. Used before the
+ *		parameters of the watchdog are updated. Optional if
+ *		such updates do not impact the NMI watchdog.
+ * @cleanup:	Clean up unneeded data structures of the NMI watchdog.
+ *		Used after updating the parameters of the watchdog.
+ *		Optional if no cleanup is needed.
+ */
+struct nmi_watchdog_ops {
+	int	(*init)(void);
+	void	(*enable)(void);
+	void	(*disable)(void);

[RFC PATCH 11/23] x86/hpet: Configure the timer used by the hardlockup detector

2018-06-12 Thread Ricardo Neri
Implement the initial configuration of the timer to be used by the
hardlockup detector. The main focus of this configuration is to provide an
interrupt for the timer.

Two types of interrupt can be assigned to the timer. First, attempt to
assign a message-signaled interrupt. This requires the HPET MSI domain,
which is created here only if it was not already created for HPET timers
used as event timers. The data structures needed to allocate the MSI
interrupt in the domain are also created.

If message-signaled interrupts cannot be used, assign a legacy IO APIC
interrupt via the ACPI Global System Interrupts.

The resulting interrupt configuration, along with the timer instance, and
frequency are then made available to the hardlockup detector in a struct
via the new function hpet_hardlockup_detector_assign_timer().
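
For the legacy path, the diff below filters the timer's routing-capability
bitmask before trying candidate interrupt lines. The following sketch models
only that filtering step for the PIC-mode case, using the mask from the patch
(IRQ0-4, IRQ6-9 and IRQ12-15 are skipped). The helper name first_usable_irq()
and the constant name are hypothetical, and the sketch does not register a GSI
as the real code does.

```c
#include <stdint.h>

/* Lines reserved for legacy devices in PIC mode: IRQ0-4, IRQ6-9, IRQ12-15. */
#define PIC_LEGACY_IRQ_MASK 0xf3dfu

/*
 * Hypothetical helper: return the lowest interrupt line that the timer can
 * route to and that is not reserved for a legacy device, or -1 if none.
 */
static int first_usable_irq(uint32_t route_cap)
{
	uint32_t v = route_cap & ~PIC_LEGACY_IRQ_MASK;

	for (int irq = 0; irq < 32; irq++)
		if (v & (UINT32_C(1) << irq))
			return irq;
	return -1;
}
```

Among the first sixteen lines, only IRQ5, IRQ10 and IRQ11 survive this mask.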

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Clemens Ladisch 
Cc: Arnd Bergmann 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h |  16 +++
 arch/x86/kernel/hpet.c  | 112 +++-
 2 files changed, 127 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 9fd112a..33309b7 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -118,6 +118,22 @@ extern void hpet_unregister_irq_handler(rtc_irq_handler handler);
 
 #endif /* CONFIG_HPET_EMULATE_RTC */
 
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_HPET
+struct hpet_hld_data {
+   u32 num;
+   u32 irq;
+   u32 flags;
+   u64 ticks_per_second;
+};
+
+extern struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void);
+#else
+static inline struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void)
+{
+   return NULL;
+}
+#endif /* CONFIG_HARDLOCKUP_DETECTOR_HPET */
+
 #else /* CONFIG_HPET_TIMER */
 
 static inline int hpet_enable(void) { return 0; }
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 99d4972..fda6e19 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -36,6 +37,7 @@ bool					hpet_msi_disable;
 
 #ifdef CONFIG_PCI_MSI
 static unsigned int			hpet_num_timers;
+static struct irq_domain		*hpet_domain;
 #endif
 static void __iomem			*hpet_virt_address;
 
@@ -177,6 +179,115 @@ do {								\
 	_hpet_print_config(__func__, __LINE__);			\
 } while (0)
 
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_HPET
+static
+int hpet_hardlockup_detector_assign_legacy_irq(struct hpet_hld_data *hdata)
+{
+   unsigned long v;
+   int gsi, hwirq;
+
+   /* Obtain interrupt pins that can be used by this timer. */
+   v = hpet_readq(HPET_Tn_CFG(HPET_WD_TIMER_NR));
+   v = (v & Tn_INT_ROUTE_CAP_MASK) >> Tn_INT_ROUTE_CAP_SHIFT;
+
+   /*
+* In PIC mode, skip IRQ0-4, IRQ6-9 and IRQ12-15, which are always used
+* by legacy devices. In IO APIC mode, skip all the legacy IRQs.
+*/
+   if (acpi_irq_model == ACPI_IRQ_MODEL_PIC)
+   v &= ~0xf3df;
+   else
+   v &= ~0x;
+
+   for_each_set_bit(hwirq, &v, HPET_MAX_IRQ) {
+   if (hwirq >= NR_IRQS) {
+   hwirq = HPET_MAX_IRQ;
+   break;
+   }
+
+   gsi = acpi_register_gsi(NULL, hwirq, ACPI_LEVEL_SENSITIVE,
+   ACPI_ACTIVE_LOW);
+   if (gsi > 0)
+   break;
+   }
+
+   if (hwirq >= HPET_MAX_IRQ)
+   return -ENODEV;
+
+   hdata->irq = hwirq;
+   return 0;
+}
+
+static int hpet_hardlockup_detector_assign_msi_irq(struct hpet_hld_data *hdata)
+{
+   struct hpet_dev *hdev;
+   int hwirq;
+
+   if (hpet_msi_disable)
+   return -ENODEV;
+
+   hdev = kzalloc(sizeof(*hdev), GFP_KERNEL);
+   if (!hdev)
+   return -ENOMEM;
+
+   hdev->flags |= HPET_DEV_FSB_CAP;
+   hdev->num = hdata->num;
+   sprintf(hdev->name, "hpet_hld");
+
+   /* Domain may exist if CPU does not have Always-Running APIC Timers. */
+   if (!hpet_domain) {
+   hpet_domain = hpet_create_irq_domain(hpet_blockid);
+   if (!hpet_domain)
+   return -EPERM;
+   }
+
+   hwirq = hpet_assign_irq(hpet_domain, hdev, hdev->num);
+   if (hwirq <= 0) {
+   kfree(hdev);
+   return -ENODEV;
+   }
+
+   hdata->irq = hwirq;
+   hdata->flags |= HPET_DEV_

[RFC PATCH 13/23] watchdog/hardlockup: Define a generic function to detect hardlockups

2018-06-12 Thread Ricardo Neri
The procedure to detect hardlockups is independent of the underlying
mechanism that generated the non-maskable interrupt used to drive the
detector. Thus, it can be put in a separate, generic function. In this
manner, it can be invoked by various implementations of the NMI watchdog.

For this purpose, move the bulk of watchdog_overflow_callback() to the
new function inspect_for_hardlockups(). This function can then be called
from the applicable NMI handlers.
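
The idea of a source-independent check can be modeled in user space as below.
This is a simplified sketch, not the kernel code: struct hld_state, the
*_model() names and the single-instance state are hypothetical, and the
counter comparison is an assumption about how the check decides that a CPU is
stuck (a periodic timer bumps a counter, and the NMI path verifies that the
counter moved since the previous NMI).

```c
#include <stdbool.h>

/* Per-CPU state in the kernel; a single instance is enough for this model. */
struct hld_state {
	unsigned long hrtimer_interrupts;       /* bumped by a periodic timer */
	unsigned long hrtimer_interrupts_saved; /* value seen at the last NMI */
	bool          nmi_touch;                /* set to skip one check */
};

/* Source-agnostic check: returns true when a hardlockup is suspected. */
static bool inspect_for_hardlockups_model(struct hld_state *s)
{
	if (s->nmi_touch) {
		s->nmi_touch = false;
		return false;
	}

	/* The timer made no progress since the previous NMI. */
	if (s->hrtimer_interrupts == s->hrtimer_interrupts_saved)
		return true;

	s->hrtimer_interrupts_saved = s->hrtimer_interrupts;
	return false;
}

/* Two hypothetical NMI sources that share the same check. */
static bool perf_nmi_model(struct hld_state *s)
{
	return inspect_for_hardlockups_model(s);
}

static bool hpet_nmi_model(struct hld_state *s)
{
	return inspect_for_hardlockups_model(s);
}
```

Source-specific work, such as resetting the perf throttling counter, stays in
the caller, which is what the patch does for watchdog_overflow_callback().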

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Babu Moger 
Cc: "David S. Miller" 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: "Luis R. Rodriguez" 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 include/linux/nmi.h   |  1 +
 kernel/watchdog_hld.c | 18 +++++++++++-------
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index d3f5d55f..e61b441 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -223,6 +223,7 @@ extern int proc_watchdog_thresh(struct ctl_table *, int ,
void __user *, size_t *, loff_t *);
 extern int proc_watchdog_cpumask(struct ctl_table *, int,
 void __user *, size_t *, loff_t *);
+void inspect_for_hardlockups(struct pt_regs *regs);
 
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
 #include 
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 036cb0a..28a00c3 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -106,14 +106,8 @@ static struct perf_event_attr wd_hw_attr = {
.disabled   = 1,
 };
 
-/* Callback function for perf event subsystem */
-static void watchdog_overflow_callback(struct perf_event *event,
-  struct perf_sample_data *data,
-  struct pt_regs *regs)
+void inspect_for_hardlockups(struct pt_regs *regs)
 {
-   /* Ensure the watchdog never gets throttled */
-   event->hw.interrupts = 0;
-
if (__this_cpu_read(watchdog_nmi_touch) == true) {
__this_cpu_write(watchdog_nmi_touch, false);
return;
@@ -162,6 +156,16 @@ static void watchdog_overflow_callback(struct perf_event *event,
return;
 }
 
+/* Callback function for perf event subsystem */
+static void watchdog_overflow_callback(struct perf_event *event,
+  struct perf_sample_data *data,
+  struct pt_regs *regs)
+{
+   /* Ensure the watchdog never gets throttled */
+   event->hw.interrupts = 0;
+   inspect_for_hardlockups(regs);
+}
+
 static int hardlockup_detector_event_create(void)
 {
unsigned int cpu = smp_processor_id();
-- 
2.7.4



[RFC PATCH 10/23] x86/hpet: Relocate flag definitions to a header file

2018-06-12 Thread Ricardo Neri
Users of HPET timers (such as the hardlockup detector) need the definitions
of these flags to interpret the configuration of a timer as passed by
platform code.
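
A consumer of these flags might test them along these lines. The flag values
are copied from the diff below; the helper names timer_is_periodic() and
timer_uses_fsb() are hypothetical and only illustrate the intended use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as relocated by this patch. */
#define HPET_DEV_USED_BIT	2
#define HPET_DEV_USED		(1 << HPET_DEV_USED_BIT)
#define HPET_DEV_VALID		0x8
#define HPET_DEV_FSB_CAP	0x1000
#define HPET_DEV_PERI_CAP	0x2000

/* Hypothetical helper: does the timer support periodic mode? */
static bool timer_is_periodic(uint32_t flags)
{
	return (flags & HPET_DEV_PERI_CAP) != 0;
}

/* Hypothetical helper: can the timer deliver interrupts via the FSB (MSI)? */
static bool timer_uses_fsb(uint32_t flags)
{
	return (flags & HPET_DEV_FSB_CAP) != 0;
}
```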

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Clemens Ladisch 
Cc: Arnd Bergmann 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h | 6 ++++++
 arch/x86/kernel/hpet.c      | 6 ------
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 3266796..9fd112a 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -64,6 +64,12 @@
 /* Timer used for the hardlockup detector */
 #define HPET_WD_TIMER_NR 2
 
+#define HPET_DEV_USED_BIT  2
+#define HPET_DEV_USED  (1 << HPET_DEV_USED_BIT)
+#define HPET_DEV_VALID 0x8
+#define HPET_DEV_FSB_CAP   0x1000
+#define HPET_DEV_PERI_CAP  0x2000
+
 /* hpet memory map physical address */
 extern unsigned long hpet_address;
 extern unsigned long force_hpet_address;
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index b03faee..99d4972 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -24,12 +24,6 @@
NSEC = 10^-9 */
 #define FSEC_PER_NSEC  1000000L
 
-#define HPET_DEV_USED_BIT  2
-#define HPET_DEV_USED  (1 << HPET_DEV_USED_BIT)
-#define HPET_DEV_VALID 0x8
-#define HPET_DEV_FSB_CAP   0x1000
-#define HPET_DEV_PERI_CAP  0x2000
-
 #define HPET_MIN_CYCLES			128
 #define HPET_MIN_PROG_DELTA		(HPET_MIN_CYCLES + (HPET_MIN_CYCLES >> 1))
 
-- 
2.7.4



[RFC PATCH 22/23] watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot parameter

2018-06-12 Thread Ricardo Neri
Keep the HPET-based hardlockup detector disabled unless explicitly enabled
via a command-line argument. If that parameter is not given, the hardlockup
detector will fall back to the perf-based implementation.

The function hardlockup_panic_setup() is updated to return 0 in order to
allow __setup functions of specific hardlockup detectors (in this case
hardlockup_detector_hpet_setup()) to inspect the nmi_watchdog boot
parameter.
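
Why the return value matters can be seen in a small user-space model. The
sketch assumes that the boot-parameter code walks the registered handlers in
order and stops at the first one that returns nonzero for a matching prefix;
setup_table, dispatch_param() and the two handlers are hypothetical stand-ins,
and the first handler omits the panic/nopanic parsing of the real function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool use_hpet;		/* set by the second handler */
static int  user_enabled = 1;	/* set by the first handler */

/* Models hardlockup_panic_setup() after the change (simplified). */
static int panic_setup(const char *str)
{
	if (!strncmp(str, "0", 1))
		user_enabled = 0;
	else if (!strncmp(str, "1", 1))
		user_enabled = 1;
	return 0;	/* 0: let later handlers see the same parameter */
}

/* Models hardlockup_detector_hpet_setup(). */
static int hpet_setup(const char *str)
{
	if (strstr(str, "hpet"))
		use_hpet = true;
	return 0;
}

struct setup_entry {
	const char *prefix;
	int (*fn)(const char *);
};

static const struct setup_entry setup_table[] = {
	{ "nmi_watchdog=", panic_setup },
	{ "nmi_watchdog=", hpet_setup },
};

/* Walk the table; a nonzero return consumes the parameter and stops the walk. */
static void dispatch_param(const char *line)
{
	for (size_t i = 0; i < sizeof(setup_table) / sizeof(setup_table[0]); i++) {
		size_t n = strlen(setup_table[i].prefix);

		if (!strncmp(line, setup_table[i].prefix, n) &&
		    setup_table[i].fn(line + n))
			return;
	}
}
```

If panic_setup() returned 1 in this model, hpet_setup() would never run and
the hpet option would be silently ignored.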

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: "Rafael J. Wysocki" 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Alexei Starovoitov 
Cc: Babu Moger 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: Byungchul Park 
Cc: "Paul E. McKenney" 
Cc: "Luis R. Rodriguez" 
Cc: Waiman Long 
Cc: Josh Poimboeuf 
Cc: Randy Dunlap 
Cc: Davidlohr Bueso 
Cc: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Kai-Heng Feng 
Cc: Konrad Rzeszutek Wilk 
Cc: David Rientjes 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
--
checkpatch gives the following warning:

CHECK: __setup appears un-documented -- check Documentation/admin-guide/kernel-parameters.rst
+__setup("nmi_watchdog=", hardlockup_detector_hpet_setup);

This is a false-positive as the option nmi_watchdog is already
documented. The option is re-evaluated in this file as well.
---
 Documentation/admin-guide/kernel-parameters.txt |  5 ++++-
 kernel/watchdog.c                               |  2 +-
 kernel/watchdog_hld_hpet.c                      | 13 +++++++++++++
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f2040d4..a8833c7 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2577,7 +2577,7 @@
Format: [state][,regs][,debounce][,die]
 
nmi_watchdog=   [KNL,BUGS=X86] Debugging features for SMP kernels
-   Format: [panic,][nopanic,][num]
+   Format: [panic,][nopanic,][num,][hpet]
Valid num: 0 or 1
0 - turn hardlockup detector in nmi_watchdog off
1 - turn hardlockup detector in nmi_watchdog on
@@ -2587,6 +2587,9 @@
please see 'nowatchdog'.
This is useful when you use a panic=... timeout and
need the box quickly up again.
+   When hpet is specified, the NMI watchdog will be driven
+   by an HPET timer, if available in the system. Otherwise,
+   the perf-based implementation will be used.
 
These settings can be accessed at runtime via
the nmi_watchdog and hardlockup_panic sysctls.
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index b94bbe3..b5ce6e4 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -84,7 +84,7 @@ static int __init hardlockup_panic_setup(char *str)
nmi_watchdog_user_enabled = 0;
else if (!strncmp(str, "1", 1))
nmi_watchdog_user_enabled = 1;
-   return 1;
+   return 0;
 }
 __setup("nmi_watchdog=", hardlockup_panic_setup);
 
diff --git a/kernel/watchdog_hld_hpet.c b/kernel/watchdog_hld_hpet.c
index ebb820d..12e5937 100644
--- a/kernel/watchdog_hld_hpet.c
+++ b/kernel/watchdog_hld_hpet.c
@@ -17,6 +17,7 @@
 #define pr_fmt(fmt) "NMI hpet watchdog: " fmt
 
 static struct hpet_hld_data *hld_data;
+static bool hardlockup_use_hpet;
 
 /**
  * get_count() - Get the current count of the HPET timer
@@ -488,6 +489,15 @@ static void hardlockup_detector_hpet_stop(void)
 spin_unlock(&hld_data->lock);
 }
 
+static int __init hardlockup_detector_hpet_setup(char *str)
+{
+   if (strstr(str, "hpet"))
+   hardlockup_use_hpet = true;
+
+   return 0;
+}
+__setup("nmi_watchdog=", hardlockup_detector_hpet_setup);
+
 /**
  * hardlockup_detector_hpet_init() - Initialize the hardlockup detector
  *
@@ -502,6 +512,9 @@ static int __init hardlockup_detector_hpet_init(void)
 {
int ret;
 
+   if (!hardlockup_use_hpet)
+   return -EINVAL;
+
if (!is_hpet_enabled())
return -ENODEV;
 
-- 
2.7.4



[RFC PATCH 20/23] watchdog/hardlockup/hpet: Rotate interrupt among all monitored CPUs

2018-06-12 Thread Ricardo Neri
In order to detect hardlockups in all the monitored CPUs, move the
interrupt to the next monitored CPU when handling the NMI interrupt; wrap
around when reaching the highest CPU in the mask. This rotation is achieved
by setting the affinity mask to only contain the next CPU to monitor.
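
The rotation can be sketched as a wrap-around search over a bitmask. This is a
user-space model only: next_monitored_cpu() is a hypothetical name, a 64-bit
word stands in for the cpumask (so at most 64 CPUs), and no locking or
interrupt affinity call is involved.

```c
#include <stdint.h>

/*
 * Return the next monitored CPU after @cur, wrapping around at @nr_cpus.
 * @cur itself is returned if it is the only monitored CPU; -1 is returned
 * if the mask is empty.
 */
static int next_monitored_cpu(uint64_t monitored_mask, int cur, int nr_cpus)
{
	for (int i = 1; i <= nr_cpus; i++) {
		int cpu = (cur + i) % nr_cpus;

		if (monitored_mask & (UINT64_C(1) << cpu))
			return cpu;
	}
	return -1;
}
```

Starting the search at the CPU after the current one is what makes the
interrupt visit every monitored CPU in turn instead of bouncing between the
first two.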

To prevent the interrupt from being reassigned to another CPU, flag
it as IRQF_NOBALANCING.

The cpumask monitored_mask keeps track of the CPUs that the watchdog
should monitor. This structure is updated when the NMI watchdog is
enabled or disabled in a specific CPU. As this mask can change
concurrently as CPUs are put online or offline and the watchdog is
disabled or enabled, a lock is required to protect the monitored_mask.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: "Rafael J. Wysocki" 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Alexei Starovoitov 
Cc: Babu Moger 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: Byungchul Park 
Cc: "Paul E. McKenney" 
Cc: "Luis R. Rodriguez" 
Cc: Waiman Long 
Cc: Josh Poimboeuf 
Cc: Randy Dunlap 
Cc: Davidlohr Bueso 
Cc: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Kai-Heng Feng 
Cc: Konrad Rzeszutek Wilk 
Cc: David Rientjes 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 kernel/watchdog_hld_hpet.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/kernel/watchdog_hld_hpet.c b/kernel/watchdog_hld_hpet.c
index 857e051..c40acfd 100644
--- a/kernel/watchdog_hld_hpet.c
+++ b/kernel/watchdog_hld_hpet.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #undef pr_fmt
@@ -199,8 +200,8 @@ static irqreturn_t hardlockup_detector_irq_handler(int irq, void *data)
  * @regs:  Register values as seen when the NMI was asserted
  *
  * When an NMI is issued, look for hardlockups. If the timer is not periodic,
- * kick it. The interrupt is always handled when if delivered via the
- * Front-Side Bus.
+ * kick it. Move the interrupt to the next monitored CPU. The interrupt is
+ * always handled if delivered via the Front-Side Bus.
  *
  * Returns:
  *
@@ -211,7 +212,7 @@ static int hardlockup_detector_nmi_handler(unsigned int val,
   struct pt_regs *regs)
 {
struct hpet_hld_data *hdata = hld_data;
-   unsigned int use_fsb;
+   unsigned int use_fsb, cpu;
 
/*
 * If FSB delivery mode is used, the timer interrupt is programmed as
@@ -222,8 +223,27 @@ static int hardlockup_detector_nmi_handler(unsigned int val,
if (!use_fsb && !is_hpet_wdt_interrupt(hdata))
return NMI_DONE;
 
+   /* There are no CPUs to monitor. */
+   if (!cpumask_weight(>monitored_mask))
+   return NMI_HANDLED;
+
inspect_for_hardlockups(regs);
 
+   /*
+* Target a new CPU. Keep trying until we find a monitored CPU. CPUs
+* are added to and removed from this mask at cpu_up() and cpu_down(),
+* respectively. Thus, the interrupt should be able to be moved to
+* the next monitored CPU.
+*/
+   spin_lock(&hld_data->lock);
+   for_each_cpu_wrap(cpu, >monitored_mask, smp_processor_id() + 1) {
+   if (!irq_set_affinity(hld_data->irq, cpumask_of(cpu)))
+   break;
+   pr_err("Could not assign interrupt to CPU %d. Trying with next present CPU.\n",
+  cpu);
+   }
+   spin_unlock(&hld_data->lock);
+
if (!(hdata->flags & HPET_DEV_PERI_CAP))
kick_timer(hdata);
 
@@ -336,7 +356,7 @@ static int setup_hpet_irq(struct hpet_hld_data *hdata)
 * Request an interrupt to activate the irq in all the needed domains.
 */
ret = request_irq(hwirq, hardlockup_detector_irq_handler,
- IRQF_TIMER | IRQF_DELIVER_AS_NMI,
+ IRQF_TIMER | IRQF_DELIVER_AS_NMI | IRQF_NOBALANCING,
  "hpet_hld", hdata);
if (ret)
unregister_nmi_handler(NMI_LOCAL, "hpet_hld");
-- 
2.7.4



[RFC PATCH 23/23] watchdog/hardlockup: Activate the HPET-based lockup detector

2018-06-12 Thread Ricardo Neri
Now that the implementation of the HPET-based hardlockup detector is
complete, enable it. It will be used only if it can be initialized
successfully. Otherwise, the perf-based detector will be used.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: "Rafael J. Wysocki" 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Alexei Starovoitov 
Cc: Babu Moger 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: Byungchul Park 
Cc: "Paul E. McKenney" 
Cc: "Luis R. Rodriguez" 
Cc: Waiman Long 
Cc: Josh Poimboeuf 
Cc: Randy Dunlap 
Cc: Davidlohr Bueso 
Cc: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Kai-Heng Feng 
Cc: Konrad Rzeszutek Wilk 
Cc: David Rientjes 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 kernel/watchdog.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index b5ce6e4..e2cc6c0 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -149,6 +149,21 @@ int __weak __init watchdog_nmi_probe(void)
 {
int ret = -ENODEV;
 
+   /*
+* Try first with the HPET hardlockup detector. It will only
+* succeed if selected at build time and the nmi_watchdog
+* command-line parameter is configured. This ensures that the
+* perf-based detector is used by default, if selected at
+* build time.
+*/
+   if (IS_ENABLED(CONFIG_HARDLOCKUP_DETECTOR_HPET))
+   ret = hardlockup_detector_hpet_ops.init();
+
+   if (!ret) {
+   nmi_wd_ops = &hardlockup_detector_hpet_ops;
+   return ret;
+   }
+
if (IS_ENABLED(CONFIG_HARDLOCKUP_DETECTOR_PERF))
ret = hardlockup_detector_perf_ops.init();
 
-- 
2.7.4



Re: [RFC PATCH 03/23] genirq: Introduce IRQF_DELIVER_AS_NMI

2018-06-14 Thread Ricardo Neri
On Wed, Jun 13, 2018 at 11:06:25AM +0100, Marc Zyngier wrote:
> On 13/06/18 10:20, Thomas Gleixner wrote:
> > On Wed, 13 Jun 2018, Julien Thierry wrote:
> >> On 13/06/18 09:34, Peter Zijlstra wrote:
> >>> On Tue, Jun 12, 2018 at 05:57:23PM -0700, Ricardo Neri wrote:
> >>>> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> >>>> index 5426627..dbc5e02 100644
> >>>> --- a/include/linux/interrupt.h
> >>>> +++ b/include/linux/interrupt.h
> >>>> @@ -61,6 +61,8 @@
> >>>>*interrupt handler after suspending interrupts. For
> >>>> system
> >>>>*wakeup devices users need to implement wakeup
> >>>> detection in
> >>>>*their interrupt handlers.
> >>>> + * IRQF_DELIVER_AS_NMI - Configure interrupt to be delivered as
> >>>> non-maskable, if
> >>>> + *supported by the chip.
> >>>>*/
> >>>
> >>> NAK on the first 6 patches. You really _REALLY_ don't want to expose
> >>> NMIs to this level.
> >>>
> >>
> >> I've been working on something similar on arm64 side, and effectively the
> >> one thing that might be common to arm64 and intel is the interface to set
> >> an interrupt as NMI. So I guess it would be nice to agree on the right
> >> approach for this.
> >>
> >> The way I did it was by introducing a new irq_state and let the irqchip
> >> driver handle most of the work (if it supports that state):
> >>
> >> https://lkml.org/lkml/2018/5/25/181
> >>
> >> This has not been ACKed nor NAKed. So I am just asking whether this is a
> >> more suitable approach, and if not, is there any suggestions on how to do
> >> this?
> > 
> > I really didn't pay attention to that as it's burried in the GIC/ARM series
> > which is usually Marc's playground.
> 
> I'm working my way through it ATM now that I have some brain cycles back.
> 
> > Adding NMI delivery support at low level architecture irq chip level is
> > perfectly fine, but the exposure of that needs to be restricted very
> > much. Adding it to the generic interrupt control interfaces is not going to
> > happen. That's doomed to begin with and a complete abuse of the interface
> > as the handler can not ever be used for that.
> 
> I can only agree with that. Allowing random driver to use request_irq()
> to make anything an NMI ultimately turns it into a complete mess ("hey,
> NMI is *faster*, let's use that"), and a potential source of horrible
> deadlocks.
> 
> What I'd find more palatable is a way for an irqchip to be able to
> prioritize some interrupts based on a set of architecturally-defined
> requirements, and a separate NMI requesting/handling framework that is
> separate from the IRQ API, as the overall requirements are likely to
> completely different.
> 
> It shouldn't have to be nearly as complex as the IRQ API, and require
> much stricter requirements in terms of what you can do there (flow
> handling should definitely be different).

Marc, Julien, do you plan to actively work on this? Would you mind keeping
me in the loop? I also need this work for this watchdog. In the meantime,
I will go through Julien's patches and try to adapt it to my work.

Thanks and BR,
Ricardo


Re: [RFC PATCH 14/23] watchdog/hardlockup: Decouple the hardlockup detector from perf

2018-06-14 Thread Ricardo Neri
On Thu, Jun 14, 2018 at 11:41:44AM +1000, Nicholas Piggin wrote:
> On Wed, 13 Jun 2018 18:19:01 -0700
> Ricardo Neri  wrote:
> 
> > On Wed, Jun 13, 2018 at 10:43:24AM +0200, Peter Zijlstra wrote:
> > > On Tue, Jun 12, 2018 at 05:57:34PM -0700, Ricardo Neri wrote:  
> > > > The current default implementation of the hardlockup detector assumes
> > > > that it is implemented using perf events.
> > > 
> > > The sparc and powerpc things are very much not using perf.  
> > 
> > Isn't it true that the current hardlockup detector
> > (under kernel/watchdog_hld.c) is based on perf?
> 
> arch/powerpc/kernel/watchdog.c is a powerpc implementation that uses
> the kernel/watchdog_hld.c framework.
> 
> > As far as I understand,
> > this hardlockup detector is constructed using perf events for architectures
> > that don't provide an NMI watchdog. Perhaps I can be more specific and say
> > that this synthetized detector is based on perf.
> 
> The perf detector is like that, but we want NMI watchdogs to share
> the watchdog_hld code as much as possible even for arch specific NMI
> watchdogs, so that kernel and user interfaces and behaviour are
> consistent.
> 
> Other arch watchdogs like sparc are a little older so they are not
> using HLD. You don't have to change those for your series, but it
> would be good to bring them into the fold if possible at some time.
> IIRC sparc was slightly non-trivial because it has some differences
> in sysctl or cmdline APIs that we don't want to break.
> 
> But powerpc at least needs to be updated if you change hld apis.

I will look into updating at least the powerpc implementation as part
of these changes.

Thanks and BR,
Ricardo


Re: [RFC PATCH 20/23] watchdog/hardlockup/hpet: Rotate interrupt among all monitored CPUs

2018-06-14 Thread Ricardo Neri
On Wed, Jun 13, 2018 at 11:48:09AM +0200, Thomas Gleixner wrote:
> On Tue, 12 Jun 2018, Ricardo Neri wrote:
> > +   /* There are no CPUs to monitor. */
> > +   if (!cpumask_weight(>monitored_mask))
> > +   return NMI_HANDLED;
> > +
> > inspect_for_hardlockups(regs);
> >  
> > +   /*
> > +* Target a new CPU. Keep trying until we find a monitored CPU. CPUs
> > > +* are added and removed to this mask at cpu_up() and cpu_down(),
> > +* respectively. Thus, the interrupt should be able to be moved to
> > +* the next monitored CPU.
> > +*/
> > +   spin_lock(_data->lock);
> 
> Yuck. Taking a spinlock from NMI ...

I am sorry. I will look into other options for locking. Do you think rcu_lock
would help in this case? I need this locking because the set of CPUs being
monitored changes as CPUs come online and go offline.

> 
> > +   for_each_cpu_wrap(cpu, >monitored_mask, smp_processor_id() + 1) {
> > +   if (!irq_set_affinity(hld_data->irq, cpumask_of(cpu)))
> > +   break;
> 
> ... and then calling into generic interrupt code which will take even more
> locks is completely broken.


I will look into reworking how the destination of the interrupt is set.
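
One lock-free way to pick the next target can be sketched as a user-space
model: take a single atomic snapshot of the monitored mask and scan it with
wrap-around. The names below (monitored_mask, next_monitored_cpu()) and the
64-CPU bitmask are assumptions made for illustration, not the kernel's cpumask
API, and reprogramming the interrupt destination itself would still need an
NMI-safe path.

```c
#include <stdatomic.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64

/* Hypothetical stand-in for the monitored cpumask; updated on CPU hotplug. */
static _Atomic uint64_t monitored_mask;

/*
 * Return the first monitored CPU after @cpu, wrapping around, or -1 if the
 * mask is empty. Works on one atomic snapshot, so no lock is taken.
 */
static int next_monitored_cpu(unsigned int cpu)
{
	uint64_t snap = atomic_load_explicit(&monitored_mask,
					     memory_order_acquire);
	unsigned int i;

	for (i = 1; i <= MODEL_NR_CPUS; i++) {
		unsigned int cand = (cpu + i) % MODEL_NR_CPUS;

		if (snap & (UINT64_C(1) << cand))
			return (int)cand;
	}
	return -1;
}
```

If only the current CPU is monitored, the scan wraps all the way around and
returns that same CPU, so the interrupt simply stays where it is.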

Thanks and BR,

Ricardo


Re: [RFC PATCH 12/23] kernel/watchdog: Introduce a struct for NMI watchdog operations

2018-06-14 Thread Ricardo Neri
On Thu, Jun 14, 2018 at 12:32:50PM +1000, Nicholas Piggin wrote:
> On Wed, 13 Jun 2018 18:31:17 -0700
> Ricardo Neri  wrote:
> 
> > On Wed, Jun 13, 2018 at 09:52:25PM +1000, Nicholas Piggin wrote:
> > > On Wed, 13 Jun 2018 11:26:49 +0200 (CEST)
> > > Thomas Gleixner  wrote:
> > >   
> > > > On Wed, 13 Jun 2018, Peter Zijlstra wrote:  
> > > > > On Wed, Jun 13, 2018 at 05:41:41PM +1000, Nicholas Piggin wrote:    
> > > > > > On Tue, 12 Jun 2018 17:57:32 -0700
> > > > > > Ricardo Neri  wrote:
> > > > > > 
> > > > > > > Instead of exposing individual functions for the operations of 
> > > > > > > the NMI
> > > > > > > watchdog, define a common interface that can be used across 
> > > > > > > multiple
> > > > > > > implementations.
> > > > > > > 
> > > > > > > The struct nmi_watchdog_ops is defined for such operations. These 
> > > > > > > initial
> > > > > > > definitions include the enable, disable, start, stop, and cleanup
> > > > > > > operations.
> > > > > > > 
> > > > > > > Only a single NMI watchdog can be used in the system. The 
> > > > > > > operations of
> > > > > > > this NMI watchdog are accessed via the new variable nmi_wd_ops. 
> > > > > > > This
> > > > > > > variable is set to point the operations of the first NMI watchdog 
> > > > > > > that
> > > > > > > initializes successfully. Even though at this moment, the only 
> > > > > > > available
> > > > > > > NMI watchdog is the perf-based hardlockup detector. More 
> > > > > > > implementations
> > > > > > > can be added in the future.
> > > > > > 
> > > > > > Cool, this looks pretty nice at a quick glance. sparc and powerpc at
> > > > > > least have their own NMI watchdogs, it would be good to have those
> > > > > > converted as well.
> > > > > 
> > > > > Yeah, agreed, this looks like half a patch.
> > > > 
> > > > Though I'm not seeing the advantage of it. That kind of NMI watchdogs 
> > > > are
> > > > low level architecture details so having yet another 'ops' data 
> > > > structure
> > > > with a gazillion of callbacks, checks and indirections does not provide
> > > > value over the currently available weak stubs.  
> > > 
> > > The other way to go of course is librify the perf watchdog and make an
> > > x86 watchdog that selects between perf and hpet... I also probably
> > > prefer that for code such as this, but I wouldn't strongly object to
> > > ops struct if I'm not writing the code. It's not that bad is it?  
> > 
> > My motivation to add the ops was that the hpet and perf watchdog share
> > significant portions of code.
> 
> Right, a good motivation.
> 
> > I could look into creating the library for
> > common code and relocate the hpet watchdog into arch/x86 for the hpet-
> > specific parts.
> 
> If you can investigate that approach, that would be appreciated. I hope
> I did not misunderstand you there, Thomas.
> 
> Basically you would have perf infrastructure and hpet infrastructure,
> and then the x86 watchdog driver will use one or the other of those. The
> generic watchdog driver will be just a simple shim that uses the perf
> infrastructure. Then hopefully the powerpc driver would require almost
> no change.

Sure, I will try to structure code to minimize the changes to the powerpc
watchdog... without breaking the sparc one.
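
A minimal sketch of the selection step described above, assuming the x86
driver simply prefers the HPET back end when it was requested and came up, and
otherwise falls back to perf. The enum, the struct and the function name are
hypothetical and only model the decision; they are not taken from the patches.

```c
#include <stdbool.h>

enum hld_backend { HLD_NONE, HLD_PERF, HLD_HPET };

/* Hypothetical probe results; in a real driver these would be init calls. */
struct hld_probe {
	bool hpet_requested;	/* e.g. asked for on the command line */
	bool hpet_ok;		/* HPET back end initialized successfully */
	bool perf_ok;		/* perf back end initialized successfully */
};

/* Pick exactly one back end; only one NMI watchdog runs at a time. */
static enum hld_backend x86_select_hld_backend(const struct hld_probe *p)
{
	if (p->hpet_requested && p->hpet_ok)
		return HLD_HPET;
	if (p->perf_ok)
		return HLD_PERF;
	return HLD_NONE;
}
```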

Thanks and BR,
Ricardo


Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI

2018-06-14 Thread Ricardo Neri
On Wed, Jun 13, 2018 at 11:40:00AM +0200, Thomas Gleixner wrote:
> On Tue, 12 Jun 2018, Ricardo Neri wrote:
> > @@ -183,6 +184,8 @@ static irqreturn_t hardlockup_detector_irq_handler(int 
> > irq, void *data)
> > if (!(hdata->flags & HPET_DEV_PERI_CAP))
> > kick_timer(hdata);
> >  
> > +   pr_err("This interrupt should not have happened. Ensure delivery mode 
> > is NMI.\n");
> 
> Eeew.

If you don't mind me asking, what is the problem with this error message?
> 
> >  /**
> > + * hardlockup_detector_nmi_handler() - NMI Interrupt handler
> > + * @val:   Attribute associated with the NMI. Not used.
> > + * @regs:  Register values as seen when the NMI was asserted
> > + *
> > + * When an NMI is issued, look for hardlockups. If the timer is not 
> > periodic,
> > + * kick it. The interrupt is always handled when delivered via the
> > + * Front-Side Bus.
> > + *
> > + * Returns:
> > + *
> > + * NMI_DONE if the HPET timer did not cause the interrupt. NMI_HANDLED
> > + * otherwise.
> > + */
> > +static int hardlockup_detector_nmi_handler(unsigned int val,
> > +  struct pt_regs *regs)
> > +{
> > +   struct hpet_hld_data *hdata = hld_data;
> > +   unsigned int use_fsb;
> > +
> > +   /*
> > +* If FSB delivery mode is used, the timer interrupt is programmed as
> > +* edge-triggered and there is no need to check the ISR register.
> > +*/
> > +   use_fsb = hdata->flags & HPET_DEV_FSB_CAP;
> > +
> > +   if (!use_fsb && !is_hpet_wdt_interrupt(hdata))
> > +   return NMI_DONE;
> 
> So for 'use_fsb == True' every single NMI will fall through into the
> watchdog code below.
> 
> > +   inspect_for_hardlockups(regs);
> > +
> > +   if (!(hdata->flags & HPET_DEV_PERI_CAP))
> > +   kick_timer(hdata);
> 
> And in case that the HPET does not support periodic mode this reprograms
> the timer on every NMI which means that while perf is running the watchdog
> will never ever detect anything.

Yes. I see that this is wrong. With MSI interrupts, as far as I can
see, there is no way to make sure that the HPET timer caused the NMI.
Perhaps the only option is to use an IO APIC interrupt and read the
interrupt status register.

> 
> Aside of that, reading TWO HPET registers for every NMI is insane. HPET
> access is horribly slow, so any high frequency perf monitoring will take a
> massive performance hit.

If an IO APIC interrupt is used, only one HPET register (the status register)
would need to be read for every NMI. Would that be more acceptable? Otherwise,
there is no way to determine if the HPET caused the NMI.

Alternatively, there could be a counter that skips reading the HPET status
register (and the detection of hardlockups) for every X NMIs. This would
reduce the overall frequency of HPET register reads.

Is that more acceptable?

Thanks and BR,
Ricardo


Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI

2018-06-14 Thread Ricardo Neri
On Wed, Jun 13, 2018 at 11:07:20AM +0200, Peter Zijlstra wrote:
> On Tue, Jun 12, 2018 at 05:57:37PM -0700, Ricardo Neri wrote:
> 
> +static bool is_hpet_wdt_interrupt(struct hpet_hld_data *hdata)
> +{
> +   unsigned long this_isr;
> +   unsigned int lvl_trig;
> +
> +   this_isr = hpet_readl(HPET_STATUS) & BIT(hdata->num);
> +
> +   lvl_trig = hpet_readl(HPET_Tn_CFG(hdata->num)) & HPET_TN_LEVEL;
> +
> +   if (lvl_trig && this_isr)
> +   return true;
> +
> +   return false;
> +}
> 
> > +static int hardlockup_detector_nmi_handler(unsigned int val,
> > +  struct pt_regs *regs)
> > +{
> > +   struct hpet_hld_data *hdata = hld_data;
> > +   unsigned int use_fsb;
> > +
> > +   /*
> > +* If FSB delivery mode is used, the timer interrupt is programmed as
> > +* edge-triggered and there is no need to check the ISR register.
> > +*/
> > +   use_fsb = hdata->flags & HPET_DEV_FSB_CAP;
> 
> Please do explain.. That FSB thing basically means MSI. But there's only
> a single NMI vector. How do we know this NMI came from the HPET?

Indeed, I see now that this is wrong. There is no way to know. The only way
is to use an IO APIC interrupt and read the HPET status register.

> 
> > +
> > +   if (!use_fsb && !is_hpet_wdt_interrupt(hdata))
> 
> So you add _2_ HPET reads for every single NMI that gets triggered...
> and IIRC HPET reads are _slloww_.


Since the trigger mode of the HPET timer is not expected to change,
perhaps is_hpet_wdt_interrupt() only needs to read the interrupt status
register. This would reduce the reads to one. Furthermore, the hardlockup
detector could skip X NMIs between reads and further reduce their
frequency. Does this make sense?

> 
> > +   return NMI_DONE;
> > +
> > +   inspect_for_hardlockups(regs);
> > +
> > +   if (!(hdata->flags & HPET_DEV_PERI_CAP))
> > +   kick_timer(hdata);
> > +
> > +   /* Acknowledge interrupt if in level-triggered mode */
> > +   if (!use_fsb)
> > +   hpet_writel(BIT(hdata->num), HPET_STATUS);
> > +
> > +   return NMI_HANDLED;
> 
> So if I read this right, when in FSB/MSI mode, we'll basically _always_
> claim every single NMI as handled?
> 
> That's broken.

Yes, this is not correct. I will drop the functionality to use
FSB/MSI mode.
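
A rough user-space model of the behaviour I have in mind, assuming a
level-triggered interrupt whose status bit is meaningful: the handler does one
status read, returns "not mine" unless the timer's bit is set, and only then
checks for lockups and acknowledges. The struct, the MODEL_* values and the
function names are made up for this sketch and are not the kernel's HPET
accessors.

```c
#include <stdint.h>

enum { MODEL_NMI_DONE = 0, MODEL_NMI_HANDLED = 1 };

/* Hypothetical model of the HPET general interrupt status register. */
struct hpet_model {
	uint32_t status;		/* one bit per timer */
	unsigned int reads;		/* counts (slow) register reads */
	unsigned int lockup_checks;	/* counts hardlockup inspections */
};

static uint32_t model_read_status(struct hpet_model *h)
{
	h->reads++;
	return h->status;
}

static int model_nmi_handler(struct hpet_model *h, unsigned int timer)
{
	uint32_t bit = UINT32_C(1) << timer;

	/* Single status read: not our timer, so let other handlers look. */
	if (!(model_read_status(h) & bit))
		return MODEL_NMI_DONE;

	h->lockup_checks++;	/* stands in for inspect_for_hardlockups() */
	h->status &= ~bit;	/* stands in for the write-1-to-clear ack */
	return MODEL_NMI_HANDLED;
}
```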

Thanks and BR,
Ricardo


Re: [RFC PATCH 22/23] watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot parameter

2018-06-13 Thread Ricardo Neri
On Tue, Jun 12, 2018 at 10:26:57PM -0700, Randy Dunlap wrote:
> On 06/12/2018 05:57 PM, Ricardo Neri wrote:
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> > b/Documentation/admin-guide/kernel-parameters.txt
> > index f2040d4..a8833c7 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -2577,7 +2577,7 @@
> > Format: [state][,regs][,debounce][,die]
> >  
> > nmi_watchdog=   [KNL,BUGS=X86] Debugging features for SMP kernels
> > -   Format: [panic,][nopanic,][num]
> > +   Format: [panic,][nopanic,][num,][hpet]
> > Valid num: 0 or 1
> > 0 - turn hardlockup detector in nmi_watchdog off
> > 1 - turn hardlockup detector in nmi_watchdog on
> 
> This says that I can use "nmi_watchdog=hpet" without using 0 or 1.
> Is that correct?

Yes, this is what I meant. In my view, if you set nmi_watchdog=hpet it
implies that you want to activate the NMI watchdog. In this case, perf.

I can see how this will be ambiguous for the case of perf and arch NMI
watchdogs.

Alternatively, a new parameter could be added, such as nmi_watchdog_type. I
didn't want to add it in this patchset as I think that a single parameter
can handle the enablement and type of the NMI watchdog.

What do you think?
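
To make the single-parameter semantics concrete, here is a small user-space
sketch of how "[panic,][nopanic,][num,][hpet]" could be parsed. It assumes that
"hpet" on its own turns the detector on and that an explicit 0 or 1 always
wins. The struct and function names are hypothetical, and this is not the
kernel's actual setup handler.

```c
#include <stdbool.h>
#include <string.h>

struct nmi_wd_opts {
	bool panic;
	bool enabled;
	bool use_hpet;
};

/* True if the token of length @len starting at @tok equals @word. */
static bool tok_is(const char *tok, size_t len, const char *word)
{
	return len == strlen(word) && !strncmp(tok, word, len);
}

static void parse_nmi_watchdog(const char *arg, struct nmi_wd_opts *o)
{
	int num = -1;	/* -1: no explicit 0 or 1 seen */

	while (*arg) {
		size_t len = strcspn(arg, ",");

		if (tok_is(arg, len, "panic"))
			o->panic = true;
		else if (tok_is(arg, len, "nopanic"))
			o->panic = false;
		else if (tok_is(arg, len, "0"))
			num = 0;
		else if (tok_is(arg, len, "1"))
			num = 1;
		else if (tok_is(arg, len, "hpet"))
			o->use_hpet = true;

		arg += len;
		if (*arg == ',')
			arg++;
	}

	if (num >= 0)
		o->enabled = num;
	else if (o->use_hpet)
		o->enabled = true;	/* "hpet" alone implies "on" */
}
```

With these rules "nmi_watchdog=hpet" enables the detector, while
"nmi_watchdog=0,hpet" records the preference but leaves it off.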

Thanks and BR,
Ricardo


Re: [RFC PATCH 16/23] watchdog/hardlockup: Add an HPET-based hardlockup detector

2018-06-13 Thread Ricardo Neri
On Tue, Jun 12, 2018 at 10:23:47PM -0700, Randy Dunlap wrote:
> Hi,

Hi Randy,

> 
> On 06/12/2018 05:57 PM, Ricardo Neri wrote:
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index c40c7b7..6e79833 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -828,6 +828,16 @@ config HARDLOCKUP_DETECTOR_PERF
> > bool
> > select SOFTLOCKUP_DETECTOR
> >  
> > +config HARDLOCKUP_DETECTOR_HPET
> > +   bool "Use HPET Timer for Hard Lockup Detection"
> > +   select SOFTLOCKUP_DETECTOR
> > +   select HARDLOCKUP_DETECTOR
> > +   depends on HPET_TIMER && HPET
> > +   help
> > + Say y to enable a hardlockup detector that is driven by an 
> > High-Precision
> > + Event Timer. In addition to selecting this option, the command-line
> > + parameter nmi_watchdog option. See 
> > Documentation/admin-guide/kernel-parameters.rst
> 
> The "In addition ..." thing is a broken (incomplete) sentence.

Oops. I apologize. I missed this. I will fix it in my next version.

Thanks and BR,
Ricardo


Re: [RFC PATCH 14/23] watchdog/hardlockup: Decouple the hardlockup detector from perf

2018-06-13 Thread Ricardo Neri
On Wed, Jun 13, 2018 at 10:43:24AM +0200, Peter Zijlstra wrote:
> On Tue, Jun 12, 2018 at 05:57:34PM -0700, Ricardo Neri wrote:
> > The current default implementation of the hardlockup detector assumes that
> > it is implemented using perf events.
> 
> The sparc and powerpc things are very much not using perf.

Isn't it true that the current hardlockup detector
(under kernel/watchdog_hld.c) is based on perf? As far as I understand,
this hardlockup detector is constructed using perf events for architectures
that don't provide an NMI watchdog. Perhaps I can be more specific and say
that this synthetized detector is based on perf.

On a side note, I saw that powerpc might use a perf-based hardlockup
detector if it has perf events [1].

Please let me know if my understanding is not correct.

Thanks and BR,
Ricardo

[1]. https://elixir.bootlin.com/linux/v4.17/source/arch/powerpc/Kconfig#L218



Re: [RFC PATCH 12/23] kernel/watchdog: Introduce a struct for NMI watchdog operations

2018-06-13 Thread Ricardo Neri
On Wed, Jun 13, 2018 at 10:42:19AM +0200, Peter Zijlstra wrote:
> On Wed, Jun 13, 2018 at 05:41:41PM +1000, Nicholas Piggin wrote:
> > On Tue, 12 Jun 2018 17:57:32 -0700
> > Ricardo Neri  wrote:
> > 
> > > Instead of exposing individual functions for the operations of the NMI
> > > watchdog, define a common interface that can be used across multiple
> > > implementations.
> > > 
> > > The struct nmi_watchdog_ops is defined for such operations. These initial
> > > definitions include the enable, disable, start, stop, and cleanup
> > > operations.
> > > 
> > > Only a single NMI watchdog can be used in the system. The operations of
> > > this NMI watchdog are accessed via the new variable nmi_wd_ops. This
> > > variable is set to point the operations of the first NMI watchdog that
> > > initializes successfully. Even though at this moment, the only available
> > > NMI watchdog is the perf-based hardlockup detector. More implementations
> > > can be added in the future.
> > 
> > Cool, this looks pretty nice at a quick glance. sparc and powerpc at
> > least have their own NMI watchdogs, it would be good to have those
> > converted as well.
> 
> Yeah, agreed, this looks like half a patch.

I planned to look into the conversion of sparc and powerpc. I just wanted
to see the reception of these patches before jumping in and doing potentially
useless work. Comments in this thread lean towards continuing to use the weak
stubs.

Thanks and BR,
Ricardo


Re: [RFC PATCH 12/23] kernel/watchdog: Introduce a struct for NMI watchdog operations

2018-06-13 Thread Ricardo Neri
On Wed, Jun 13, 2018 at 09:52:25PM +1000, Nicholas Piggin wrote:
> On Wed, 13 Jun 2018 11:26:49 +0200 (CEST)
> Thomas Gleixner  wrote:
> 
> > On Wed, 13 Jun 2018, Peter Zijlstra wrote:
> > > On Wed, Jun 13, 2018 at 05:41:41PM +1000, Nicholas Piggin wrote:  
> > > > On Tue, 12 Jun 2018 17:57:32 -0700
> > > > Ricardo Neri  wrote:
> > > >   
> > > > > Instead of exposing individual functions for the operations of the NMI
> > > > > watchdog, define a common interface that can be used across multiple
> > > > > implementations.
> > > > > 
> > > > > The struct nmi_watchdog_ops is defined for such operations. These 
> > > > > initial
> > > > > definitions include the enable, disable, start, stop, and cleanup
> > > > > operations.
> > > > > 
> > > > > Only a single NMI watchdog can be used in the system. The operations 
> > > > > of
> > > > > this NMI watchdog are accessed via the new variable nmi_wd_ops. This
> > > > > variable is set to point the operations of the first NMI watchdog that
> > > > > initializes successfully. Even though at this moment, the only 
> > > > > available
> > > > > NMI watchdog is the perf-based hardlockup detector. More 
> > > > > implementations
> > > > > can be added in the future.  
> > > > 
> > > > Cool, this looks pretty nice at a quick glance. sparc and powerpc at
> > > > least have their own NMI watchdogs, it would be good to have those
> > > > converted as well.  
> > > 
> > > Yeah, agreed, this looks like half a patch.  
> > 
> > Though I'm not seeing the advantage of it. That kind of NMI watchdogs are
> > low level architecture details so having yet another 'ops' data structure
> > with a gazillion of callbacks, checks and indirections does not provide
> > value over the currently available weak stubs.
> 
> The other way to go of course is librify the perf watchdog and make an
> x86 watchdog that selects between perf and hpet... I also probably
> prefer that for code such as this, but I wouldn't strongly object to
> ops struct if I'm not writing the code. It's not that bad is it?

My motivation to add the ops was that the hpet and perf watchdog share
significant portions of code. I could look into creating the library for
common code and relocate the hpet watchdog into arch/x86 for the hpet-
specific parts.

Thanks and BR,
Ricardo


Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI

2018-06-20 Thread Ricardo Neri
On Tue, Jun 19, 2018 at 05:25:09PM -0700, Randy Dunlap wrote:
> On 06/19/2018 05:15 PM, Ricardo Neri wrote:
> > On Sat, Jun 16, 2018 at 03:24:49PM +0200, Thomas Gleixner wrote:
> >> On Fri, 15 Jun 2018, Ricardo Neri wrote:
> >>> On Fri, Jun 15, 2018 at 11:19:09AM +0200, Thomas Gleixner wrote:
> >>>> On Thu, 14 Jun 2018, Ricardo Neri wrote:
> >>>>> Alternatively, there could be a counter that skips reading the HPET 
> >>>>> status
> >>>>> register (and the detection of hardlockups) for every X NMIs. This would
> >>>>> reduce the overall frequency of HPET register reads.
> >>>>
> >>>> Great plan. So if the watchdog is the only NMI (because perf is off) then
> >>>> you delay the watchdog detection by that count.
> >>>
> >>> OK. This was a bad idea. Then, is it acceptable to have a read of an HPET
> >>> register per NMI just to check in the status register if the HPET timer
> >>> caused the NMI?
> >>
> >> The status register is useless in case of MSI. MSI is edge triggered 
> >>
> >> The only register which gives you proper information is the counter
> >> register itself. That adds a massive overhead to each NMI, because the
> >> counter register access is synchronized to the HPET clock with hardware
> >> magic. Plus on larger systems, the HPET access is cross node and even
> >> slower.
> > 
> > It starts to sound like the HPET is too slow to drive the hardlockup
> > detector.
> > 
> > Would it be possible to envision a variant of this implementation? In this
> > variant, the HPET only targets a single CPU. The actual hardlockup detector
> > is implemented by this single CPU sending interprocessor interrupts to the
> > rest of the CPUs.
> > 
> > In this manner only one CPU has to deal with the slowness of the HPET; the
> > rest of the CPUs don't have to read or write any HPET registers. A sysfs
> > entry could be added to configure which CPU will have to deal with the HPET
> > timer. However, profiling could not be done accurately on such CPU.
> 
> Please forgive my simple question:
> 
> What happens when this one CPU is the one that locks up?

I think that in this particular case this one CPU would check for hardlockups
on itself when it receives the NMI from the HPET timer. It would also issue
NMIs to the other monitored processors.

Thanks and BR,
Ricardo


Re: [RFC PATCH v3 04/21] x86/hpet: Add hpet_set_comparator() for periodic and one-shot modes

2019-05-15 Thread Ricardo Neri
On Tue, May 14, 2019 at 07:24:38AM -0700, Randy Dunlap wrote:
> On 5/14/19 7:01 AM, Ricardo Neri wrote:
> > Instead of setting the timer period directly in hpet_set_periodic(), add a
> > new helper function hpet_set_comparator() that only sets the accumulator
> > and comparator. hpet_set_periodic() will only prepare the timer for
> > periodic mode and leave the expiration programming to
> > hpet_set_comparator().
> > 
> > This new function can also be used by other components (e.g., the HPET-
> > based hardlockup detector) which also need to configure HPET timers. Thus,
> > add its declaration into the hpet header file.
> > 
> > Cc: "H. Peter Anvin" 
> > Cc: Ashok Raj 
> > Cc: Andi Kleen 
> > Cc: Tony Luck 
> > Cc: Philippe Ombredanne 
> > Cc: Kate Stewart 
> > Cc: "Rafael J. Wysocki" 
> > Cc: Stephane Eranian 
> > Cc: Suravee Suthikulpanit 
> > Cc: "Ravi V. Shankar" 
> > Cc: x...@kernel.org
> > Originally-by: Suravee Suthikulpanit 
> > Signed-off-by: Ricardo Neri 
> > ---
> >  arch/x86/include/asm/hpet.h |  1 +
> >  arch/x86/kernel/hpet.c  | 57 -
> >  2 files changed, 45 insertions(+), 13 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
> > index f132fbf984d4..e7098740f5ee 100644
> > --- a/arch/x86/include/asm/hpet.h
> > +++ b/arch/x86/include/asm/hpet.h
> > @@ -102,6 +102,7 @@ extern int hpet_rtc_timer_init(void);
> >  extern irqreturn_t hpet_rtc_interrupt(int irq, void *dev_id);
> >  extern int hpet_register_irq_handler(rtc_irq_handler handler);
> >  extern void hpet_unregister_irq_handler(rtc_irq_handler handler);
> > +extern void hpet_set_comparator(int num, unsigned int cmp, unsigned int 
> > period);
> >  
> >  #endif /* CONFIG_HPET_EMULATE_RTC */
> >  
> > diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
> > index 560fc28e1d13..c5c5fc150193 100644
> > --- a/arch/x86/kernel/hpet.c
> > +++ b/arch/x86/kernel/hpet.c
> > @@ -289,6 +289,46 @@ static void hpet_legacy_clockevent_register(void)
> > printk(KERN_DEBUG "hpet clockevent registered\n");
> >  }
> >  
> > +/**
> > + * hpet_set_comparator() - Helper function for setting comparator register
> > + * @num:   The timer ID
> > + * @cmp:   The value to be written to the comparator/accumulator
> > + * @period:The value to be written to the period (0 = oneshot mode)
> > + *
> > + * Helper function for updating comparator, accumulator and period values.
> > + *
> > + * In periodic mode, HPET needs HPET_TN_SETVAL to be set before writing
> > + * to the Tn_CMP to update the accumulator. Then, HPET needs a second
> > + * write (with HPET_TN_SETVAL cleared) to Tn_CMP to set the period.
> > + * The HPET_TN_SETVAL bit is automatically cleared after the first write.
> > + *
> > + * For one-shot mode, HPET_TN_SETVAL does not need to be set.
> > + *
> > + * See the following documents:
> > + *   - Intel IA-PC HPET (High Precision Event Timers) Specification
> > + *   - AMD-8111 HyperTransport I/O Hub Data Sheet, Publication # 24674
> > + */
> > +void hpet_set_comparator(int num, unsigned int cmp, unsigned int period)
> > +{
> > +   if (period) {
> > +   unsigned int v = hpet_readl(HPET_Tn_CFG(num));
> > +
> > +   hpet_writel(v | HPET_TN_SETVAL, HPET_Tn_CFG(num));
> > +   }
> > +
> > +   hpet_writel(cmp, HPET_Tn_CMP(num));
> > +
> > +   if (!period)
> > +   return;
> > +
> > +   /* This delay is seldom used: never in one-shot mode and in periodic
> > +* only when reprogramming the timer.
> > +*/
> 
> comment style warning ;)
>

Uh! I'll correct this. Strangely, I reran checkpatch and it didn't catch
it.

Thanks and BR,
Ricardo


Re: [RFC PATCH v3 03/21] x86/hpet: Calculate ticks-per-second in a separate function

2019-05-15 Thread Ricardo Neri
On Tue, May 14, 2019 at 07:23:47AM -0700, Randy Dunlap wrote:
> On 5/14/19 7:01 AM, Ricardo Neri wrote:
> > It is easier to compute the expiration times of an HPET timer by using
> > its frequency (i.e., the number of times it ticks in a second) than its
> > period, as given in the capabilities register.
> > 
> > In addition to the HPET char driver, the HPET-based hardlockup detector
> > will also need to know the timer's frequency. Thus, create a common
> > function that both can use.
> > 
> > Cc: "H. Peter Anvin" 
> > Cc: Ashok Raj 
> > Cc: Andi Kleen 
> > Cc: Tony Luck 
> > Cc: Clemens Ladisch 
> > Cc: Arnd Bergmann 
> > Cc: Philippe Ombredanne 
> > Cc: Kate Stewart 
> > Cc: "Rafael J. Wysocki" 
> > Cc: Stephane Eranian 
> > Cc: Suravee Suthikulpanit 
> > Cc: "Ravi V. Shankar" 
> > Cc: x...@kernel.org
> > Signed-off-by: Ricardo Neri 
> > ---
> >  drivers/char/hpet.c  | 31 ---
> >  include/linux/hpet.h |  1 +
> >  2 files changed, 25 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c
> > index d0ad85900b79..bdcbecfdb858 100644
> > --- a/drivers/char/hpet.c
> > +++ b/drivers/char/hpet.c
> > @@ -836,6 +836,29 @@ static unsigned long hpet_calibrate(struct hpets 
> > *hpetp)
> > return ret;
> >  }
> >  
> > +u64 hpet_get_ticks_per_sec(u64 hpet_caps)
> > +{
> > +   u64 ticks_per_sec, period;
> > +
> > +   period = (hpet_caps & HPET_COUNTER_CLK_PERIOD_MASK) >>
> > +HPET_COUNTER_CLK_PERIOD_SHIFT; /* fs, 10^-15 */
> > +
> > +   /*
> > +* The frequency is the reciprocal of the period. The period is given
> > +* femtoseconds per second. Thus, prepare a dividend to obtain the
> 
>* in femtoseconds per second.
> 

Thanks for your review Randy! I'll fix this grammar issue.
> > +* frequency in ticks per second.
> > +*/
> > +
> > +   /* 10^15 femtoseconds per second */
> > +   ticks_per_sec = 1000000000000000uLL;
> 
>   ULL is overwhelmingly used in the kernel.
> 

Sure, I'll update it.
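
For reference, the arithmetic can be sketched in user space as below. It
assumes the period sits in bits 63:32 of the capabilities value, in
femtoseconds, and rounds to the nearest tick. The MODEL_* names and
model_ticks_per_sec() are hypothetical stand-ins, not the function from the
patch.

```c
#include <stdint.h>

/* Assumed layout: period in femtoseconds in bits 63:32 of the capabilities. */
#define MODEL_CLK_PERIOD_SHIFT	32
#define MODEL_FS_PER_SEC	UINT64_C(1000000000000000)	/* 10^15 */

static uint64_t model_ticks_per_sec(uint64_t hpet_caps)
{
	uint64_t period = hpet_caps >> MODEL_CLK_PERIOD_SHIFT;

	if (!period)
		return 0;	/* invalid capabilities; avoid dividing by zero */

	/* Round to nearest instead of truncating. */
	return (MODEL_FS_PER_SEC + (period >> 1)) / period;
}
```

For example, a period of 100000000 fs gives 10000000 ticks per second, and
69841279 fs gives 14318180, i.e. the familiar 14.31818 MHz.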

BR,
Ricardo


Re: [RFC PATCH v3 11/21] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector

2019-05-15 Thread Ricardo Neri
On Tue, May 14, 2019 at 07:26:58AM -0700, Randy Dunlap wrote:
> On 5/14/19 7:02 AM, Ricardo Neri wrote:
> > diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
> > index 15d0fbe27872..376a5db81aec 100644
> > --- a/arch/x86/Kconfig.debug
> > +++ b/arch/x86/Kconfig.debug
> > @@ -169,6 +169,17 @@ config IOMMU_LEAK
> >  config HAVE_MMIOTRACE_SUPPORT
> > def_bool y
> >  
> > +config X86_HARDLOCKUP_DETECTOR_HPET
> > +   bool "Use HPET Timer for Hard Lockup Detection"
> > +   select SOFTLOCKUP_DETECTOR
> > +   select HARDLOCKUP_DETECTOR
> > +   select HARDLOCKUP_DETECTOR_CORE
> > +   depends on HPET_TIMER && HPET && X86_64
> > +   help
> > + Say y to enable a hardlockup detector that is driven by an High-
> 
>  by a
> 
I'll correct.

Thanks and BR,
Ricardo


Re: [RFC PATCH v4 17/21] x86/tsc: Switch to perf-based hardlockup detector if TSC become unstable

2019-06-07 Thread Ricardo Neri
On Thu, Jun 06, 2019 at 05:35:51PM -0700, Stephane Eranian wrote:
> Hi Ricardo,

Hi Stephane,

> Thanks for your contribution here. It is very important to move the
> watchdog out of the PMU wherever possible.

Indeed, using the PMU for the hardlockup detector is still the default
option. This patch series proposes a new kernel command-line option to
switch to the HPET.

> 
> On Thu, May 23, 2019 at 6:17 PM Ricardo Neri
>  wrote:
> >
> > The HPET-based hardlockup detector relies on the TSC to determine if an
> > observed NMI interrupt was originated by HPET timer. Hence, this detector
> > can no longer be used with an unstable TSC.
> >
> > In such case, permanently stop the HPET-based hardlockup detector and
> > start the perf-based detector.
> >
> > Signed-off-by: Ricardo Neri 
> > ---
> >  arch/x86/include/asm/hpet.h| 2 ++
> >  arch/x86/kernel/tsc.c  | 2 ++
> >  arch/x86/kernel/watchdog_hld.c | 7 +++
> >  3 files changed, 11 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
> > index fd99f2390714..a82cbe17479d 100644
> > --- a/arch/x86/include/asm/hpet.h
> > +++ b/arch/x86/include/asm/hpet.h
> > @@ -128,6 +128,7 @@ extern int hardlockup_detector_hpet_init(void);
> >  extern void hardlockup_detector_hpet_stop(void);
> >  extern void hardlockup_detector_hpet_enable(unsigned int cpu);
> >  extern void hardlockup_detector_hpet_disable(unsigned int cpu);
> > +extern void hardlockup_detector_switch_to_perf(void);
> >  #else
> >  static inline struct hpet_hld_data 
> > *hpet_hardlockup_detector_assign_timer(void)
> >  { return NULL; }
> > @@ -136,6 +137,7 @@ static inline int hardlockup_detector_hpet_init(void)
> >  static inline void hardlockup_detector_hpet_stop(void) {}
> >  static inline void hardlockup_detector_hpet_enable(unsigned int cpu) {}
> >  static inline void hardlockup_detector_hpet_disable(unsigned int cpu) {}
> > +static void harrdlockup_detector_switch_to_perf(void) {}
> >  #endif /* CONFIG_X86_HARDLOCKUP_DETECTOR_HPET */
> >
> This does not compile for me when CONFIG_X86_HARDLOCKUP_DETECTOR_HPET
> is not enabled.
> because:
>1- you have a typo on the function name
> 2- you are missing the inline keyword

I am sorry. This was an oversight on my side. I have corrected this in
preparation for a v5.
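
For illustration, the corrected stub pattern would look roughly like the
sketch below. MODEL_HLD_HPET is a hypothetical stand-in for the real config
symbol, and model_mark_tsc_unstable() is only a made-up caller to show that
the name resolves either way.

```c
/* Hypothetical switch; the kernel uses CONFIG_X86_HARDLOCKUP_DETECTOR_HPET. */
#ifdef MODEL_HLD_HPET
void hardlockup_detector_switch_to_perf(void);
#else
/*
 * Same spelling as the real declaration, and static inline so that unused
 * copies in other translation units do not trigger warnings.
 */
static inline void hardlockup_detector_switch_to_perf(void) {}
#endif

/* Made-up caller: builds with or without the option enabled. */
static int model_mark_tsc_unstable(void)
{
	hardlockup_detector_switch_to_perf();
	return 1;
}
```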

Thanks and BR,
Ricardo


Re: [RFC PATCH v4 18/21] x86/apic: Add a parameter for the APIC delivery mode

2019-06-18 Thread Ricardo Neri
On Sun, Jun 16, 2019 at 11:55:03AM +0200, Thomas Gleixner wrote:
> On Thu, 23 May 2019, Ricardo Neri wrote:
> >  
> >  struct irq_cfg {
> > -   unsigned intdest_apicid;
> > -   unsigned intvector;
> > +   unsigned intdest_apicid;
> > +   unsigned intvector;
> > +   enum ioapic_irq_destination_types   delivery_mode;
> 
> And how is this related to IOAPIC?

In my view, IOAPICs can also be programmed with a delivery mode. Mode
values are the same for MSI interrupts.

> I know this enum exists already, but in
> connection with MSI this does not make any sense at all.

Is the issue here the name of the enumeration?

> 
> > +
> > +   /*
> > +* Initialize the delivery mode of this irq to match the
> > +* default delivery mode of the APIC. This is useful for
> > +* children irq domains which want to take the delivery
> > +* mode from the individual irq configuration rather
> > +* than from the APIC.
> > +*/
> > +apicd->hw_irq_cfg.delivery_mode = apic->irq_delivery_mode;
> 
> And here it's initialized from apic->irq_delivery_mode, which is an
> u32. Intuitive and consistent - NOT!

Yes, this is wrong. Should the member in the structure above then be a
u32 instead of enum ioapic_irq_destination_types?

Thanks and BR,
Ricardo


Re: [RFC PATCH v4 04/21] x86/hpet: Add hpet_set_comparator() for periodic and one-shot modes

2019-06-18 Thread Ricardo Neri
On Fri, Jun 14, 2019 at 08:17:14PM +0200, Thomas Gleixner wrote:
> On Thu, 23 May 2019, Ricardo Neri wrote:
> > +/**
> > + * hpet_set_comparator() - Helper function for setting comparator register
> > + * @num:   The timer ID
> > + * @cmp:   The value to be written to the comparator/accumulator
> > + * @period:The value to be written to the period (0 = oneshot mode)
> > + *
> > + * Helper function for updating comparator, accumulator and period values.
> > + *
> > + * In periodic mode, HPET needs HPET_TN_SETVAL to be set before writing
> > + * to the Tn_CMP to update the accumulator. Then, HPET needs a second
> > + * write (with HPET_TN_SETVAL cleared) to Tn_CMP to set the period.
> > + * The HPET_TN_SETVAL bit is automatically cleared after the first write.
> > + *
> > + * For one-shot mode, HPET_TN_SETVAL does not need to be set.
> > + *
> > + * See the following documents:
> > + *   - Intel IA-PC HPET (High Precision Event Timers) Specification
> > + *   - AMD-8111 HyperTransport I/O Hub Data Sheet, Publication # 24674
> > + */
> > +void hpet_set_comparator(int num, unsigned int cmp, unsigned int period)
> > +{
> > +   if (period) {
> > +           unsigned int v = hpet_readl(HPET_Tn_CFG(num));
> > +
> > +           hpet_writel(v | HPET_TN_SETVAL, HPET_Tn_CFG(num));
> > +   }
> > +
> > +   hpet_writel(cmp, HPET_Tn_CMP(num));
> > +
> > +   if (!period)
> > +           return;
> 
> TBH, I hate this conditional handling. What's wrong with two functions?

There is probably nothing wrong with two functions. I can split it into
hpet_set_comparator_periodic() and hpet_set_comparator(). Perhaps the
latter is not needed as it would be a one-line function; you have
suggested earlier to avoid such small functions.
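
For illustration only, the split discussed above could look roughly like the
following user-space sketch. The register file is a plain array standing in
for the HPET MMIO window. The fake_hpet_readl()/fake_hpet_writel() helpers,
the fake_regs array and the name set_comparator_oneshot() are hypothetical;
the register offsets and the HPET_TN_SETVAL value are assumed to mirror
asm/hpet.h. This is not the code that was eventually posted.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the definitions in arch/x86/include/asm/hpet.h. */
#define HPET_TN_SETVAL	0x040u
#define HPET_Tn_CFG(n)	(0x100u + 0x20u * (n))
#define HPET_Tn_CMP(n)	(0x108u + 0x20u * (n))
#define FAKE_NR_TIMERS	32u

/* Hypothetical stand-in for the MMIO window, one slot per 4 bytes. */
static uint32_t fake_regs[0x500 / 4];
static unsigned int fake_cmp_writes;

static uint32_t fake_hpet_readl(unsigned int a)
{
	return fake_regs[a / 4];
}

static void fake_hpet_writel(uint32_t d, unsigned int a)
{
	fake_regs[a / 4] = d;
	if ((a & 0x1fu) == 0x08u) {
		/* Comparator write: the hardware clears SETVAL by itself. */
		fake_regs[(a - 0x08u) / 4] &= ~HPET_TN_SETVAL;
		fake_cmp_writes++;
	}
}

/* One-shot mode: a single comparator write, no SETVAL handling. */
static void set_comparator_oneshot(unsigned int num, uint32_t cmp)
{
	assert(num < FAKE_NR_TIMERS);
	fake_hpet_writel(cmp, HPET_Tn_CMP(num));
}

/* Periodic mode: arm SETVAL, write the first expiration, then the period. */
static void set_comparator_periodic(unsigned int num, uint32_t cmp,
				    uint32_t period)
{
	uint32_t v;

	assert(num < FAKE_NR_TIMERS);
	v = fake_hpet_readl(HPET_Tn_CFG(num));
	fake_hpet_writel(v | HPET_TN_SETVAL, HPET_Tn_CFG(num));
	fake_hpet_writel(cmp, HPET_Tn_CMP(num));
	/* The patch waits with udelay(1) here before the second write. */
	fake_hpet_writel(period, HPET_Tn_CMP(num));
}
```

With this shape the caller picks the function from hdata->has_periodic, so
neither function needs the "if (period)" / "if (!period)" pair.
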
> 
> > +
> > +   /*
> > +* This delay is seldom used: never in one-shot mode and in periodic
> > +* only when reprogramming the timer.
> > +*/
> > +   udelay(1);
> > +   hpet_writel(period, HPET_Tn_CMP(num));
> > +}
> > +EXPORT_SYMBOL_GPL(hpet_set_comparator);
> 
> Why is this exported? Which module user needs this?

It is not used anywhere else. I will remove this export.

Thanks and BR,

Ricardo


Re: [RFC PATCH v4 05/21] x86/hpet: Reserve timer for the HPET hardlockup detector

2019-06-18 Thread Ricardo Neri
On Fri, Jun 14, 2019 at 06:10:18PM +0200, Thomas Gleixner wrote:
> On Thu, 13 Jun 2019, Ricardo Neri wrote:
> 
> > On Tue, Jun 11, 2019 at 09:54:25PM +0200, Thomas Gleixner wrote:
> > > On Thu, 23 May 2019, Ricardo Neri wrote:
> > > 
> > > > HPET timer 2 will be used to drive the HPET-based hardlockup detector.
> > > > Reserve such timer to ensure it cannot be used by user space programs or
> > > > for clock events.
> > > > 
> > > > When looking for MSI-capable timers for clock events, skip timer 2 if
> > > > the HPET hardlockup detector is selected.
> > > 
> > > Why? Both the changelog and the code change lack an explanation why this
> > > timer is actually touched after it got reserved for the platform. The
> > > reservation should make it inaccessible for other things.
> > 
> > hpet_reserve_platform_timers() will give the HPET char driver a data
> > structure which specifies which drivers are reserved. In this manner,
> > they cannot be used by applications via file opens. The timer used by
> > the hardlockup detector should be marked as reserved.
> > 
> > Also, hpet_msi_capability_lookup() populates another data structure
> > which is used when obtaining an unused timer for a HPET clock event.
> > The timer used by the hardlockup detector should not be included in such
> > data structure.
> > 
> > Is this the explanation you would like to see? If yes, I will include it
> > in the changelog.
> 
> Yes, the explanation makes sense. The code still sucks. Not really your
> fault, but this is not making it any better.
> 
> What bothers me most is the fact that CONFIG_X86_HARDLOCKUP_DETECTOR_HPET
> removes one HPET timer unconditionally. It neither checks whether the hpet
> watchdog is actually enabled on the command line, nor does it validate
> upfront whether the HPET supports FSB delivery.
> 
> That wastes an HPET timer unconditionally for no value. Not that I
> personally care much about /dev/hpet, but some older laptops depend on HPET
> per cpu timers as the local APIC timer stops in C2/3. So this unconditional
> reservation will cause regressions for no reason.
> 
> The proper approach here is to:
> 
>  1) Evaluate the command line _before_ hpet_enable() is invoked
> 
>  2) Check the availability of FSB delivery in hpet_enable()
> 
> Reserve an HPET channel for the watchdog only when #1 and #2 are true.
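
The two conditions above can be sketched in user space as follows. This is a
minimal illustration, not the kernel change: struct fake_hpet, its fields and
reserve_watchdog_timer() are hypothetical names, and the HPET_TN_FSB_CAP
value is assumed to match asm/hpet.h. The command-line result is passed in as
a flag that would have been evaluated before hpet_enable() runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match HPET_TN_FSB_CAP in arch/x86/include/asm/hpet.h. */
#define HPET_TN_FSB_CAP	0x8000u
/* The series uses HPET timer 2 for the hardlockup detector. */
#define WATCHDOG_TIMER	2u

/* Hypothetical snapshot of the HPET state relevant to the decision. */
struct fake_hpet {
	unsigned int nr_timers;
	uint32_t timer_cfg[32];	/* Tn_CFG value of each timer */
	uint32_t reserved_mask;	/* bit n set: timer n is reserved */
};

/*
 * Reserve the watchdog timer only if the user asked for the HPET watchdog
 * (condition #1) and the timer supports FSB delivery (condition #2).
 * Returns the reserved timer number, or -1 if nothing was reserved.
 */
static int reserve_watchdog_timer(struct fake_hpet *hpet,
				  bool selected_on_cmdline)
{
	assert(hpet && hpet->nr_timers <= 32);

	if (!selected_on_cmdline)
		return -1;
	if (WATCHDOG_TIMER >= hpet->nr_timers)
		return -1;
	if (!(hpet->timer_cfg[WATCHDOG_TIMER] & HPET_TN_FSB_CAP))
		return -1;

	hpet->reserved_mask |= 1u << WATCHDOG_TIMER;
	return (int)WATCHDOG_TIMER;
}
```

When either condition fails, reserved_mask stays untouched, so the timer
remains available for clock events and /dev/hpet.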

Sure. I will add the explanation to the commit message and only reserve
the timer if both of the conditions above are met.

Thanks and BR,
Ricardo


Re: [RFC PATCH v4 03/21] x86/hpet: Calculate ticks-per-second in a separate function

2019-06-18 Thread Ricardo Neri
On Fri, Jun 14, 2019 at 05:54:05PM +0200, Thomas Gleixner wrote:
> On Thu, 23 May 2019, Ricardo Neri wrote:
> >  
> > +u64 hpet_get_ticks_per_sec(u64 hpet_caps)
> > +{
> > +   u64 ticks_per_sec, period;
> > +
> > +   period = (hpet_caps & HPET_COUNTER_CLK_PERIOD_MASK) >>
> > +HPET_COUNTER_CLK_PERIOD_SHIFT; /* fs, 10^-15 */
> > +
> > +   /*
> > +* The frequency is the reciprocal of the period. The period is given
> > +* in femtoseconds per tick. Thus, prepare a dividend to obtain the
> > +* frequency in ticks per second.
> > +*/
> > +
> > +   /* 10^15 femtoseconds per second */
> > +   ticks_per_sec = 1000000000000000ULL;
> > +   ticks_per_sec += period >> 1; /* round */
> > +
> > +   /* The quotient is put in the dividend. We drop the remainder. */
> > +   do_div(ticks_per_sec, period);
> > +
> > +   return ticks_per_sec;
> > +}
> > +
> >  int hpet_alloc(struct hpet_data *hdp)
> >  {
> > u64 cap, mcfg;
> > @@ -844,7 +867,6 @@ int hpet_alloc(struct hpet_data *hdp)
> > struct hpets *hpetp;
> > struct hpet __iomem *hpet;
> > static struct hpets *last;
> > -   unsigned long period;
> > unsigned long long temp;
> > u32 remainder;
> >  
> > @@ -894,12 +916,7 @@ int hpet_alloc(struct hpet_data *hdp)
> >  
> > last = hpetp;
> >  
> > -   period = (cap & HPET_COUNTER_CLK_PERIOD_MASK) >>
> > -   HPET_COUNTER_CLK_PERIOD_SHIFT; /* fs, 10^-15 */
> > -   temp = 1000000000000000uLL; /* 10^15 femtoseconds per second */
> > -   temp += period >> 1; /* round */
> > -   do_div(temp, period);
> > -   hpetp->hp_tick_freq = temp; /* ticks per second */
> > +   hpetp->hp_tick_freq = hpet_get_ticks_per_sec(cap);
> 
> Why are we actually computing this over and over?
> 
> In hpet_enable() which is the first function invoked we have:
> 
> /*
>  * The period is a femto seconds value. Convert it to a
>  * frequency.
>  */
> freq = FSEC_PER_SEC;
> do_div(freq, hpet_period);
> hpet_freq = freq;
> 
> So we already have ticks per second, aka frequency, right? So why do we
> need yet another function instead of using the value which is computed
> once? The frequency of the HPET channels has to be identical no matter
> what. If it's not HPET is broken beyond repair.

I don't think it needs to be recomputed again. I missed the fact that
the frequency was already computed here.

Also, the hpet char driver has its own frequency computation. Perhaps it
could also obtain it from here, right?
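
As a worked example of the period-to-frequency conversion discussed here, the
sketch below uses plain 64-bit division in place of do_div(). The helper name
hpet_period_to_hz() is hypothetical. It also shows that the two computations
quoted in this thread are not bit-identical: the char driver rounds to the
nearest tick while the hpet_enable() snippet truncates, so a single shared
value would have to settle on one of the two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FSEC_PER_SEC	1000000000000000ULL	/* 10^15 fs in one second */

/*
 * Convert the HPET counter period (femtoseconds per tick) into a
 * frequency in ticks per second. With round_nearest set, half a period
 * is added to the dividend first, as the char driver does.
 */
static uint64_t hpet_period_to_hz(uint32_t period_fs, bool round_nearest)
{
	uint64_t dividend = FSEC_PER_SEC;

	assert(period_fs != 0);
	if (round_nearest)
		dividend += period_fs >> 1;

	return dividend / period_fs;
}
```

For a period of 69841279 fs (the common 14.31818 MHz HPET), truncation yields
14318179 ticks per second and rounding yields 14318180.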

Thanks and BR,
Ricardo
> 
> Thanks,
> 
>   tglx
> 
> 


Re: [RFC PATCH v4 12/21] watchdog/hardlockup/hpet: Adjust timer expiration on the number of monitored CPUs

2019-06-18 Thread Ricardo Neri
On Tue, Jun 11, 2019 at 10:11:04PM +0200, Thomas Gleixner wrote:
> On Thu, 23 May 2019, Ricardo Neri wrote:
> > @@ -52,10 +59,10 @@ static void kick_timer(struct hpet_hld_data *hdata, bool force)
> > return;
> >  
> > if (hdata->has_periodic)
> > -   period = watchdog_thresh * hdata->ticks_per_second;
> > +   period = watchdog_thresh * hdata->ticks_per_cpu;
> >  
> > count = hpet_readl(HPET_COUNTER);
> > -   new_compare = count + watchdog_thresh * hdata->ticks_per_second;
> > +   new_compare = count + watchdog_thresh * hdata->ticks_per_cpu;
> > hpet_set_comparator(hdata->num, (u32)new_compare, (u32)period);
> 
> So with this you might get close to the point where you trip over the SMI
> induced madness where CPUs vanish for several milliseconds in some value
> add code. You really want to do a read back of the hpet to detect that. See
> the comment in the hpet code. RHEL 7/8 allow up to 768 logical CPUs

Do you mean adding a readback to check if the new compare value is
greater than the current count? Similar to the check at the end of
hpet_next_event():

return res < HPET_MIN_CYCLES ? -ETIME : 0;

In such a case, should it try to set the comparator again? I think it
should, as otherwise the hardlockup detector would stop working.
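
A rough user-space sketch of such a readback follows, assuming a check like
the one at the end of hpet_next_event(). The fake counter, the function names
and the retry policy (doubling the delta on each attempt) are hypothetical and
only illustrate the idea; HPET_MIN_CYCLES is assumed to be 128 as in
arch/x86/kernel/hpet.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define HPET_MIN_CYCLES	128	/* assumed to match arch/x86/kernel/hpet.c */

/* Hypothetical fake hardware: the counter advances on every read. */
static uint32_t fake_counter;
static uint32_t fake_step;
static uint32_t fake_comparator;

static uint32_t fake_read_counter(void)
{
	fake_counter += fake_step;
	return fake_counter;
}

/* Program the comparator, then read the counter back to verify it. */
static int program_and_verify(uint32_t delta)
{
	uint32_t cmp = fake_read_counter() + delta;
	int32_t res;

	fake_comparator = cmp;
	/* The signed difference copes with 32-bit counter wraparound. */
	res = (int32_t)(cmp - fake_read_counter());

	return res < HPET_MIN_CYCLES ? -ETIME : 0;
}

/* Retry with a larger delta so that the detector does not stop. */
static int kick_with_retry(uint32_t delta, unsigned int max_tries)
{
	assert(max_tries > 0);

	while (max_tries--) {
		if (!program_and_verify(delta))
			return 0;
		delta *= 2;	/* hypothetical back-off policy */
	}

	return -ETIME;
}
```

If the counter has run past (or too close to) the new comparator value by the
time of the readback, the first function reports -ETIME and the caller tries
again instead of leaving the timer unarmed.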

Thanks and BR,
Ricardo
> 
> Thanks,
> 
>   tglx


Re: [RFC PATCH v4 20/21] iommu/vt-d: hpet: Reserve an interrupt remampping table entry for watchdog

2019-06-18 Thread Ricardo Neri
On Mon, Jun 17, 2019 at 10:25:35AM +0200, Thomas Gleixner wrote:
> On Sun, 16 Jun 2019, Thomas Gleixner wrote:
> > On Thu, 23 May 2019, Ricardo Neri wrote:
> > > > When the hardlockup detector is enabled, the function
> > > > hld_hpet_intremap_activate_irq() activates the recently created entry
> > > > in the interrupt remapping table via the modify_irte() function. While
> > > > doing this, it specifies which CPU the interrupt must target via its APIC
> > > > ID. This function can be called every time the destination ID of the
> > > > interrupt needs to be updated; there is no need to allocate or remove
> > > > entries in the interrupt remapping table.
> > 
> > Brilliant.
> > 
> > > +int hld_hpet_intremap_activate_irq(struct hpet_hld_data *hdata)
> > > +{
> > > + u32 destid = apic->calc_dest_apicid(hdata->handling_cpu);
> > > + struct intel_ir_data *data;
> > > +
> > > + data = (struct intel_ir_data *)hdata->intremap_data;
> > > + data->irte_entry.dest_id = IRTE_DEST(destid);
> > > > + return modify_irte(&data->irq_2_iommu, &data->irte_entry);
> > 
> > This calls modify_irte() which does at the very beginning:
> > 
> > >    raw_spin_lock_irqsave(&irq_2_ir_lock, flags);
> > 
> > How is that supposed to work from NMI context? Not to talk about the
> > other spinlocks which are taken in the subsequent call chain.
> > 
> > You cannot call in any of that code from NMI context.
> > 
> > The only reason why this never deadlocked in your testing is that nothing
> > else touched that particular iommu where the HPET hangs off concurrently.
> > 
> > But that's just pure luck and not design. 
> 
> And just for the record. I warned you about that problem during the review
> of an earlier version and told you to talk to IOMMU folks whether there is
> a way to update the entry w/o running into that lock problem.

I think I misunderstood your feedback. You did mention locking issues
between NMI and !NMI contexts. However, that was in the context of using the
generic irq code to do things such as setting the affinity of the interrupt and
requesting an irq. I understood that I should instead program things directly.
I extrapolated this to the IOMMU driver, in which I also added code directly
instead of using the existing layering.

Also, at the time, the question regarding the IOMMU, as I understood it, was
whether it was possible to reserve an IOMMU remapping entry upfront. I believe
my patches achieve that, even if they are hacky and ugly, and have locking
issues. I see now that the locking issues are also part of the IOMMU
discussion. Perhaps that was also implicit.
> 
> Can you tell me why I am actually reviewing patches and spending time on
> this when the result is ignored anyway?

Yes, Thomas, I should have checked with the IOMMU maintainers first
on the issues in the paragraph above. It is not my intention to
waste your time; your feedback has been valuable and has contributed to
improving the code.

> 
> I also tried to figure out why you went away from the IPI broadcast
> design. The only information I found is:
> 
> Changes vs. v1:
> 
>  * Brought back the round-robin mechanism proposed in v1 (this time not
>using the interrupt subsystem). This also requires to compute
>expiration times as in v1 (Andi Kleen, Stephane Eranian).
> 
> Great that there is no trace of any mail from Andi or Stephane about this
> on LKML. There is no problem with talking offlist about this stuff, but
> then you should at least provide a rationale for those who were not part of
> the private conversation.

Stephane has already commented on the rationale.

Thanks and BR,

Ricardo


Re: [RFC PATCH v4 20/21] iommu/vt-d: hpet: Reserve an interrupt remampping table entry for watchdog

2019-06-21 Thread Ricardo Neri
On Fri, Jun 21, 2019 at 10:05:01PM +0200, Thomas Gleixner wrote:
> On Fri, 21 Jun 2019, Jacob Pan wrote:
> > On Fri, 21 Jun 2019 10:31:26 -0700
> > Jacob Pan  wrote:
> > 
> > > On Fri, 21 Jun 2019 17:33:28 +0200 (CEST)
> > > Thomas Gleixner  wrote:
> > > 
> > > > On Wed, 19 Jun 2019, Jacob Pan wrote:  
> > > > > On Tue, 18 Jun 2019 01:08:06 +0200 (CEST)
> > > > > Thomas Gleixner  wrote:
> > > > > > 
> > > > > > Unless this problem is not solved and I doubt it can be solved
> > > > > > after talking to IOMMU people and studying manuals,
> > > > >
> > > > > I agree. modify irte might be done with cmpxchg_double() but the
> > > > > queued invalidation interface for IRTE cache flush is shared with
> > > > > DMA and requires holding a spinlock for enque descriptors, QI tail
> > > > > update etc.
> > > > > 
> > > > > Also, reserving & manipulating IRTE slot for hpet via backdoor
> > > > > might not be needed if the HPET PCI BDF (found in ACPI) can be
> > > > > utilized. But it might need more work to add a fake PCI device for
> > > > > HPET.
> > > > 
> > > > What would PCI/BDF solve?  
> > > I was thinking if HPET is a PCI device then it can naturally
> > > gain slots in IOMMU remapping table IRTEs via PCI MSI code. Then
> > > perhaps it can use the IRQ subsystem to set affinity etc. w/o
> > > directly adding additional helper functions in IRQ remapping code. I
> > > have not followed all the discussions, just a thought.
> > > 
> > I looked at the code again, seems the per cpu HPET code already taken
> > care of HPET MSI management. Why can't we use IR-HPET-MSI chip and
> > domain to allocate and set affinity etc.?
> > Most APIC timer has ARAT not enough per cpu HPET, so per cpu HPET is
> > not used mostly.
> 
> Sure, we can use that, but that does not allow to move the affinity from
> NMI context either. Same issue with the IOMMU as with the other hack.

If I understand Thomas' point correctly, the problem is having to take a
lock in NMI context to update the IRTE for the HPET, both in my hack
and in the generic irq code. The problem is worse when using the generic
irq code, as there are several layers and several locks that need to be
handled.

Thanks and BR,
Ricardo


[RFC PATCH v4 15/21] watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot parameter

2019-05-23 Thread Ricardo Neri
Keep the HPET-based hardlockup detector disabled unless explicitly enabled
via a command-line argument. If such a parameter is not given, the
initialization of the HPET-based hardlockup detector fails and the NMI
watchdog will fall back to the perf-based implementation.

Given that __setup("nmi_watchdog=") is already used to control the behavior
of the NMI watchdog (via hardlockup_panic_setup()), it cannot be used to
control the HPET-based implementation. Instead, use a new
early_param("nmi_watchdog").

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Peter Zijlstra 
Cc: Clemens Ladisch 
Cc: Arnd Bergmann 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: Mimi Zohar 
Cc: Jan Kiszka 
Cc: Nick Desaulniers 
Cc: Masahiro Yamada 
Cc: Nayna Jain 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 

--
checkpatch gives the following warning:

CHECK: __setup appears un-documented -- check 
Documentation/admin-guide/kernel-parameters.rst
+__setup("nmi_watchdog=", hardlockup_detector_hpet_setup);

This is a false-positive as the option nmi_watchdog is already
documented. The option is re-evaluated in this file as well.
---
 .../admin-guide/kernel-parameters.txt |  8 ++-
 arch/x86/kernel/watchdog_hld_hpet.c   | 22 +++
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 138f6664b2e2..17ed3dcda13e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2831,7 +2831,7 @@
Format: [state][,regs][,debounce][,die]
 
nmi_watchdog=   [KNL,BUGS=X86] Debugging features for SMP kernels
-   Format: [panic,][nopanic,][num]
+   Format: [panic,][nopanic,][num,][hpet]
Valid num: 0 or 1
0 - turn hardlockup detector in nmi_watchdog off
1 - turn hardlockup detector in nmi_watchdog on
@@ -2841,6 +2841,12 @@
please see 'nowatchdog'.
This is useful when you use a panic=... timeout and
need the box quickly up again.
+   When hpet is specified, the NMI watchdog will be driven
+   by an HPET timer, if available in the system. Otherwise,
+   it falls back to the default implementation (perf or
+   architecture-specific). Specifying hpet has no effect
+   if the NMI watchdog is not enabled (either at build time
+   or via the command line).
 
These settings can be accessed at runtime via
the nmi_watchdog and hardlockup_panic sysctls.
diff --git a/arch/x86/kernel/watchdog_hld_hpet.c b/arch/x86/kernel/watchdog_hld_hpet.c
index dcc50cd29374..76eed714a1cb 100644
--- a/arch/x86/kernel/watchdog_hld_hpet.c
+++ b/arch/x86/kernel/watchdog_hld_hpet.c
@@ -351,6 +351,28 @@ void hardlockup_detector_hpet_stop(void)
disable_timer(hld_data);
 }
 
+/**
+ * hardlockup_detector_hpet_setup() - Parse command-line parameters
+ * @str:   A string containing the kernel command line
+ *
+ * Parse the nmi_watchdog parameter from the kernel command line. If
+ * selected by the user, use this implementation to detect hardlockups.
+ */
+static int __init hardlockup_detector_hpet_setup(char *str)
+{
+   if (!str)
+           return -EINVAL;
+
+   if (parse_option_str(str, "hpet"))
+           hardlockup_use_hpet = true;
+
+   if (!nmi_watchdog_user_enabled && hardlockup_use_hpet)
+           pr_warn("Selecting HPET NMI watchdog has no effect with NMI watchdog disabled\n");
+
+   return 0;
+}
+early_param("nmi_watchdog", hardlockup_detector_hpet_setup);
+
 /**
  * hardlockup_detector_hpet_init() - Initialize the hardlockup detector
  *
-- 
2.17.1
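
To illustrate how the "hpet" token is picked out of the comma-separated
nmi_watchdog= value, here is a small user-space sketch. option_in_list() is a
hypothetical approximation of what the kernel's parse_option_str() is used for
in the patch above, and fake_hpet_setup() is a stand-in for the setup handler;
neither is the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if option appears as a whole item of a comma-separated list. */
static bool option_in_list(const char *str, const char *option)
{
	size_t len;

	assert(str && option);
	len = strlen(option);

	while (*str) {
		if (!strncmp(str, option, len)) {
			const char *end = str + len;

			/* Only a full item matches, not a prefix of one. */
			if (*end == '\0' || *end == ',')
				return true;
		}
		/* Skip to the start of the next item. */
		while (*str && *str != ',')
			str++;
		if (*str == ',')
			str++;
	}

	return false;
}

/* Hypothetical stand-in for hardlockup_detector_hpet_setup(). */
static int fake_hpet_setup(const char *str, bool *use_hpet)
{
	if (!str || !use_hpet)
		return -EINVAL;

	if (option_in_list(str, "hpet"))
		*use_hpet = true;

	return 0;
}
```

With nmi_watchdog=panic,1,hpet the flag is set; with nmi_watchdog=1 it stays
false and the perf-based detector remains in use.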



[RFC PATCH v4 17/21] x86/tsc: Switch to perf-based hardlockup detector if TSC become unstable

2019-05-23 Thread Ricardo Neri
The HPET-based hardlockup detector relies on the TSC to determine if an
observed NMI originated from the HPET timer. Hence, this detector
can no longer be used with an unstable TSC.

In such a case, permanently stop the HPET-based hardlockup detector and
start the perf-based detector.

Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h| 2 ++
 arch/x86/kernel/tsc.c  | 2 ++
 arch/x86/kernel/watchdog_hld.c | 7 +++
 3 files changed, 11 insertions(+)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index fd99f2390714..a82cbe17479d 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -128,6 +128,7 @@ extern int hardlockup_detector_hpet_init(void);
 extern void hardlockup_detector_hpet_stop(void);
 extern void hardlockup_detector_hpet_enable(unsigned int cpu);
 extern void hardlockup_detector_hpet_disable(unsigned int cpu);
+extern void hardlockup_detector_switch_to_perf(void);
 #else
 static inline struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void)
 { return NULL; }
@@ -136,6 +137,7 @@ static inline int hardlockup_detector_hpet_init(void)
 static inline void hardlockup_detector_hpet_stop(void) {}
 static inline void hardlockup_detector_hpet_enable(unsigned int cpu) {}
 static inline void hardlockup_detector_hpet_disable(unsigned int cpu) {}
+static inline void hardlockup_detector_switch_to_perf(void) {}
 #endif /* CONFIG_X86_HARDLOCKUP_DETECTOR_HPET */
 
 #else /* CONFIG_HPET_TIMER */
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 59b57605e66c..b2210728ce3d 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1158,6 +1158,8 @@ void mark_tsc_unstable(char *reason)
 
clocksource_mark_unstable(&clocksource_tsc_early);
clocksource_mark_unstable(&clocksource_tsc);
+
+   hardlockup_detector_switch_to_perf();
 }
 
 EXPORT_SYMBOL_GPL(mark_tsc_unstable);
diff --git a/arch/x86/kernel/watchdog_hld.c b/arch/x86/kernel/watchdog_hld.c
index c2512d4c79c5..c8547c227a41 100644
--- a/arch/x86/kernel/watchdog_hld.c
+++ b/arch/x86/kernel/watchdog_hld.c
@@ -76,3 +76,10 @@ void watchdog_nmi_stop(void)
if (detector_type == X86_HARDLOCKUP_DETECTOR_HPET)
hardlockup_detector_hpet_stop();
 }
+
+void hardlockup_detector_switch_to_perf(void)
+{
+   detector_type = X86_HARDLOCKUP_DETECTOR_PERF;
+   hardlockup_detector_hpet_stop();
+   hardlockup_start_all();
+}
-- 
2.17.1



[RFC PATCH v4 19/21] iommu/vt-d: Rework prepare_irte() to support per-irq delivery mode

2019-05-23 Thread Ricardo Neri
A recent change introduced a new member to struct irq_cfg to specify the
delivery mode of an interrupt. Supporting the configuration of the
delivery mode would require adding a third argument to prepare_irte().
Instead, simply take a pointer to an irq_cfg data structure in place of
the vector and destination arguments.

Internally, configure the delivery mode of the Interrupt Remapping Table
Entry as specified in the irq_cfg data structure rather than from the APIC
setting.

This change does not alter the existing behavior, as the delivery mode
of the APIC is used to initialize the irq_cfg data structure.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Joerg Roedel 
Cc: Juergen Gross 
Cc: Bjorn Helgaas 
Cc: Wincy Van 
Cc: Kate Stewart 
Cc: Philippe Ombredanne 
Cc: "Eric W. Biederman" 
Cc: Baoquan He 
Cc: Jan Kiszka 
Cc: Lu Baolu 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 drivers/iommu/intel_irq_remapping.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 4160aa9f3f80..2e61eaca7d7e 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -1072,7 +1072,7 @@ static int reenable_irq_remapping(int eim)
return -1;
 }
 
-static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
+static void prepare_irte(struct irte *irte, struct irq_cfg *irq_cfg)
 {
memset(irte, 0, sizeof(*irte));
 
@@ -1086,9 +1086,9 @@ static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
 * irq migration in the presence of interrupt-remapping.
*/
irte->trigger_mode = 0;
-   irte->dlvry_mode = apic->irq_delivery_mode;
-   irte->vector = vector;
-   irte->dest_id = IRTE_DEST(dest);
+   irte->dlvry_mode = irq_cfg->delivery_mode;
+   irte->vector = irq_cfg->vector;
+   irte->dest_id = IRTE_DEST(irq_cfg->dest_apicid);
irte->redir_hint = 1;
 }
 
@@ -1265,7 +1265,7 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
struct irte *irte = &data->irte_entry;
struct msi_msg *msg = &data->msi_entry;
 
-   prepare_irte(irte, irq_cfg->vector, irq_cfg->dest_apicid);
+   prepare_irte(irte, irq_cfg);
switch (info->type) {
case X86_IRQ_ALLOC_TYPE_IOAPIC:
/* Set source-id of interrupt request */
-- 
2.17.1
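
The effect of taking the delivery mode from the per-irq configuration can be
sketched in user space as below. The fake_* types and fake_prepare_irte() are
hypothetical simplifications of struct irq_cfg, struct irte and
prepare_irte(); the eim_mode flag models the IRTE_DEST() shift and is an
assumption about how that macro behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Delivery mode encodings, assumed to match the APIC definitions. */
enum fake_delivery_mode {
	FAKE_DEST_FIXED   = 0,
	FAKE_DEST_LOWPRIO = 1,
	FAKE_DEST_NMI     = 4,
};

/* Hypothetical reduced version of struct irq_cfg. */
struct fake_irq_cfg {
	unsigned int dest_apicid;
	unsigned int vector;
	enum fake_delivery_mode delivery_mode;
};

/* Hypothetical reduced version of the IRTE fields touched by the patch. */
struct fake_irte {
	unsigned int trigger_mode;
	unsigned int dlvry_mode;
	unsigned int vector;
	unsigned int dest_id;
	unsigned int redir_hint;
};

static void fake_prepare_irte(struct fake_irte *irte,
			      const struct fake_irq_cfg *cfg, bool eim_mode)
{
	assert(irte && cfg);
	memset(irte, 0, sizeof(*irte));

	irte->trigger_mode = 0;
	/* Per-irq delivery mode instead of the APIC-wide default. */
	irte->dlvry_mode = cfg->delivery_mode;
	irte->vector = cfg->vector;
	irte->dest_id = eim_mode ? cfg->dest_apicid : cfg->dest_apicid << 8;
	irte->redir_hint = 1;
}
```

A configuration initialized from the APIC default produces the same entry as
before, while a single interrupt can be switched to NMI delivery by changing
only its own delivery_mode.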



[RFC PATCH v4 03/21] x86/hpet: Calculate ticks-per-second in a separate function

2019-05-23 Thread Ricardo Neri
It is easier to compute the expiration times of an HPET timer by using
its frequency (i.e., the number of times it ticks in a second) than its
period, as given in the capabilities register.

In addition to the HPET char driver, the HPET-based hardlockup detector
will also need to know the timer's frequency. Thus, create a common
function that both can use.

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Clemens Ladisch 
Cc: Arnd Bergmann 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 drivers/char/hpet.c  | 31 ---
 include/linux/hpet.h |  1 +
 2 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c
index 3a1e6b3ccd10..747255f552a9 100644
--- a/drivers/char/hpet.c
+++ b/drivers/char/hpet.c
@@ -836,6 +836,29 @@ static unsigned long hpet_calibrate(struct hpets *hpetp)
return ret;
 }
 
+u64 hpet_get_ticks_per_sec(u64 hpet_caps)
+{
+   u64 ticks_per_sec, period;
+
+   period = (hpet_caps & HPET_COUNTER_CLK_PERIOD_MASK) >>
+HPET_COUNTER_CLK_PERIOD_SHIFT; /* fs, 10^-15 */
+
+   /*
+* The frequency is the reciprocal of the period. The period is given
+* in femtoseconds per tick. Thus, prepare a dividend to obtain the
+* frequency in ticks per second.
+*/
+
+   /* 10^15 femtoseconds per second */
+   ticks_per_sec = 1000000000000000ULL;
+   ticks_per_sec += period >> 1; /* round */
+
+   /* The quotient is put in the dividend. We drop the remainder. */
+   do_div(ticks_per_sec, period);
+
+   return ticks_per_sec;
+}
+
 int hpet_alloc(struct hpet_data *hdp)
 {
u64 cap, mcfg;
@@ -844,7 +867,6 @@ int hpet_alloc(struct hpet_data *hdp)
struct hpets *hpetp;
struct hpet __iomem *hpet;
static struct hpets *last;
-   unsigned long period;
unsigned long long temp;
u32 remainder;
 
@@ -894,12 +916,7 @@ int hpet_alloc(struct hpet_data *hdp)
 
last = hpetp;
 
-   period = (cap & HPET_COUNTER_CLK_PERIOD_MASK) >>
-   HPET_COUNTER_CLK_PERIOD_SHIFT; /* fs, 10^-15 */
-   temp = 1000000000000000uLL; /* 10^15 femtoseconds per second */
-   temp += period >> 1; /* round */
-   do_div(temp, period);
-   hpetp->hp_tick_freq = temp; /* ticks per second */
+   hpetp->hp_tick_freq = hpet_get_ticks_per_sec(cap);
 
printk(KERN_INFO "hpet%d: at MMIO 0x%lx, IRQ%s",
hpetp->hp_which, hdp->hd_phys_address,
diff --git a/include/linux/hpet.h b/include/linux/hpet.h
index 8604564b985d..e7b36bcf4699 100644
--- a/include/linux/hpet.h
+++ b/include/linux/hpet.h
@@ -107,5 +107,6 @@ static inline void hpet_reserve_timer(struct hpet_data *hd, int timer)
 }
 
 int hpet_alloc(struct hpet_data *);
+u64 hpet_get_ticks_per_sec(u64 hpet_caps);
 
 #endif /* !__HPET__ */
-- 
2.17.1



[RFC PATCH v4 18/21] x86/apic: Add a parameter for the APIC delivery mode

2019-05-23 Thread Ricardo Neri
Until now, the delivery mode of APIC interrupts has been set to the default
mode of the APIC driver. However, the hardware does not prevent each
interrupt from being configured with a different delivery mode. Specifying the
delivery mode per interrupt is useful when one is interested in changing
the delivery mode of a particular interrupt. For instance, this can be used
to deliver an interrupt as non-maskable.

Add a new member, delivery_mode, to struct irq_cfg. This new member can
be used to update the configuration of the delivery mode in each interrupt
domain. Likewise, add equivalent macros to populate MSI messages.

Currently, all interrupt domains set the delivery mode of interrupts using
the APIC setting. Interrupt domains use an irq_cfg data structure to
configure their own data structures and hardware resources. Thus, in order
to keep the current behavior, set the delivery mode of the irq
configuration to the APIC setting. In this manner, irq domains can
obtain the delivery mode from the irq configuration data instead of the
APIC setting, if needed.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Joerg Roedel 
Cc: Juergen Gross 
Cc: Bjorn Helgaas 
Cc: Wincy Van 
Cc: Kate Stewart 
Cc: Philippe Ombredanne 
Cc: "Eric W. Biederman" 
Cc: Baoquan He 
Cc: Jan Kiszka 
Cc: Lu Baolu 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hw_irq.h |  5 +++--
 arch/x86/include/asm/msidef.h |  3 +++
 arch/x86/kernel/apic/vector.c | 10 ++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 32e666e1231e..c024e5976b78 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -117,8 +117,9 @@ struct irq_alloc_info {
 };
 
 struct irq_cfg {
-   unsigned int dest_apicid;
-   unsigned int vector;
+   unsigned int dest_apicid;
+   unsigned int vector;
+   enum ioapic_irq_destination_types   delivery_mode;
 };
 
 extern struct irq_cfg *irq_cfg(unsigned int irq);
diff --git a/arch/x86/include/asm/msidef.h b/arch/x86/include/asm/msidef.h
index 38ccfdc2d96e..6d666c90f057 100644
--- a/arch/x86/include/asm/msidef.h
+++ b/arch/x86/include/asm/msidef.h
@@ -16,6 +16,9 @@
 MSI_DATA_VECTOR_MASK)
 
 #define MSI_DATA_DELIVERY_MODE_SHIFT   8
+#define MSI_DATA_DELIVERY_MODE_MASK    0x0700
+#define MSI_DATA_DELIVERY_MODE(dm) (((dm) << MSI_DATA_DELIVERY_MODE_SHIFT) & \
+                                    MSI_DATA_DELIVERY_MODE_MASK)
 #define  MSI_DATA_DELIVERY_FIXED   (0 << MSI_DATA_DELIVERY_MODE_SHIFT)
 #define  MSI_DATA_DELIVERY_LOWPRI  (1 << MSI_DATA_DELIVERY_MODE_SHIFT)
 #define  MSI_DATA_DELIVERY_NMI (4 << MSI_DATA_DELIVERY_MODE_SHIFT)
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 3173e07d3791..99436fe7e932 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -548,6 +548,16 @@ static int x86_vector_alloc_irqs(struct irq_domain *domain, unsigned int virq,
irqd->chip_data = apicd;
irqd->hwirq = virq + i;
irqd_set_single_target(irqd);
+
+   /*
+* Initialize the delivery mode of this irq to match the
+* default delivery mode of the APIC. This is useful for
+* children irq domains which want to take the delivery
+* mode from the individual irq configuration rather
+* than from the APIC.
+*/
+apicd->hw_irq_cfg.delivery_mode = apic->irq_delivery_mode;
+
/*
 * Legacy vectors are already assigned when the IOAPIC
 * takes them over. They stay on the same vector. This is
-- 
2.17.1
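
As a usage sketch for the new MSI_DATA_DELIVERY_MODE() macro, the following
user-space fragment composes the low bits of an MSI data word from a vector
and a per-irq delivery mode. The delivery-mode macros repeat the values from
the patch above; the vector macros are assumed to match msidef.h, and
fake_msi_compose_data() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match arch/x86/include/asm/msidef.h. */
#define MSI_DATA_VECTOR_SHIFT		0
#define MSI_DATA_VECTOR_MASK		0x000000ff
#define MSI_DATA_VECTOR(v)		(((v) << MSI_DATA_VECTOR_SHIFT) & \
					 MSI_DATA_VECTOR_MASK)

/* Values taken from the patch above. */
#define MSI_DATA_DELIVERY_MODE_SHIFT	8
#define MSI_DATA_DELIVERY_MODE_MASK	0x0700
#define MSI_DATA_DELIVERY_MODE(dm)	(((dm) << MSI_DATA_DELIVERY_MODE_SHIFT) & \
					 MSI_DATA_DELIVERY_MODE_MASK)

/* Build the vector and delivery-mode fields of an MSI data word. */
static uint32_t fake_msi_compose_data(unsigned int vector,
				      unsigned int delivery_mode)
{
	/* The delivery mode is a 3-bit field. */
	assert(delivery_mode <= 7);

	return MSI_DATA_DELIVERY_MODE(delivery_mode) | MSI_DATA_VECTOR(vector);
}
```

Passing delivery mode 4 (NMI) yields the same bits as the existing
MSI_DATA_DELIVERY_NMI constant, so an irq domain could compose the message
from irq_cfg->delivery_mode rather than hard-coding the mode.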



[RFC PATCH v4 02/21] x86/hpet: Expose hpet_writel() in header

2019-05-23 Thread Ricardo Neri
In order to allow hpet_writel() to be used by other components (e.g.,
the HPET-based hardlockup detector) expose it in the HPET header file.

No empty definition is needed if CONFIG_HPET is not selected as all
existing callers select such config symbol.

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h | 1 +
 arch/x86/kernel/hpet.c  | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 67385d56d4f4..f132fbf984d4 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -72,6 +72,7 @@ extern int is_hpet_enabled(void);
 extern int hpet_enable(void);
 extern void hpet_disable(void);
 extern unsigned int hpet_readl(unsigned int a);
+extern void hpet_writel(unsigned int d, unsigned int a);
 extern void force_hpet_resume(void);
 
 struct irq_data;
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index a0573f2e7763..5e86e024c489 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -62,7 +62,7 @@ inline unsigned int hpet_readl(unsigned int a)
return readl(hpet_virt_address + a);
 }
 
-static inline void hpet_writel(unsigned int d, unsigned int a)
+inline void hpet_writel(unsigned int d, unsigned int a)
 {
writel(d, hpet_virt_address + a);
 }
-- 
2.17.1



[RFC PATCH v4 06/21] x86/hpet: Configure the timer used by the hardlockup detector

2019-05-23 Thread Ricardo Neri
Implement the initial configuration of the timer to be used by the
hardlockup detector. Return a data structure with a description of the
timer; this information is subsequently used by the hardlockup detector.

Only provide the timer if it supports Front Side Bus interrupt delivery.
This condition greatly simplifies the implementation of the detector.
Specifically, it helps to avoid the complexities of routing the interrupt
via the IO-APIC (e.g., potential race conditions that arise from re-
programming the IO-APIC in NMI context).

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Clemens Ladisch 
Cc: Arnd Bergmann 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h | 13 +
 arch/x86/kernel/hpet.c  | 35 +++
 2 files changed, 48 insertions(+)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 6f099e2781ce..20abdaa5372d 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -109,6 +109,19 @@ extern void hpet_set_comparator(int num, unsigned int cmp, 
unsigned int period);
 
 #endif /* CONFIG_HPET_EMULATE_RTC */
 
+#ifdef CONFIG_X86_HARDLOCKUP_DETECTOR_HPET
+struct hpet_hld_data {
+   boolhas_periodic;
+   u32 num;
+   u64 ticks_per_second;
+};
+
+extern struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void);
+#else
+static inline struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void)
+{ return NULL; }
+#endif /* CONFIG_X86_HARDLOCKUP_DETECTOR_HPET */
+
 #else /* CONFIG_HPET_TIMER */
 
 static inline int hpet_enable(void) { return 0; }
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index ff0250831786..5f9209949fc7 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -171,6 +171,41 @@ do {   
\
_hpet_print_config(__func__, __LINE__); \
 } while (0)
 
+#ifdef CONFIG_X86_HARDLOCKUP_DETECTOR_HPET
+struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void)
+{
+   struct hpet_hld_data *hdata;
+   u64 temp;
+   u32 cfg;
+
+   cfg = hpet_readl(HPET_Tn_CFG(HPET_WD_TIMER_NR));
+
+   if (!(cfg & HPET_TN_FSB_CAP))
+   return NULL;
+
+   hdata = kzalloc(sizeof(*hdata), GFP_KERNEL);
+   if (!hdata)
+   return NULL;
+
+   if (cfg & HPET_TN_PERIODIC_CAP)
+   hdata->has_periodic = true;
+
+   hdata->num = HPET_WD_TIMER_NR;
+
+   cfg = hpet_readl(HPET_PERIOD);
+
+   /*
+* hpet_get_ticks_per_sec() expects the contents of the general
+* capabilities register. The period is in the 32 most significant
+* bits.
+*/
+   temp = (u64)cfg << HPET_COUNTER_CLK_PERIOD_SHIFT;
+   hdata->ticks_per_second = hpet_get_ticks_per_sec(temp);
+
+   return hdata;
+}
+#endif /* CONFIG_X86_HARDLOCKUP_DETECTOR_HPET */
+
 /*
  * When the hpet driver (/dev/hpet) is enabled, we need to reserve
  * timer 0 and timer 1 in case of RTC emulation. Timer 2 is reserved in case
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v4 20/21] iommu/vt-d: hpet: Reserve an interrupt remapping table entry for watchdog

2019-05-23 Thread Ricardo Neri
When interrupt remapping is enabled, MSI interrupt messages must follow a
special format that the IOMMU can understand. Hence, when the HPET hard
lockup detector is used with interrupt remapping, it must also follow this
special format.

The IOMMU, given the information about a particular interrupt, already
knows how to populate the MSI message with this special format and the
corresponding entry in the interrupt remapping table. Given that this is a
special interrupt case, we want to avoid the interrupt subsystem. Add two
functions to create an entry for the HPET hard lockup detector. Perform
this process in two steps as described below.

When initializing the lockup detector, the function
hld_hpet_intremap_alloc_irq() permanently allocates a new entry in the
interrupt remapping table and populates it with the information the
IOMMU driver needs. In order to populate the table, the IOMMU needs to
know the HPET block ID as described in the ACPI table. Hence, add such
ID to the data of the hardlockup detector.

When the hardlockup detector is enabled, the function
hld_hpet_intremap_activate_irq() activates the recently created entry
in the interrupt remapping table via the modify_irte() function. While
doing this, it specifies which CPU the interrupt must target via its APIC
ID. This function can be called every time the destination ID of the
interrupt needs to be updated; there is no need to allocate or remove
entries in the interrupt remapping table.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Joerg Roedel 
Cc: Juergen Gross 
Cc: Bjorn Helgaas 
Cc: Wincy Van 
Cc: Kate Stewart 
Cc: Philippe Ombredanne 
Cc: "Eric W. Biederman" 
Cc: Baoquan He 
Cc: Jan Kiszka 
Cc: Lu Baolu 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h | 11 +++
 arch/x86/kernel/hpet.c  |  1 +
 drivers/iommu/intel_irq_remapping.c | 49 +
 3 files changed, 61 insertions(+)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index a82cbe17479d..811051fa7ade 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -119,6 +119,8 @@ struct hpet_hld_data {
u64 tsc_ticks_per_cpu;
u32 handling_cpu;
u32 enabled_cpus;
+   u8  blockid;
+   void*intremap_data;
struct msi_msg  msi_msg;
unsigned long   cpu_monitored_mask[0];
 };
@@ -129,6 +131,15 @@ extern void hardlockup_detector_hpet_stop(void);
 extern void hardlockup_detector_hpet_enable(unsigned int cpu);
 extern void hardlockup_detector_hpet_disable(unsigned int cpu);
 extern void hardlockup_detector_switch_to_perf(void);
+#ifdef CONFIG_IRQ_REMAP
+extern int hld_hpet_intremap_activate_irq(struct hpet_hld_data *hdata);
+extern int hld_hpet_intremap_alloc_irq(struct hpet_hld_data *hdata);
+#else
+static inline int hld_hpet_intremap_activate_irq(struct hpet_hld_data *hdata)
+{ return -ENODEV; }
+static inline int hld_hpet_intremap_alloc_irq(struct hpet_hld_data *hdata)
+{ return -ENODEV; }
+#endif /* CONFIG_IRQ_REMAP */
 #else
 static inline struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void)
 { return NULL; }
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index dd3bb664a188..ddc9be81a075 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -202,6 +202,7 @@ struct hpet_hld_data 
*hpet_hardlockup_detector_assign_timer(void)
 */
temp = (u64)cfg << HPET_COUNTER_CLK_PERIOD_SHIFT;
hdata->ticks_per_second = hpet_get_ticks_per_sec(temp);
+   hdata->blockid = hpet_blockid;
 
return hdata;
 }
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 2e61eaca7d7e..256466dd30cb 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "irq_remapping.h"
 
@@ -1516,3 +1517,51 @@ int dmar_ir_hotplug(struct dmar_drhd_unit *dmaru, bool 
insert)
 
return ret;
 }
+
+#ifdef CONFIG_X86_HARDLOCKUP_DETECTOR_HPET
+int hld_hpet_intremap_activate_irq(struct hpet_hld_data *hdata)
+{
+   u32 destid = apic->calc_dest_apicid(hdata->handling_cpu);
+   struct intel_ir_data *data;
+
+   data = (struct intel_ir_data *)hdata->intremap_data;
+   data->irte_entry.dest_id = IRTE_DEST(destid);
+   return modify_irte(&data->irq_2_iommu, &data->irte_entry);
+}
+
+int hld_hpet_intremap_alloc_irq(struct hpet_hld_data *hdata)
+{
+   struct intel_ir_data *data;
+   struct irq_alloc_info info;
+   struct intel_iommu *iommu;
+   struct irq_cfg irq_cfg;
+   int index;
+
+   iommu = map_hpet_to_ir(hdata->blockid);
+   if (!iommu)
+ 

[RFC PATCH v4 10/21] watchdog/hardlockup: Add function to enable NMI watchdog on all allowed CPUs at once

2019-05-23 Thread Ricardo Neri
When there is more than one implementation of the NMI watchdog, there may
be situations in which switching from one to another is needed (e.g., if
the time-stamp counter becomes unstable, the HPET-based NMI watchdog can
no longer be used).

The perf-based implementation of the hardlockup detector makes use of
various per-CPU variables which are accessed via this_cpu operations.
Hence, each CPU needs to enable its own NMI watchdog if using the perf
implementation.

Add functionality to switch from one NMI watchdog to another and do it
from each allowed CPU.

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: "Rafael J. Wysocki" 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Alexei Starovoitov 
Cc: Babu Moger 
Cc: "David S. Miller" 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: Byungchul Park 
Cc: "Paul E. McKenney" 
Cc: "Luis R. Rodriguez" 
Cc: Waiman Long 
Cc: Josh Poimboeuf 
Cc: Randy Dunlap 
Cc: Davidlohr Bueso 
Cc: Marc Zyngier 
Cc: Kai-Heng Feng 
Cc: Konrad Rzeszutek Wilk 
Cc: David Rientjes 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Ricardo Neri 
---
 include/linux/nmi.h |  2 ++
 kernel/watchdog.c   | 15 +++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index e5f1a86e20b7..6d828334348b 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -83,9 +83,11 @@ static inline void reset_hung_task_detector(void) { }
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR)
 extern void hardlockup_detector_disable(void);
+extern void hardlockup_start_all(void);
 extern unsigned int hardlockup_panic;
 #else
 static inline void hardlockup_detector_disable(void) {}
+static inline void hardlockup_start_all(void) {}
 #endif
 
 #if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 7f9e7b9306fe..be589001200a 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -566,6 +566,21 @@ int lockup_detector_offline_cpu(unsigned int cpu)
return 0;
 }
 
+static int hardlockup_start_fn(void *data)
+{
+   watchdog_nmi_enable(smp_processor_id());
+   return 0;
+}
+
+void hardlockup_start_all(void)
+{
+   int cpu;
+
+   cpumask_copy(&watchdog_allowed_mask, &watchdog_cpumask);
+   for_each_cpu(cpu, &watchdog_allowed_mask)
+   smp_call_on_cpu(cpu, hardlockup_start_fn, NULL, false);
+}
+
 static void lockup_detector_reconfigure(void)
 {
cpus_read_lock();
-- 
2.17.1



[RFC PATCH v4 12/21] watchdog/hardlockup/hpet: Adjust timer expiration on the number of monitored CPUs

2019-05-23 Thread Ricardo Neri
Each CPU should be monitored for hardlockups every watchdog_thresh seconds.
Since all the CPUs in the system are monitored by the same timer and the
timer interrupt is rotated among the monitored CPUs, the timer must expire
every watchdog_thresh/N seconds; where N is the number of monitored CPUs.
Use the new member of struct hpet_hld_data, ticks_per_cpu, to store the
timer's ticks-per-second divided by N.

The ticks-per-CPU quantity is updated every time the number of monitored
CPUs changes: when the watchdog is enabled or disabled for a specific CPU.
If the timer is used in periodic mode, it needs to be adjusted to reflect
the new expected expiration.
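
The arithmetic described above can be sketched in userspace as follows. This
is only an illustration: struct hld_sketch and the sketch_*() functions are
hypothetical names, not part of the patch, and plain 64-bit division stands in
for the kernel's do_div().

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant fields of struct hpet_hld_data. */
struct hld_sketch {
	uint64_t ticks_per_second;	/* frequency of the HPET counter */
	uint64_t ticks_per_cpu;		/* ticks_per_second / enabled_cpus */
	uint32_t enabled_cpus;		/* number of monitored CPUs */
};

/* Recompute the per-CPU share; keep the old value if no CPU is monitored. */
void sketch_update_ticks_per_cpu(struct hld_sketch *h)
{
	if (!h->enabled_cpus)
		return;

	h->ticks_per_cpu = h->ticks_per_second / h->enabled_cpus;
}

/* Ticks until the next expiration: thresh seconds shared among all CPUs. */
uint64_t sketch_timer_delta(const struct hld_sketch *h, unsigned int thresh)
{
	return (uint64_t)thresh * h->ticks_per_cpu;
}
```

For example, with a 24 MHz counter, 4 monitored CPUs and a threshold of 10
seconds, the timer would expire every 60,000,000 ticks, i.e., every 2.5
seconds.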

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: "Rafael J. Wysocki" 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Alexei Starovoitov 
Cc: Babu Moger 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: Byungchul Park 
Cc: "Paul E. McKenney" 
Cc: "Luis R. Rodriguez" 
Cc: Waiman Long 
Cc: Josh Poimboeuf 
Cc: Randy Dunlap 
Cc: Davidlohr Bueso 
Cc: Marc Zyngier 
Cc: Kai-Heng Feng 
Cc: Konrad Rzeszutek Wilk 
Cc: David Rientjes 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h |  1 +
 arch/x86/kernel/watchdog_hld_hpet.c | 46 +++--
 2 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 31fc27508cf3..64acacce095d 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -114,6 +114,7 @@ struct hpet_hld_data {
boolhas_periodic;
u32 num;
u64 ticks_per_second;
+   u64 ticks_per_cpu;
u32 handling_cpu;
u32 enabled_cpus;
struct msi_msg  msi_msg;
diff --git a/arch/x86/kernel/watchdog_hld_hpet.c b/arch/x86/kernel/watchdog_hld_hpet.c
index dff4dadabd4c..74aeb0535d08 100644
--- a/arch/x86/kernel/watchdog_hld_hpet.c
+++ b/arch/x86/kernel/watchdog_hld_hpet.c
@@ -45,6 +45,13 @@ static void kick_timer(struct hpet_hld_data *hdata, bool 
force)
 * are able to update the comparator before the counter reaches such new
 * value.
 *
+* Each CPU must be monitored every watch_thresh seconds. Since the
+* timer targets one CPU at a time, it must expire every
+*
+*ticks_per_cpu = watch_thresh * ticks_per_second /enabled_cpus
+*
+* as computed in update_ticks_per_cpu().
+*
 * Let it wrap around if needed.
 */
 
@@ -52,10 +59,10 @@ static void kick_timer(struct hpet_hld_data *hdata, bool 
force)
return;
 
if (hdata->has_periodic)
-   period = watchdog_thresh * hdata->ticks_per_second;
+   period = watchdog_thresh * hdata->ticks_per_cpu;
 
count = hpet_readl(HPET_COUNTER);
-   new_compare = count + watchdog_thresh * hdata->ticks_per_second;
+   new_compare = count + watchdog_thresh * hdata->ticks_per_cpu;
hpet_set_comparator(hdata->num, (u32)new_compare, (u32)period);
 }
 
@@ -234,6 +241,27 @@ static int setup_hpet_irq(struct hpet_hld_data *hdata)
return ret;
 }
 
+/**
+ * update_ticks_per_cpu() - Update the number of HPET ticks per CPU
+ * @hdata: struct with the timer's ticks-per-second and CPU mask
+ *
+ * From the overall ticks-per-second of the timer, compute the number of ticks
+ * after which the timer should expire to monitor each CPU every watch_thresh
+ * seconds. The ticks-per-cpu quantity is computed using the number of CPUs that
+ * the watchdog currently monitors.
+ */
+static void update_ticks_per_cpu(struct hpet_hld_data *hdata)
+{
+   u64 temp = hdata->ticks_per_second;
+
+   /* Only update if there are monitored CPUs. */
+   if (!hdata->enabled_cpus)
+   return;
+
+   do_div(temp, hdata->enabled_cpus);
+   hdata->ticks_per_cpu = temp;
+}
+
 /**
  * hardlockup_detector_hpet_enable() - Enable the hardlockup detector
  * @cpu:   CPU Index in which the watchdog will be enabled.
@@ -246,13 +274,23 @@ void hardlockup_detector_hpet_enable(unsigned int cpu)
 {
cpumask_set_cpu(cpu, to_cpumask(hld_data->cpu_monitored_mask));
 
-   if (!hld_data->enabled_cpus++) {
+   hld_data->enabled_cpus++;
+   update_ticks_per_cpu(hld_data);
+
+   if (hld_data->enabled_cpus == 1) {
hld_data->handling_cpu = cpu;
update_msi_destid(hld_data);
/* Force timer kick when detector is just enabled */
kick_timer(hld_data, true);
enable_timer(hld_data);
  

[RFC PATCH v4 00/21] Implement an HPET-based hardlockup detector

2019-05-23 Thread Ricardo Neri
 of functions (Thomas Gleixner).
 * Added a new category of NMI handler, NMI_WATCHDOG, which executes after
   NMI_LOCAL handlers (Andi Kleen).
 * Updated handling of "nmi_watchdog" to support comma-separated
   arguments.
 * Undid split of the generic hardlockup detector into a separate file
   (Thomas Gleixner).
 * Added a new intermediate symbol CONFIG_HARDLOCKUP_DETECTOR_CORE to
   select generic parts of the detector (Paul E. McKenney,
   Thomas Gleixner).
 * Removed use of struct cpumask in favor of a variable length array in
   conjunction with kzalloc (Peter Zijlstra).
 * Added CPU as argument to hardlockup_detector_hpet_enable()/disable()
   (Thomas Gleixner).
 * Removed unnecessary export of function declarations, flags and bit
   fields (Thomas Gleixner).
 * Removed unnecessary check for FSB support when reserving timer for the
   detector (Thomas Gleixner).
 * Separated TSC code from HPET code in kick_timer() (Thomas Gleixner).
 * Reworked condition to check if the expected TSC value is within the
   error margin to avoid conditional (Peter Zijlstra).
 * Removed TSC error margin from struct hld_data; use global variable
   instead (Peter Zijlstra).
 * Removed previously introduced watchdog_get_allowed_cpumask*() and
   reworked hardlockup_detector_hpet_enable()/disable() to not need
   access to watchdog_allowed_mask (Thomas Gleixner).

Changes since v1:

 * Removed reads to HPET registers at every NMI. Instead use the time-stamp
   counter to infer the interrupt source (Thomas Gleixner, Andi Kleen).
 * Do not target CPUs in a round-robin manner. Instead, the HPET timer
   always targets the same CPU; other CPUs are monitored via an
   interprocessor interrupt.
 * Removed use of generic irq code to set interrupt affinity and NMI
   delivery. Instead, configure the interrupt directly in HPET registers
   (Thomas Gleixner).
 * Removed the proposed ops structure for NMI watchdogs. Instead, split
   the existing implementation into a generic library and perf-specific
   infrastructure (Thomas Gleixner, Nicholas Piggin).
 * Added an x86-specific shim hardlockup detector that selects between
   HPET and perf infrastructures as needed (Nicholas Piggin).
 * Removed locks taken in NMI and !NMI context. This was wrong and is no
   longer needed (Thomas Gleixner).
 * Fixed unconditional return of NMI_HANDLED when the HPET timer is
   programmed for FSB/MSI delivery (Peter Zijlstra).

References:

[1]. https://lkml.org/lkml/2018/6/12/1027
[2]. https://lkml.org/lkml/2019/2/27/402
[3]. https://lkml.org/lkml/2019/5/14/386

Ricardo Neri (21):
  x86/msi: Add definition for NMI delivery mode
  x86/hpet: Expose hpet_writel() in header
  x86/hpet: Calculate ticks-per-second in a separate function
  x86/hpet: Add hpet_set_comparator() for periodic and one-shot modes
  x86/hpet: Reserve timer for the HPET hardlockup detector
  x86/hpet: Configure the timer used by the hardlockup detector
  watchdog/hardlockup: Define a generic function to detect hardlockups
  watchdog/hardlockup: Decouple the hardlockup detector from perf
  x86/nmi: Add a NMI_WATCHDOG NMI handler category
  watchdog/hardlockup: Add function to enable NMI watchdog on all
allowed CPUs at once
  x86/watchdog/hardlockup: Add an HPET-based hardlockup detector
  watchdog/hardlockup/hpet: Adjust timer expiration on the number of
monitored CPUs
  x86/watchdog/hardlockup/hpet: Determine if HPET timer caused NMI
  watchdog/hardlockup: Use parse_option_str() to handle "nmi_watchdog"
  watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot
parameter
  x86/watchdog: Add a shim hardlockup detector
  x86/tsc: Switch to perf-based hardlockup detector if TSC become
unstable
  x86/apic: Add a parameter for the APIC delivery mode
  iommu/vt-d: Rework prepare_irte() to support per-irq delivery mode
  iommu/vt-d: hpet: Reserve an interrupt remapping table entry for
watchdog
  x86/watchdog/hardlockup/hpet: Support interrupt remapping

 .../admin-guide/kernel-parameters.txt |   8 +-
 arch/x86/Kconfig.debug|  15 +
 arch/x86/include/asm/hpet.h   |  47 ++
 arch/x86/include/asm/hw_irq.h |   5 +-
 arch/x86/include/asm/msidef.h |   4 +
 arch/x86/include/asm/nmi.h|   1 +
 arch/x86/kernel/Makefile  |   2 +
 arch/x86/kernel/apic/vector.c |  10 +
 arch/x86/kernel/hpet.c| 115 -
 arch/x86/kernel/nmi.c |  10 +
 arch/x86/kernel/tsc.c |   2 +
 arch/x86/kernel/watchdog_hld.c|  85 
 arch/x86/kernel/watchdog_hld_hpet.c   | 453 ++
 drivers/char/hpet.c   |  31 +-
 drivers/iommu/intel_irq_remapping.c   |  59 ++-
 include/linux/hpet.h  |   1 +
 include/linux/nmi.h   |   8 +-
 kernel/Makefile   |   2 +

[RFC PATCH v4 16/21] x86/watchdog: Add a shim hardlockup detector

2019-05-23 Thread Ricardo Neri
The generic hardlockup detector is based on perf. It also provides a set
of weak stubs that CPU architectures can override. Add a shim hardlockup
detector for x86 that selects between perf and hpet implementations.

Specifically, this shim implementation is needed for the HPET-based
hardlockup detector; it can also be used for future implementations.
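
The selection performed by the shim can be sketched in userspace as a probe
with fallback. This is a minimal sketch: every sketch_* name is hypothetical,
and the stub backends only stand in for the real HPET and perf init functions.

```c
#include <errno.h>
#include <stddef.h>

enum sketch_detector { SKETCH_NONE, SKETCH_HPET, SKETCH_PERF };

typedef int (*sketch_init_fn)(void);

/*
 * Try the HPET backend first, then fall back to perf. Each init function
 * returns 0 on success or a negative errno value.
 */
int sketch_probe(sketch_init_fn hpet_init, sketch_init_fn perf_init,
		 enum sketch_detector *chosen)
{
	int ret;

	*chosen = SKETCH_NONE;

	ret = hpet_init ? hpet_init() : -ENODEV;
	if (!ret) {
		*chosen = SKETCH_HPET;
		return 0;
	}

	ret = perf_init ? perf_init() : -ENODEV;
	if (!ret)
		*chosen = SKETCH_PERF;

	return ret;
}

/* Stub backends, for illustration only. */
int sketch_init_ok(void) { return 0; }
int sketch_init_unavailable(void) { return -ENODEV; }
```

Because the HPET backend is probed first but only succeeds when explicitly
requested, the perf backend remains the default in this arrangement.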

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Peter Zijlstra 
Cc: Clemens Ladisch 
Cc: Arnd Bergmann 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: Mimi Zohar 
Cc: Jan Kiszka 
Cc: Nick Desaulniers 
Cc: Masahiro Yamada 
Cc: Nayna Jain 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Suggested-by: Nicholas Piggin 
Signed-off-by: Ricardo Neri 
---
 arch/x86/Kconfig.debug |  4 ++
 arch/x86/kernel/Makefile   |  1 +
 arch/x86/kernel/watchdog_hld.c | 78 ++
 3 files changed, 83 insertions(+)
 create mode 100644 arch/x86/kernel/watchdog_hld.c

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 445bbb188f10..52c77e2145c9 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -169,11 +169,15 @@ config IOMMU_LEAK
 config HAVE_MMIOTRACE_SUPPORT
def_bool y
 
+config X86_HARDLOCKUP_DETECTOR
+   bool
+
 config X86_HARDLOCKUP_DETECTOR_HPET
bool "Use HPET Timer for Hard Lockup Detection"
select SOFTLOCKUP_DETECTOR
select HARDLOCKUP_DETECTOR
select HARDLOCKUP_DETECTOR_CORE
+   select X86_HARDLOCKUP_DETECTOR
depends on HPET_TIMER && HPET && X86_64
help
  Say y to enable a hardlockup detector that is driven by a High-
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 3ad55de67e8b..e60244b8a8ec 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -106,6 +106,7 @@ obj-$(CONFIG_VM86)  += vm86_32.o
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
 
 obj-$(CONFIG_HPET_TIMER)   += hpet.o
+obj-$(CONFIG_X86_HARDLOCKUP_DETECTOR) += watchdog_hld.o
 obj-$(CONFIG_X86_HARDLOCKUP_DETECTOR_HPET) += watchdog_hld_hpet.o
 obj-$(CONFIG_APB_TIMER)+= apb_timer.o
 
diff --git a/arch/x86/kernel/watchdog_hld.c b/arch/x86/kernel/watchdog_hld.c
new file mode 100644
index ..c2512d4c79c5
--- /dev/null
+++ b/arch/x86/kernel/watchdog_hld.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * A shim hardlockup detector. It overrides the weak stubs of the generic
+ * implementation to select between the perf- or the hpet-based implementation.
+ *
+ * Copyright (C) Intel Corporation 2019
+ */
+
+#include 
+#include 
+
+enum x86_hardlockup_detector {
+   X86_HARDLOCKUP_DETECTOR_PERF,
+   X86_HARDLOCKUP_DETECTOR_HPET,
+};
+
+static enum __read_mostly x86_hardlockup_detector detector_type;
+
+int watchdog_nmi_enable(unsigned int cpu)
+{
+   if (detector_type == X86_HARDLOCKUP_DETECTOR_PERF) {
+   hardlockup_detector_perf_enable();
+   return 0;
+   }
+
+   if (detector_type == X86_HARDLOCKUP_DETECTOR_HPET) {
+   hardlockup_detector_hpet_enable(cpu);
+   return 0;
+   }
+
+   return -ENODEV;
+}
+
+void watchdog_nmi_disable(unsigned int cpu)
+{
+   if (detector_type == X86_HARDLOCKUP_DETECTOR_PERF) {
+   hardlockup_detector_perf_disable();
+   return;
+   }
+
+   if (detector_type == X86_HARDLOCKUP_DETECTOR_HPET) {
+   hardlockup_detector_hpet_disable(cpu);
+   return;
+   }
+}
+
+int __init watchdog_nmi_probe(void)
+{
+   int ret;
+
+   /*
+* Try first with the HPET hardlockup detector. It will only
+* succeed if selected at build time and the nmi_watchdog
+* command-line parameter is configured. This ensures that the
+* perf-based detector is used by default, if selected at
+* build time.
+*/
+   ret = hardlockup_detector_hpet_init();
+   if (!ret) {
+   detector_type = X86_HARDLOCKUP_DETECTOR_HPET;
+   return ret;
+   }
+
+   ret = hardlockup_detector_perf_init();
+   if (!ret) {
+   detector_type = X86_HARDLOCKUP_DETECTOR_PERF;
+   return ret;
+   }
+
+   return ret;
+}
+
+void watchdog_nmi_stop(void)
+{
+   /* Only the HPET lockup detector defines a stop function. */
+   if (detector_type == X86_HARDLOCKUP_DETECTOR_HPET)
+   hardlockup_detector_hpet_stop();
+}
-- 
2.17.1



[RFC PATCH v4 11/21] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector

2019-05-23 Thread Ricardo Neri
This is the initial implementation of a hardlockup detector driven by an
HPET timer. This initial implementation includes functions to control the
timer via its registers. It also requests such timer, installs an NMI
interrupt handler and performs the initial configuration of the timer.

The detector is not functional at this stage. A subsequent changeset will
invoke the interfaces provided by this detector and add functionality
to determine if the HPET timer caused the NMI.

In order to detect hardlockups in all the monitored CPUs, move the
interrupt to the next monitored CPU while handling the NMI interrupt; wrap
around when reaching the highest CPU in the mask. This rotation is
achieved by setting the affinity mask to only contain the next CPU to
monitor. A cpumask keeps track of all the CPUs that need to be monitored.
Such cpumask is updated when the watchdog is enabled or disabled in a
particular CPU.
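
The rotation with wrap-around can be sketched in userspace as follows. This is
an illustration under simplifying assumptions: a single 64-bit word stands in
for the cpumask, and sketch_next_monitored_cpu() is a hypothetical name rather
than a function from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the next monitored CPU after @cpu in @mask, wrapping around to the
 * lowest monitored CPU. Bit i of @mask set means that CPU i is monitored.
 * Returns @cpu itself if it is the only monitored CPU, or -1 if no CPU is
 * monitored.
 */
int sketch_next_monitored_cpu(uint64_t mask, int cpu)
{
	int i;

	assert(cpu >= 0 && cpu < 64);

	for (i = 1; i <= 64; i++) {
		int candidate = (cpu + i) % 64;

		if (mask & (UINT64_C(1) << candidate))
			return candidate;
	}

	return -1;
}
```

With CPUs 0, 1 and 3 monitored, the interrupt would visit them in the order
0, 1, 3, 0, and so on.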

This detector relies on an HPET timer that is capable of using Front Side
Bus interrupts. In order to avoid using the generic interrupt code,
program the MSI message register of the HPET timer directly.

HPET registers are only accessed to kick the timer after looking for
hardlockups. This happens every watchdog_thresh seconds. A subsequent
changeset will determine whether the HPET timer caused the interrupt based
on the value of the time-stamp counter. For now, just add a stub function.

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Peter Zijlstra 
Cc: Clemens Ladisch 
Cc: Arnd Bergmann 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: "Ravi V. Shankar" 
Cc: Mimi Zohar 
Cc: Jan Kiszka 
Cc: Nick Desaulniers 
Cc: Masahiro Yamada 
Cc: Nayna Jain 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/Kconfig.debug  |  11 +
 arch/x86/include/asm/hpet.h |  13 ++
 arch/x86/kernel/Makefile|   1 +
 arch/x86/kernel/hpet.c  |   3 +-
 arch/x86/kernel/watchdog_hld_hpet.c | 335 
 5 files changed, 362 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/watchdog_hld_hpet.c

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index f730680dc818..445bbb188f10 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -169,6 +169,17 @@ config IOMMU_LEAK
 config HAVE_MMIOTRACE_SUPPORT
def_bool y
 
+config X86_HARDLOCKUP_DETECTOR_HPET
+   bool "Use HPET Timer for Hard Lockup Detection"
+   select SOFTLOCKUP_DETECTOR
+   select HARDLOCKUP_DETECTOR
+   select HARDLOCKUP_DETECTOR_CORE
+   depends on HPET_TIMER && HPET && X86_64
+   help
+ Say y to enable a hardlockup detector that is driven by a High-
+ Precision Event Timer. This option is helpful to not use counters
+ from the Performance Monitoring Unit to drive the detector.
+
 config X86_DECODER_SELFTEST
bool "x86 instruction decoder selftest"
depends on DEBUG_KERNEL && KPROBES
diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 20abdaa5372d..31fc27508cf3 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -114,12 +114,25 @@ struct hpet_hld_data {
boolhas_periodic;
u32 num;
u64 ticks_per_second;
+   u32 handling_cpu;
+   u32 enabled_cpus;
+   struct msi_msg  msi_msg;
+   unsigned long   cpu_monitored_mask[0];
 };
 
 extern struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void);
+extern int hardlockup_detector_hpet_init(void);
+extern void hardlockup_detector_hpet_stop(void);
+extern void hardlockup_detector_hpet_enable(unsigned int cpu);
+extern void hardlockup_detector_hpet_disable(unsigned int cpu);
 #else
 static inline struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void)
 { return NULL; }
+static inline int hardlockup_detector_hpet_init(void)
+{ return -ENODEV; }
+static inline void hardlockup_detector_hpet_stop(void) {}
+static inline void hardlockup_detector_hpet_enable(unsigned int cpu) {}
+static inline void hardlockup_detector_hpet_disable(unsigned int cpu) {}
 #endif /* CONFIG_X86_HARDLOCKUP_DETECTOR_HPET */
 
 #else /* CONFIG_HPET_TIMER */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 3578ad248bc9..3ad55de67e8b 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -106,6 +106,7 @@ obj-$(CONFIG_VM86)  += vm86_32.o
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
 
 obj-$(CONFIG_HPET_TIMER)   += hpet.o
+obj-$(CONFIG_X86_HARDLOCKUP_DETECTOR_HPET) += watchdog_hld_hpet.o
 obj-$(CONFIG_APB_TIMER)+= apb_timer.o
 
 obj-$(CONFIG_AMD_NB)   += amd_nb.o
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 5f9209949fc7..dd3bb664a188 100644
--- a/arch

[RFC PATCH v4 13/21] x86/watchdog/hardlockup/hpet: Determine if HPET timer caused NMI

2019-05-23 Thread Ricardo Neri
The only direct method to determine whether an HPET timer caused an
interrupt is to read the Interrupt Status register. Unfortunately,
reading HPET registers is slow and, therefore, it is not recommended to
read them while in NMI context. Furthermore, status is not available if
the interrupt is generated via the Front Side Bus.

An indirect manner to infer if the non-maskable interrupt we see was
caused by the HPET timer is to use the time-stamp counter. Compute the
value that the time-stamp counter should have at the next interrupt of the
HPET timer. Since the hardlockup detector operates in seconds, high
precision is not needed. This implementation considers that the HPET
caused the NMI if the time-stamp counter reads the expected value +/- ~1.5%.
This margin is selected because it corresponds to 1/64, so the division can
be performed using a bit shift operation. Experimentally, the error in the
estimation is consistently less than 1%.

The computation of the expected value of the time-stamp counter must be
performed in relation to watchdog_thresh divided by the number of
monitored CPUs. This quantity is stored in tsc_ticks_per_cpu and must be
updated whenever the number of monitored CPUs changes.
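
The window check can be sketched in userspace as follows. The sketch_* names
are hypothetical; the comparison itself follows the expression used in
is_hpet_wdt_interrupt() in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Error margin of 1/64 (about 1.56%) of the expected TSC delta. */
uint64_t sketch_tsc_error(uint64_t tsc_delta)
{
	return tsc_delta >> 6;
}

/*
 * Return true if tsc_curr lies in [tsc_next - err, tsc_next + err).
 * The arithmetic is unsigned: a value below the window wraps around to a
 * huge number and fails the comparison, so a single compare covers both
 * bounds without a conditional.
 */
bool sketch_tsc_in_window(uint64_t tsc_curr, uint64_t tsc_next, uint64_t err)
{
	return (tsc_curr - tsc_next) + err < 2 * err;
}
```

Folding both bounds into one unsigned comparison keeps the NMI path free of
branches, which is the rework that the cover letter attributes to Peter
Zijlstra.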

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Peter Zijlstra 
Cc: Clemens Ladisch 
Cc: Arnd Bergmann 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: Mimi Zohar 
Cc: Jan Kiszka 
Cc: Nick Desaulniers 
Cc: Masahiro Yamada 
Cc: Nayna Jain 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Suggested-by: Andi Kleen 
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h |  2 ++
 arch/x86/kernel/watchdog_hld_hpet.c | 27 ++-
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 64acacce095d..fd99f2390714 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -115,6 +115,8 @@ struct hpet_hld_data {
u32 num;
u64 ticks_per_second;
u64 ticks_per_cpu;
+   u64 tsc_next;
+   u64 tsc_ticks_per_cpu;
u32 handling_cpu;
u32 enabled_cpus;
struct msi_msg  msi_msg;
diff --git a/arch/x86/kernel/watchdog_hld_hpet.c b/arch/x86/kernel/watchdog_hld_hpet.c
index 74aeb0535d08..dcc50cd29374 100644
--- a/arch/x86/kernel/watchdog_hld_hpet.c
+++ b/arch/x86/kernel/watchdog_hld_hpet.c
@@ -24,6 +24,7 @@
 
 static struct hpet_hld_data *hld_data;
 static bool hardlockup_use_hpet;
+static u64 tsc_next_error;
 
 /**
  * kick_timer() - Reprogram timer to expire in the future
@@ -33,11 +34,22 @@ static bool hardlockup_use_hpet;
  * Reprogram the timer to expire within watchdog_thresh seconds in the future.
  * If the timer supports periodic mode, it is not kicked unless @force is
  * true.
+ *
+ * Also, compute the expected value of the time-stamp counter at the time of
+ * expiration as well as the tolerated deviation from that value. The deviation
+ * is ~1.5% of the delta between the current and expected time-stamp values,
+ * computed cheaply by shifting that delta right by 6 positions.
  */
 static void kick_timer(struct hpet_hld_data *hdata, bool force)
 {
+   u64 tsc_curr, tsc_delta, new_compare, count, period = 0;
bool kick_needed = force || !(hdata->has_periodic);
-   u64 new_compare, count, period = 0;
+
+   tsc_curr = rdtsc();
+
+   tsc_delta = (unsigned long)watchdog_thresh * hdata->tsc_ticks_per_cpu;
+   hdata->tsc_next = tsc_curr + tsc_delta;
+   tsc_next_error = tsc_delta >> 6;
 
/*
 * Update the comparator in increments of watch_thresh seconds relative
@@ -93,6 +105,15 @@ static void enable_timer(struct hpet_hld_data *hdata)
  */
 static bool is_hpet_wdt_interrupt(struct hpet_hld_data *hdata)
 {
+   if (smp_processor_id() == hdata->handling_cpu) {
+   u64 tsc_curr;
+
+   tsc_curr = rdtsc();
+
+   return (tsc_curr - hdata->tsc_next) + tsc_next_error <
+  2 * tsc_next_error;
+   }
+
return false;
 }
 
@@ -260,6 +281,10 @@ static void update_ticks_per_cpu(struct hpet_hld_data 
*hdata)
 
do_div(temp, hdata->enabled_cpus);
hdata->ticks_per_cpu = temp;
+
+   temp = (unsigned long)tsc_khz * 1000L;
+   do_div(temp, hdata->enabled_cpus);
+   hdata->tsc_ticks_per_cpu = temp;
 }
 
 /**
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v4 14/21] watchdog/hardlockup: Use parse_option_str() to handle "nmi_watchdog"

2019-05-23 Thread Ricardo Neri
Prepare hardlockup_panic_setup() to handle a comma-separated list of
options. This is needed to pass options to specific implementations of the
hardlockup detector.
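
To illustrate why whole-item matching matters for a comma-separated list, here is a user-space sketch. option_in_list() is a hypothetical stand-in modeled on what parse_option_str() is expected to do; the kernel helper itself is not shown in this patch, and the "hpet" item used below is only an example value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if @option appears as a complete item in the
 * comma-separated list @str.
 */
static bool option_in_list(const char *str, const char *option)
{
	size_t len = strlen(option);

	while (*str) {
		/* Match only a whole item: option followed by ',' or end. */
		if (!strncmp(str, option, len) &&
		    (str[len] == '\0' || str[len] == ','))
			return true;

		/* Skip to the start of the next comma-separated item. */
		while (*str && *str != ',')
			str++;
		if (*str == ',')
			str++;
	}

	return false;
}
```

Unlike a prefix comparison against the start of the string, this finds "panic" in "hpet,panic" and does not mistake "nopanic" for "panic".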

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Peter Zijlstra 
Cc: Clemens Ladisch 
Cc: Arnd Bergmann 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: Mimi Zohar 
Cc: Jan Kiszka 
Cc: Nick Desaulniers 
Cc: Masahiro Yamada 
Cc: Nayna Jain 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 kernel/watchdog.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index be589001200a..fd50049449ec 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -70,13 +70,13 @@ void __init hardlockup_detector_disable(void)
 
 static int __init hardlockup_panic_setup(char *str)
 {
-   if (!strncmp(str, "panic", 5))
+   if (parse_option_str(str, "panic"))
hardlockup_panic = 1;
-   else if (!strncmp(str, "nopanic", 7))
+   else if (parse_option_str(str, "nopanic"))
hardlockup_panic = 0;
-   else if (!strncmp(str, "0", 1))
+   else if (parse_option_str(str, "0"))
nmi_watchdog_user_enabled = 0;
-   else if (!strncmp(str, "1", 1))
+   else if (parse_option_str(str, "1"))
nmi_watchdog_user_enabled = 1;
return 1;
 }
-- 
2.17.1



[RFC PATCH v4 04/21] x86/hpet: Add hpet_set_comparator() for periodic and one-shot modes

2019-05-23 Thread Ricardo Neri
Instead of setting the timer period directly in hpet_set_periodic(), add a
new helper function hpet_set_comparator() that only sets the accumulator
and comparator. hpet_set_periodic() will only prepare the timer for
periodic mode and leave the expiration programming to
hpet_set_comparator().

This new function can also be used by other components (e.g., the HPET-
based hardlockup detector) which also need to configure HPET timers. Thus,
add its declaration into the hpet header file.
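
The write sequence can be modeled in user space with a mock timer. This is a simplified sketch under assumptions: struct mock_timer and the mock_* helpers are hypothetical, the bit values are assumed to mirror HPET_TN_PERIODIC and HPET_TN_SETVAL, and the model of how the comparator register reacts to writes is a simplification of the behavior described in the function's comment.

```c
#include <stdint.h>

/* Assumed to mirror HPET_TN_PERIODIC and HPET_TN_SETVAL. */
#define MOCK_TN_PERIODIC	0x008u
#define MOCK_TN_SETVAL		0x040u

/* Hypothetical software model of one HPET timer. */
struct mock_timer {
	uint32_t cfg;
	uint32_t comparator;	/* comparator/accumulator value */
	uint32_t period;	/* period used in periodic mode */
};

/*
 * Model a write to Tn_CMP: in one-shot mode, or with SETVAL set, the
 * write lands in the comparator and SETVAL clears itself. Otherwise,
 * in periodic mode, the write sets the period.
 */
static void mock_write_cmp(struct mock_timer *t, uint32_t val)
{
	if (!(t->cfg & MOCK_TN_PERIODIC) || (t->cfg & MOCK_TN_SETVAL)) {
		t->comparator = val;
		t->cfg &= ~MOCK_TN_SETVAL;
	} else {
		t->period = val;
	}
}

/* Same ordering as the helper: SETVAL, comparator, then period. */
static void mock_set_comparator(struct mock_timer *t, uint32_t cmp,
				uint32_t period)
{
	if (period)
		t->cfg |= MOCK_TN_SETVAL;

	mock_write_cmp(t, cmp);

	if (!period)
		return;

	mock_write_cmp(t, period);
}
```

The model omits the udelay(1) between the two writes, which only matters on real hardware.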

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Originally-by: Suravee Suthikulpanit 
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h |  1 +
 arch/x86/kernel/hpet.c  | 57 +
 2 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index f132fbf984d4..e7098740f5ee 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -102,6 +102,7 @@ extern int hpet_rtc_timer_init(void);
 extern irqreturn_t hpet_rtc_interrupt(int irq, void *dev_id);
 extern int hpet_register_irq_handler(rtc_irq_handler handler);
 extern void hpet_unregister_irq_handler(rtc_irq_handler handler);
+extern void hpet_set_comparator(int num, unsigned int cmp, unsigned int 
period);
 
 #endif /* CONFIG_HPET_EMULATE_RTC */
 
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 5e86e024c489..1723d55219e8 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -290,6 +290,47 @@ static void hpet_legacy_clockevent_register(void)
printk(KERN_DEBUG "hpet clockevent registered\n");
 }
 
+/**
+ * hpet_set_comparator() - Helper function for setting comparator register
+ * @num:   The timer ID
+ * @cmp:   The value to be written to the comparator/accumulator
+ * @period:The value to be written to the period (0 = oneshot mode)
+ *
+ * Helper function for updating comparator, accumulator and period values.
+ *
+ * In periodic mode, HPET needs HPET_TN_SETVAL to be set before writing
+ * to the Tn_CMP to update the accumulator. Then, HPET needs a second
+ * write (with HPET_TN_SETVAL cleared) to Tn_CMP to set the period.
+ * The HPET_TN_SETVAL bit is automatically cleared after the first write.
+ *
+ * For one-shot mode, HPET_TN_SETVAL does not need to be set.
+ *
+ * See the following documents:
+ *   - Intel IA-PC HPET (High Precision Event Timers) Specification
+ *   - AMD-8111 HyperTransport I/O Hub Data Sheet, Publication # 24674
+ */
+void hpet_set_comparator(int num, unsigned int cmp, unsigned int period)
+{
+   if (period) {
+   unsigned int v = hpet_readl(HPET_Tn_CFG(num));
+
+   hpet_writel(v | HPET_TN_SETVAL, HPET_Tn_CFG(num));
+   }
+
+   hpet_writel(cmp, HPET_Tn_CMP(num));
+
+   if (!period)
+   return;
+
+   /*
+* This delay is seldom used: never in one-shot mode and in periodic
+* only when reprogramming the timer.
+*/
+   udelay(1);
+   hpet_writel(period, HPET_Tn_CMP(num));
+}
+EXPORT_SYMBOL_GPL(hpet_set_comparator);
+
 static int hpet_set_periodic(struct clock_event_device *evt, int timer)
 {
unsigned int cfg, cmp, now;
@@ -301,19 +342,11 @@ static int hpet_set_periodic(struct clock_event_device 
*evt, int timer)
now = hpet_readl(HPET_COUNTER);
cmp = now + (unsigned int)delta;
cfg = hpet_readl(HPET_Tn_CFG(timer));
-   cfg |= HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL |
-  HPET_TN_32BIT;
+   cfg |= HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_32BIT;
hpet_writel(cfg, HPET_Tn_CFG(timer));
-   hpet_writel(cmp, HPET_Tn_CMP(timer));
-   udelay(1);
-   /*
-* HPET on AMD 81xx needs a second write (with HPET_TN_SETVAL
-* cleared) to T0_CMP to set the period. The HPET_TN_SETVAL
-* bit is automatically cleared after the first write.
-* (See AMD-8111 HyperTransport I/O Hub Data Sheet,
-* Publication # 24674)
-*/
-   hpet_writel((unsigned int)delta, HPET_Tn_CMP(timer));
+
+   hpet_set_comparator(timer, cmp, (unsigned int)delta);
+
hpet_start_counter();
hpet_print_config();
 
-- 
2.17.1



[RFC PATCH v4 07/21] watchdog/hardlockup: Define a generic function to detect hardlockups

2019-05-23 Thread Ricardo Neri
The procedure to detect hardlockups is independent of the underlying
mechanism that generates the non-maskable interrupt used to drive the
detector. Thus, it can be put in a separate, generic function. In this
manner, it can be invoked by various implementations of the NMI watchdog.

For this purpose, move the bulk of watchdog_overflow_callback() to the
new function inspect_for_hardlockups(). This function can then be called
from the applicable NMI handlers.

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Babu Moger 
Cc: "David S. Miller" 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: "Luis R. Rodriguez" 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Ricardo Neri 
---
 include/linux/nmi.h   |  1 +
 kernel/watchdog_hld.c | 18 +++---
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 9003e29cde46..5a8b19749769 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -212,6 +212,7 @@ extern int proc_watchdog_thresh(struct ctl_table *, int ,
void __user *, size_t *, loff_t *);
 extern int proc_watchdog_cpumask(struct ctl_table *, int,
 void __user *, size_t *, loff_t *);
+void inspect_for_hardlockups(struct pt_regs *regs);
 
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
 #include 
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 247bf0b1582c..b352e507b17f 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -106,14 +106,8 @@ static struct perf_event_attr wd_hw_attr = {
.disabled   = 1,
 };
 
-/* Callback function for perf event subsystem */
-static void watchdog_overflow_callback(struct perf_event *event,
-  struct perf_sample_data *data,
-  struct pt_regs *regs)
+void inspect_for_hardlockups(struct pt_regs *regs)
 {
-   /* Ensure the watchdog never gets throttled */
-   event->hw.interrupts = 0;
-
if (__this_cpu_read(watchdog_nmi_touch) == true) {
__this_cpu_write(watchdog_nmi_touch, false);
return;
@@ -163,6 +157,16 @@ static void watchdog_overflow_callback(struct perf_event 
*event,
return;
 }
 
+/* Callback function for perf event subsystem */
+static void watchdog_overflow_callback(struct perf_event *event,
+  struct perf_sample_data *data,
+  struct pt_regs *regs)
+{
+   /* Ensure the watchdog never gets throttled */
+   event->hw.interrupts = 0;
+   inspect_for_hardlockups(regs);
+}
+
 static int hardlockup_detector_event_create(void)
 {
unsigned int cpu = smp_processor_id();
-- 
2.17.1



[RFC PATCH v4 08/21] watchdog/hardlockup: Decouple the hardlockup detector from perf

2019-05-23 Thread Ricardo Neri
The current default implementation of the hardlockup detector assumes that
it is implemented using perf events. However, the hardlockup detector can
be driven by other sources of non-maskable interrupts (e.g., a properly
configured timer).

Group and wrap in #ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF all the code
specific to perf: create and manage perf events, stop and start the perf-
based detector.

The generic portion of the detector (monitor the timers' thresholds, check
timestamps and detect hardlockups as well as the implementation of
arch_touch_nmi_watchdog()) is now selected with the new intermediate config
symbol CONFIG_HARDLOCKUP_DETECTOR_CORE.

The perf-based implementation of the detector selects the new intermediate
symbol. Other implementations should do the same.

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: "Rafael J. Wysocki" 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Alexei Starovoitov 
Cc: Babu Moger 
Cc: "David S. Miller" 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: Byungchul Park 
Cc: "Paul E. McKenney" 
Cc: "Luis R. Rodriguez" 
Cc: Waiman Long 
Cc: Josh Poimboeuf 
Cc: Randy Dunlap 
Cc: Davidlohr Bueso 
Cc: Marc Zyngier 
Cc: Kai-Heng Feng 
Cc: Konrad Rzeszutek Wilk 
Cc: David Rientjes 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Ricardo Neri 
---
 include/linux/nmi.h   |  5 -
 kernel/Makefile   |  2 +-
 kernel/watchdog_hld.c | 32 
 lib/Kconfig.debug |  4 
 4 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 5a8b19749769..e5f1a86e20b7 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -94,8 +94,11 @@ static inline void hardlockup_detector_disable(void) {}
 # define NMI_WATCHDOG_SYSCTL_PERM  0444
 #endif
 
-#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_CORE)
 extern void arch_touch_nmi_watchdog(void);
+#endif
+
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
 extern void hardlockup_detector_perf_stop(void);
 extern void hardlockup_detector_perf_restart(void);
 extern void hardlockup_detector_perf_disable(void);
diff --git a/kernel/Makefile b/kernel/Makefile
index 33824f0385b3..d07d52a03cc9 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -83,7 +83,7 @@ obj-$(CONFIG_FAIL_FUNCTION) += fail_function.o
 obj-$(CONFIG_KGDB) += debug/
 obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
 obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
-obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
+obj-$(CONFIG_HARDLOCKUP_DETECTOR_CORE) += watchdog_hld.o
 obj-$(CONFIG_SECCOMP) += seccomp.o
 obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index b352e507b17f..bb6435978c46 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -22,12 +22,8 @@
 
 static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
-static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
-static DEFINE_PER_CPU(struct perf_event *, dead_event);
-static struct cpumask dead_events_mask;
 
 static unsigned long hardlockup_allcpu_dumped;
-static atomic_t watchdog_cpus = ATOMIC_INIT(0);
 
 notrace void arch_touch_nmi_watchdog(void)
 {
@@ -98,14 +94,6 @@ static inline bool watchdog_check_timestamp(void)
 }
 #endif
 
-static struct perf_event_attr wd_hw_attr = {
-   .type   = PERF_TYPE_HARDWARE,
-   .config = PERF_COUNT_HW_CPU_CYCLES,
-   .size   = sizeof(struct perf_event_attr),
-   .pinned = 1,
-   .disabled   = 1,
-};
-
 void inspect_for_hardlockups(struct pt_regs *regs)
 {
if (__this_cpu_read(watchdog_nmi_touch) == true) {
@@ -157,6 +145,24 @@ void inspect_for_hardlockups(struct pt_regs *regs)
return;
 }
 
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
+#undef pr_fmt
+#define pr_fmt(fmt) "NMI perf watchdog: " fmt
+
+static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
+static DEFINE_PER_CPU(struct perf_event *, dead_event);
+static struct cpumask dead_events_mask;
+
+static atomic_t watchdog_cpus = ATOMIC_INIT(0);
+
+static struct perf_event_attr wd_hw_attr = {
+   .type   = PERF_TYPE_HARDWARE,
+   .config = PERF_COUNT_HW_CPU_CYCLES,
+   .size   = sizeof(struct perf_event_attr),
+   .pinned = 1,
+   .disabled   = 1,
+};
+
 /* Callback function for perf event subsystem */
 static void watchdog_overflow_callback(struct perf_event *event,
   struct perf_sample_data *data,
@@ -298,3 +304,5 @@ i

[RFC PATCH v4 01/21] x86/msi: Add definition for NMI delivery mode

2019-05-23 Thread Ricardo Neri
Until now, the delivery mode of MSI interrupts has been set to the default
mode of the APIC driver. However, nothing in hardware prevents configuring
each interrupt with a different delivery mode. Specifying the delivery mode
per interrupt is useful when only a particular interrupt needs a different
mode. For instance, this can be used to deliver an interrupt as non-maskable.
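
As a rough illustration of selecting the delivery mode per interrupt, the sketch below rewrites the delivery-mode field of an MSI data word. The SKETCH_* macros and msi_data_set_delivery() are hypothetical; the shift of 8 and the mode values 0 and 4 come from msidef.h as shown in this patch, while the 3-bit field width is an assumption.

```c
#include <stdint.h>

#define SKETCH_DELIVERY_MODE_SHIFT	8
#define SKETCH_DELIVERY_FIXED	(0u << SKETCH_DELIVERY_MODE_SHIFT)
#define SKETCH_DELIVERY_NMI	(4u << SKETCH_DELIVERY_MODE_SHIFT)
/* Assumption: the delivery mode occupies three bits starting at bit 8. */
#define SKETCH_DELIVERY_MASK	(7u << SKETCH_DELIVERY_MODE_SHIFT)

/* Replace the delivery-mode field of @data and keep all other bits. */
static uint32_t msi_data_set_delivery(uint32_t data, uint32_t mode)
{
	return (data & ~SKETCH_DELIVERY_MASK) | (mode & SKETCH_DELIVERY_MASK);
}
```

Only the mode field changes, so the vector and trigger bits of an already composed message are preserved.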

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Joerg Roedel 
Cc: Juergen Gross 
Cc: Bjorn Helgaas 
Cc: Wincy Van 
Cc: Kate Stewart 
Cc: Philippe Ombredanne 
Cc: "Eric W. Biederman" 
Cc: Baoquan He 
Cc: Jan Kiszka 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/msidef.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/msidef.h b/arch/x86/include/asm/msidef.h
index ee2f8ccc32d0..38ccfdc2d96e 100644
--- a/arch/x86/include/asm/msidef.h
+++ b/arch/x86/include/asm/msidef.h
@@ -18,6 +18,7 @@
 #define MSI_DATA_DELIVERY_MODE_SHIFT   8
 #define  MSI_DATA_DELIVERY_FIXED   (0 << MSI_DATA_DELIVERY_MODE_SHIFT)
 #define  MSI_DATA_DELIVERY_LOWPRI  (1 << MSI_DATA_DELIVERY_MODE_SHIFT)
+#define  MSI_DATA_DELIVERY_NMI (4 << MSI_DATA_DELIVERY_MODE_SHIFT)
 
 #define MSI_DATA_LEVEL_SHIFT   14
 #define MSI_DATA_LEVEL_DEASSERT(0 << MSI_DATA_LEVEL_SHIFT)
-- 
2.17.1



[RFC PATCH v4 09/21] x86/nmi: Add a NMI_WATCHDOG NMI handler category

2019-05-23 Thread Ricardo Neri
Add NMI_WATCHDOG as a new category of NMI handler. This new category
is to be used with the HPET-based hardlockup detector. This detector
does not have a direct way of checking if the HPET timer is the source of
the NMI. Instead, it indirectly estimates it using the time-stamp counter.

Therefore, we may have false positives in case another NMI occurs within
the estimated time window. For this reason, we want the handler of the
detector to be called after all the NMI_LOCAL handlers. A simple way
of achieving this is with a new NMI handler category.
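
The intended ordering can be sketched as follows. This is a simplified user-space model, not the code in default_do_nmi(): the enum, the handler type and dispatch_nmi() are hypothetical, and the real path also checks other NMI sources between the two steps.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical handler type: returns true if it claims the NMI. */
typedef bool (*nmi_handler_fn)(void);

enum nmi_source { SRC_LOCAL, SRC_WATCHDOG, SRC_UNKNOWN };

/* Trivial handlers used to exercise the dispatch order. */
static bool claims_nmi(void) { return true; }
static bool declines_nmi(void) { return false; }

/*
 * Give the local handlers the first chance. The watchdog handler, which
 * can only estimate whether the NMI is its own, runs only if nobody
 * else claimed the NMI.
 */
static enum nmi_source dispatch_nmi(nmi_handler_fn local,
				    nmi_handler_fn watchdog)
{
	if (local && local())
		return SRC_LOCAL;

	if (watchdog && watchdog())
		return SRC_WATCHDOG;

	return SRC_UNKNOWN;
}
```

Running the estimating handler last means that an NMI with an identifiable source is never misattributed to the watchdog, even if it lands inside the expected time window.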

Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/nmi.h |  1 +
 arch/x86/kernel/nmi.c  | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
index 75ded1d13d98..75aa98313cde 100644
--- a/arch/x86/include/asm/nmi.h
+++ b/arch/x86/include/asm/nmi.h
@@ -29,6 +29,7 @@ enum {
NMI_UNKNOWN,
NMI_SERR,
NMI_IO_CHECK,
+   NMI_WATCHDOG,
NMI_MAX
 };
 
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 4df7705022b9..43e96aedc6fe 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -64,6 +64,10 @@ static struct nmi_desc nmi_desc[NMI_MAX] =
.lock = __RAW_SPIN_LOCK_UNLOCKED(&nmi_desc[3].lock),
.head = LIST_HEAD_INIT(nmi_desc[3].head),
},
+   {
+   .lock = __RAW_SPIN_LOCK_UNLOCKED(&nmi_desc[4].lock),
+   .head = LIST_HEAD_INIT(nmi_desc[4].head),
+   },
 
 };
 
@@ -174,6 +178,8 @@ int __register_nmi_handler(unsigned int type, struct 
nmiaction *action)
 */
WARN_ON_ONCE(type == NMI_SERR && !list_empty(&desc->head));
WARN_ON_ONCE(type == NMI_IO_CHECK && !list_empty(&desc->head));
+   WARN_ON_ONCE(type == NMI_WATCHDOG && !list_empty(&desc->head));
+
 
/*
 * some handlers need to be executed first otherwise a fake
@@ -384,6 +390,10 @@ static void default_do_nmi(struct pt_regs *regs)
}
raw_spin_unlock(&nmi_reason_lock);
 
+   handled = nmi_handle(NMI_WATCHDOG, regs);
+   if (handled == NMI_HANDLED)
+   return;
+
/*
 * Only one NMI can be latched at a time.  To handle
 * this we may process multiple nmi handlers at once to
-- 
2.17.1



[RFC PATCH v4 21/21] x86/watchdog/hardlockup/hpet: Support interrupt remapping

2019-05-23 Thread Ricardo Neri
When interrupt remapping is enabled in the system, the MSI interrupt
message must follow a special format the IOMMU can understand. Hence,
utilize the functionality provided by the IOMMU driver for such purpose.

The first step is to determine whether interrupt remapping is enabled
by looking for the existence of an interrupt remapping domain. If it
exists, let the IOMMU driver compose the MSI message for us. The hardlockup
detector is still responsible for writing the message to the HPET FSB route
register.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Joerg Roedel 
Cc: Juergen Gross 
Cc: Bjorn Helgaas 
Cc: Wincy Van 
Cc: Kate Stewart 
Cc: Philippe Ombredanne 
Cc: "Eric W. Biederman" 
Cc: Baoquan He 
Cc: Jan Kiszka 
Cc: Lu Baolu 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/kernel/watchdog_hld_hpet.c | 33 -
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/watchdog_hld_hpet.c 
b/arch/x86/kernel/watchdog_hld_hpet.c
index 76eed714a1cb..a266439fdb9e 100644
--- a/arch/x86/kernel/watchdog_hld_hpet.c
+++ b/arch/x86/kernel/watchdog_hld_hpet.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static struct hpet_hld_data *hld_data;
@@ -117,6 +118,25 @@ static bool is_hpet_wdt_interrupt(struct hpet_hld_data 
*hdata)
return false;
 }
 
+/** irq_remapping_enabled() - Detect if interrupt remapping is enabled
+ * @hdata: A data structure with the HPET block id
+ *
+ * Determine if the HPET block that the hardlockup detector uses is under
+ * the remapped interrupt domain.
+ *
+ * Returns: True if interrupt remapping is enabled. False otherwise.
+ */
+static bool irq_remapping_enabled(struct hpet_hld_data *hdata)
+{
+   struct irq_alloc_info info;
+
+   init_irq_alloc_info(&info, NULL);
+   info.type = X86_IRQ_ALLOC_TYPE_HPET;
+   info.hpet_id = hdata->blockid;
+
+   return !!irq_remapping_get_ir_irq_domain(&info);
+}
+
 /**
  * compose_msi_msg() - Populate address and data fields of an MSI message
  * @hdata: A data strucure with the message to populate
@@ -161,6 +181,9 @@ static int update_msi_destid(struct hpet_hld_data *hdata)
 {
u32 destid;
 
+   if (irq_remapping_enabled(hdata))
+   return hld_hpet_intremap_activate_irq(hdata);
+
hdata->msi_msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
destid = apic->calc_dest_apicid(hdata->handling_cpu);
hdata->msi_msg.address_lo |= MSI_ADDR_DEST_ID(destid);
@@ -217,9 +240,17 @@ static int hardlockup_detector_nmi_handler(unsigned int 
type,
  */
 static int setup_irq_msi_mode(struct hpet_hld_data *hdata)
 {
+   s32 ret;
u32 v;
 
-   compose_msi_msg(hdata);
+   if (irq_remapping_enabled(hdata)) {
+   ret = hld_hpet_intremap_alloc_irq(hdata);
+   if (ret)
+   return ret;
+   } else {
+   compose_msi_msg(hdata);
+   }
+
hpet_writel(hdata->msi_msg.data, HPET_Tn_ROUTE(hdata->num));
hpet_writel(hdata->msi_msg.address_lo, HPET_Tn_ROUTE(hdata->num) + 4);
 
-- 
2.17.1



[RFC PATCH v4 05/21] x86/hpet: Reserve timer for the HPET hardlockup detector

2019-05-23 Thread Ricardo Neri
HPET timer 2 will be used to drive the HPET-based hardlockup detector.
Reserve this timer to ensure it cannot be used by user space programs or
for clock events.

When looking for MSI-capable timers for clock events, skip timer 2 if
the HPET hardlockup detector is selected.
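
The skip logic in the MSI capability lookup can be sketched in isolation. This is an illustration only: count_clockevent_timers() and the SKETCH_* macros are hypothetical, the cfg array stands in for values read from the timers' configuration registers, and the FSB capability bit value is assumed to mirror HPET_TN_FSB_CAP.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_WD_TIMER_NR	2	/* timer reserved for the watchdog */
#define SKETCH_TN_FSB_CAP	0x8000u	/* assumed to mirror HPET_TN_FSB_CAP */

/*
 * Count the timers usable for clock events: MSI (FSB) capable and, if
 * the HPET watchdog is enabled, not the timer reserved for it.
 */
static unsigned int count_clockevent_timers(const uint32_t *cfg,
					    unsigned int nr_timers,
					    unsigned int start_timer,
					    bool wd_enabled)
{
	unsigned int i, used = 0;

	for (i = start_timer; i < nr_timers; i++) {
		/* Do not use the timer reserved for the HPET watchdog. */
		if (wd_enabled && i == SKETCH_WD_TIMER_NR)
			continue;

		/* Only consider timers with MSI support. */
		if (!(cfg[i] & SKETCH_TN_FSB_CAP))
			continue;

		used++;
	}

	return used;
}
```

The reservation check comes before the capability check, so timer 2 is skipped even when it is MSI capable.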

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Clemens Ladisch 
Cc: Arnd Bergmann 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h |  3 +++
 arch/x86/kernel/hpet.c  | 19 ---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index e7098740f5ee..6f099e2781ce 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -61,6 +61,9 @@
  */
 #define HPET_MIN_PERIOD10UL
 
+/* Timer used for the hardlockup detector */
+#define HPET_WD_TIMER_NR 2
+
 /* hpet memory map physical address */
 extern unsigned long hpet_address;
 extern unsigned long force_hpet_address;
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 1723d55219e8..ff0250831786 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -173,7 +173,8 @@ do {
\
 
 /*
  * When the hpet driver (/dev/hpet) is enabled, we need to reserve
- * timer 0 and timer 1 in case of RTC emulation.
+ * timer 0 and timer 1 in case of RTC emulation. Timer 2 is reserved in case
+ * the HPET-based hardlockup detector is used.
  */
 #ifdef CONFIG_HPET
 
@@ -183,7 +184,7 @@ static void hpet_reserve_platform_timers(unsigned int id)
 {
struct hpet __iomem *hpet = hpet_virt_address;
struct hpet_timer __iomem *timer = &hpet->hpet_timers[2];
-   unsigned int nrtimers, i;
+   unsigned int nrtimers, i, start_timer;
struct hpet_data hd;
 
nrtimers = ((id & HPET_ID_NUMBER) >> HPET_ID_NUMBER_SHIFT) + 1;
@@ -198,6 +199,13 @@ static void hpet_reserve_platform_timers(unsigned int id)
hpet_reserve_timer(&hd, 1);
 #endif
 
+   if (IS_ENABLED(CONFIG_X86_HARDLOCKUP_DETECTOR_HPET)) {
+   hpet_reserve_timer(&hd, HPET_WD_TIMER_NR);
+   start_timer = HPET_WD_TIMER_NR + 1;
+   } else {
+   start_timer = HPET_WD_TIMER_NR;
+   }
+
/*
 * NOTE that hd_irq[] reflects IOAPIC input pins (LEGACY_8254
 * is wrong for i8259!) not the output IRQ.  Many BIOS writers
@@ -206,7 +214,7 @@ static void hpet_reserve_platform_timers(unsigned int id)
hd.hd_irq[0] = HPET_LEGACY_8254;
hd.hd_irq[1] = HPET_LEGACY_RTC;
 
-   for (i = 2; i < nrtimers; timer++, i++) {
+   for (i = start_timer; i < nrtimers; timer++, i++) {
hd.hd_irq[i] = (readl(&timer->hpet_config) &
Tn_INT_ROUTE_CNF_MASK) >> Tn_INT_ROUTE_CNF_SHIFT;
}
@@ -651,6 +659,11 @@ static void hpet_msi_capability_lookup(unsigned int 
start_timer)
struct hpet_dev *hdev = &hpet_devs[num_timers_used];
unsigned int cfg = hpet_readl(HPET_Tn_CFG(i));
 
+   /* Do not use timer reserved for the HPET watchdog. */
+   if (IS_ENABLED(CONFIG_X86_HARDLOCKUP_DETECTOR_HPET) &&
+   i == HPET_WD_TIMER_NR)
+   continue;
+
/* Only consider HPET timer with MSI support */
if (!(cfg & HPET_TN_FSB_CAP))
continue;
-- 
2.17.1



[RFC PATCH v3 04/21] x86/hpet: Add hpet_set_comparator() for periodic and one-shot modes

2019-05-14 Thread Ricardo Neri
Instead of setting the timer period directly in hpet_set_periodic(), add a
new helper function hpet_set_comparator() that only sets the accumulator
and comparator. hpet_set_periodic() will only prepare the timer for
periodic mode and leave the expiration programming to
hpet_set_comparator().

This new function can also be used by other components (e.g., the HPET-
based hardlockup detector) which also need to configure HPET timers. Thus,
add its declaration into the hpet header file.

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Originally-by: Suravee Suthikulpanit 
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h |  1 +
 arch/x86/kernel/hpet.c  | 57 -
 2 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index f132fbf984d4..e7098740f5ee 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -102,6 +102,7 @@ extern int hpet_rtc_timer_init(void);
 extern irqreturn_t hpet_rtc_interrupt(int irq, void *dev_id);
 extern int hpet_register_irq_handler(rtc_irq_handler handler);
 extern void hpet_unregister_irq_handler(rtc_irq_handler handler);
+extern void hpet_set_comparator(int num, unsigned int cmp, unsigned int 
period);
 
 #endif /* CONFIG_HPET_EMULATE_RTC */
 
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 560fc28e1d13..c5c5fc150193 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -289,6 +289,46 @@ static void hpet_legacy_clockevent_register(void)
printk(KERN_DEBUG "hpet clockevent registered\n");
 }
 
+/**
+ * hpet_set_comparator() - Helper function for setting comparator register
+ * @num:   The timer ID
+ * @cmp:   The value to be written to the comparator/accumulator
+ * @period:The value to be written to the period (0 = oneshot mode)
+ *
+ * Helper function for updating comparator, accumulator and period values.
+ *
+ * In periodic mode, HPET needs HPET_TN_SETVAL to be set before writing
+ * to the Tn_CMP to update the accumulator. Then, HPET needs a second
+ * write (with HPET_TN_SETVAL cleared) to Tn_CMP to set the period.
+ * The HPET_TN_SETVAL bit is automatically cleared after the first write.
+ *
+ * For one-shot mode, HPET_TN_SETVAL does not need to be set.
+ *
+ * See the following documents:
+ *   - Intel IA-PC HPET (High Precision Event Timers) Specification
+ *   - AMD-8111 HyperTransport I/O Hub Data Sheet, Publication # 24674
+ */
+void hpet_set_comparator(int num, unsigned int cmp, unsigned int period)
+{
+   if (period) {
+   unsigned int v = hpet_readl(HPET_Tn_CFG(num));
+
+   hpet_writel(v | HPET_TN_SETVAL, HPET_Tn_CFG(num));
+   }
+
+   hpet_writel(cmp, HPET_Tn_CMP(num));
+
+   if (!period)
+   return;
+
+   /* This delay is seldom used: never in one-shot mode and in periodic
+* only when reprogramming the timer.
+*/
+   udelay(1);
+   hpet_writel(period, HPET_Tn_CMP(num));
+}
+EXPORT_SYMBOL_GPL(hpet_set_comparator);
+
 static int hpet_set_periodic(struct clock_event_device *evt, int timer)
 {
unsigned int cfg, cmp, now;
@@ -300,19 +340,10 @@ static int hpet_set_periodic(struct clock_event_device 
*evt, int timer)
now = hpet_readl(HPET_COUNTER);
cmp = now + (unsigned int)delta;
cfg = hpet_readl(HPET_Tn_CFG(timer));
-   cfg |= HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL |
-  HPET_TN_32BIT;
-   hpet_writel(cfg, HPET_Tn_CFG(timer));
-   hpet_writel(cmp, HPET_Tn_CMP(timer));
-   udelay(1);
-   /*
-* HPET on AMD 81xx needs a second write (with HPET_TN_SETVAL
-* cleared) to T0_CMP to set the period. The HPET_TN_SETVAL
-* bit is automatically cleared after the first write.
-* (See AMD-8111 HyperTransport I/O Hub Data Sheet,
-* Publication # 24674)
-*/
-   hpet_writel((unsigned int)delta, HPET_Tn_CMP(timer));
+   cfg |= HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_32BIT;
+
+   hpet_set_comparator(timer, cmp, (unsigned int)delta);
+
hpet_start_counter();
hpet_print_config();
 
-- 
2.17.1



[RFC PATCH v3 06/21] x86/hpet: Configure the timer used by the hardlockup detector

2019-05-14 Thread Ricardo Neri
Implement the initial configuration of the timer to be used by the
hardlockup detector. Return a data structure with a description of the
timer; this information is subsequently used by the hardlockup detector.

Only provide the timer if it supports Front Side Bus interrupt delivery.
This condition greatly simplifies the implementation of the detector.
Specifically, it helps to avoid the complexities of routing the interrupt
via the IO-APIC (e.g., potential race conditions that arise from re-
programming the IO-APIC in NMI context).

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Clemens Ladisch 
Cc: Arnd Bergmann 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h | 13 +
 arch/x86/kernel/hpet.c  | 25 +
 2 files changed, 38 insertions(+)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 6f099e2781ce..20abdaa5372d 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -109,6 +109,19 @@ extern void hpet_set_comparator(int num, unsigned int cmp, 
unsigned int period);
 
 #endif /* CONFIG_HPET_EMULATE_RTC */
 
+#ifdef CONFIG_X86_HARDLOCKUP_DETECTOR_HPET
+struct hpet_hld_data {
+   boolhas_periodic;
+   u32 num;
+   u64 ticks_per_second;
+};
+
+extern struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void);
+#else
+static inline struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void)
+{ return NULL; }
+#endif /* CONFIG_X86_HARDLOCKUP_DETECTOR_HPET */
+
 #else /* CONFIG_HPET_TIMER */
 
 static inline int hpet_enable(void) { return 0; }
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index ba0a5cc075d5..20a16a304f89 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -170,6 +170,31 @@ do {   
\
_hpet_print_config(__func__, __LINE__); \
 } while (0)
 
+#ifdef CONFIG_X86_HARDLOCKUP_DETECTOR_HPET
+struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void)
+{
+   struct hpet_hld_data *hdata;
+   unsigned int cfg;
+
+   cfg = hpet_readl(HPET_Tn_CFG(HPET_WD_TIMER_NR));
+
+   if (!(cfg & HPET_TN_FSB_CAP))
+   return NULL;
+
+   hdata = kzalloc(sizeof(*hdata), GFP_KERNEL);
+   if (!hdata)
+   return NULL;
+
+   if (cfg & HPET_TN_PERIODIC_CAP)
+   hdata->has_periodic = true;
+
+   hdata->num = HPET_WD_TIMER_NR;
+   hdata->ticks_per_second = hpet_get_ticks_per_sec(hpet_readq(HPET_ID));
+
+   return hdata;
+}
+#endif /* CONFIG_X86_HARDLOCKUP_DETECTOR_HPET */
+
 /*
  * When the hpet driver (/dev/hpet) is enabled, we need to reserve
  * timer 0 and timer 1 in case of RTC emulation. Timer 2 is reserved in case
-- 
2.17.1
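
For illustration only (not part of the patch): the capability check done by
hpet_hardlockup_detector_assign_timer() can be sketched in user space as a
pure decode of a raw Tn_CFG value. The two bit masks are the ones defined in
arch/x86/include/asm/hpet.h; struct timer_caps and decode_timer_caps() are
hypothetical names used only in this sketch, and no hardware is accessed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Capability bits of the HPET timer N configuration register. */
#define HPET_TN_PERIODIC_CAP 0x0010u
#define HPET_TN_FSB_CAP      0x8000u

/* Hypothetical, simplified stand-in for struct hpet_hld_data. */
struct timer_caps {
	bool usable;        /* timer can deliver FSB interrupts */
	bool has_periodic;  /* timer supports periodic mode */
};

/* Decode a raw Tn_CFG value; the caller supplies the register content. */
static struct timer_caps decode_timer_caps(uint32_t cfg)
{
	struct timer_caps caps = { false, false };

	/* Mirrors the early "return NULL" in the patch: no FSB, no timer. */
	if (!(cfg & HPET_TN_FSB_CAP))
		return caps;

	caps.usable = true;
	caps.has_periodic = (cfg & HPET_TN_PERIODIC_CAP) != 0;
	return caps;
}
```

A timer that only advertises periodic mode (cfg = 0x0010) is rejected, while
one that advertises FSB delivery (cfg = 0x8000 or 0x8010) is accepted.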

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v3 05/21] x86/hpet: Reserve timer for the HPET hardlockup detector

2019-05-14 Thread Ricardo Neri
HPET timer 2 will be used to drive the HPET-based hardlockup detector.
Reserve such timer to ensure it cannot be used by user space programs or
for clock events.

When looking for MSI-capable timers for clock events, skip timer 2 if
the HPET hardlockup detector is selected.

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Clemens Ladisch 
Cc: Arnd Bergmann 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h |  3 +++
 arch/x86/kernel/hpet.c  | 19 ---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index e7098740f5ee..6f099e2781ce 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -61,6 +61,9 @@
  */
 #define HPET_MIN_PERIOD 100000UL
 
+/* Timer used for the hardlockup detector */
+#define HPET_WD_TIMER_NR 2
+
 /* hpet memory map physical address */
 extern unsigned long hpet_address;
 extern unsigned long force_hpet_address;
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index c5c5fc150193..ba0a5cc075d5 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -172,7 +172,8 @@ do {
\
 
 /*
  * When the hpet driver (/dev/hpet) is enabled, we need to reserve
- * timer 0 and timer 1 in case of RTC emulation.
+ * timer 0 and timer 1 in case of RTC emulation. Timer 2 is reserved in case
+ * the HPET-based hardlockup detector is used.
  */
 #ifdef CONFIG_HPET
 
@@ -182,7 +183,7 @@ static void hpet_reserve_platform_timers(unsigned int id)
 {
struct hpet __iomem *hpet = hpet_virt_address;
struct hpet_timer __iomem *timer = &hpet->hpet_timers[2];
-   unsigned int nrtimers, i;
+   unsigned int nrtimers, i, start_timer;
struct hpet_data hd;
 
nrtimers = ((id & HPET_ID_NUMBER) >> HPET_ID_NUMBER_SHIFT) + 1;
@@ -197,6 +198,13 @@ static void hpet_reserve_platform_timers(unsigned int id)
hpet_reserve_timer(&hd, 1);
 #endif
 
+   if (IS_ENABLED(CONFIG_X86_HARDLOCKUP_DETECTOR_HPET)) {
+   hpet_reserve_timer(&hd, HPET_WD_TIMER_NR);
+   start_timer = HPET_WD_TIMER_NR + 1;
+   } else {
+   start_timer = HPET_WD_TIMER_NR;
+   }
+
/*
 * NOTE that hd_irq[] reflects IOAPIC input pins (LEGACY_8254
 * is wrong for i8259!) not the output IRQ.  Many BIOS writers
@@ -205,7 +213,7 @@ static void hpet_reserve_platform_timers(unsigned int id)
hd.hd_irq[0] = HPET_LEGACY_8254;
hd.hd_irq[1] = HPET_LEGACY_RTC;
 
-   for (i = 2; i < nrtimers; timer++, i++) {
+   for (i = start_timer; i < nrtimers; timer++, i++) {
hd.hd_irq[i] = (readl(&timer->hpet_config) &
Tn_INT_ROUTE_CNF_MASK) >> Tn_INT_ROUTE_CNF_SHIFT;
}
@@ -648,6 +656,11 @@ static void hpet_msi_capability_lookup(unsigned int 
start_timer)
struct hpet_dev *hdev = &hpet_devs[num_timers_used];
unsigned int cfg = hpet_readl(HPET_Tn_CFG(i));
 
+   /* Do not use timer reserved for the HPET watchdog. */
+   if (IS_ENABLED(CONFIG_X86_HARDLOCKUP_DETECTOR_HPET) &&
+   i == HPET_WD_TIMER_NR)
+   continue;
+
/* Only consider HPET timer with MSI support */
if (!(cfg & HPET_TN_FSB_CAP))
continue;
-- 
2.17.1
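
For illustration only (not part of the patch): the effect of the new check in
hpet_msi_capability_lookup() can be sketched as a loop over an array of raw
Tn_CFG values that skips the watchdog timer before testing for FSB support.
pick_clockevent_timers() and its parameters are hypothetical; wd_enabled
stands in for IS_ENABLED(CONFIG_X86_HARDLOCKUP_DETECTOR_HPET).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HPET_TN_FSB_CAP  0x8000u
#define HPET_WD_TIMER_NR 2u

/*
 * Store in @out the indices of the timers usable for clock events and
 * return how many were found. @out must have room for @nr_timers entries.
 */
static size_t pick_clockevent_timers(const uint32_t *cfg, size_t nr_timers,
				     size_t start_timer, bool wd_enabled,
				     size_t *out)
{
	size_t used = 0;

	for (size_t i = start_timer; i < nr_timers; i++) {
		/* Do not use the timer reserved for the HPET watchdog. */
		if (wd_enabled && i == HPET_WD_TIMER_NR)
			continue;

		/* Only consider timers with FSB (MSI) support. */
		if (!(cfg[i] & HPET_TN_FSB_CAP))
			continue;

		out[used++] = i;
	}

	return used;
}
```

With the watchdog selected, timer 2 is never handed out for clock events even
when it is FSB-capable; without it, the lookup behaves as before.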



[RFC PATCH v3 15/21] watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot parameter

2019-05-14 Thread Ricardo Neri
Keep the HPET-based hardlockup detector disabled unless explicitly enabled
via a command-line argument. If such parameter is not given, the
initialization of the hpet-based hardlockup detector fails and the NMI
watchdog will fall back to the perf-based implementation.

Given that __setup("nmi_watchdog=") is already used to control the behavior
of the NMI watchdog (via hardlockup_panic_setup()), it cannot be used to
control the HPET-based implementation. Instead, use a new
early_param("nmi_watchdog").

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Peter Zijlstra 
Cc: Clemens Ladisch 
Cc: Arnd Bergmann 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: Mimi Zohar 
Cc: Jan Kiszka 
Cc: Nick Desaulniers 
Cc: Masahiro Yamada 
Cc: Nayna Jain 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 

--
checkpatch gives the following warning:

CHECK: __setup appears un-documented -- check 
Documentation/admin-guide/kernel-parameters.rst
+__setup("nmi_watchdog=", hardlockup_detector_hpet_setup);

This is a false-positive as the option nmi_watchdog is already
documented. The option is re-evaluated in this file as well.
---
 .../admin-guide/kernel-parameters.txt |  8 ++-
 arch/x86/kernel/watchdog_hld_hpet.c   | 22 +++
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index fd03e2b629bb..3c42205b469c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2801,7 +2801,7 @@
Format: [state][,regs][,debounce][,die]
 
nmi_watchdog=   [KNL,BUGS=X86] Debugging features for SMP kernels
-   Format: [panic,][nopanic,][num]
+   Format: [panic,][nopanic,][num,][hpet]
Valid num: 0 or 1
0 - turn hardlockup detector in nmi_watchdog off
1 - turn hardlockup detector in nmi_watchdog on
@@ -2811,6 +2811,12 @@
please see 'nowatchdog'.
This is useful when you use a panic=... timeout and
need the box quickly up again.
+   When hpet is specified, the NMI watchdog will be driven
+   by an HPET timer, if available in the system. Otherwise,
+   it falls back to the default implementation (perf or
+   architecture-specific). Specifying hpet has no effect
+   if the NMI watchdog is not enabled (either at build time
+   or via the command line).
 
These settings can be accessed at runtime via
the nmi_watchdog and hardlockup_panic sysctls.
diff --git a/arch/x86/kernel/watchdog_hld_hpet.c 
b/arch/x86/kernel/watchdog_hld_hpet.c
index 6f1f540cfee9..90680a8cf9fc 100644
--- a/arch/x86/kernel/watchdog_hld_hpet.c
+++ b/arch/x86/kernel/watchdog_hld_hpet.c
@@ -350,6 +350,28 @@ void hardlockup_detector_hpet_stop(void)
disable_timer(hld_data);
 }
 
+/**
+ * hardlockup_detector_hpet_setup() - Parse command-line parameters
+ * @str:   A string containing the kernel command line
+ *
+ * Parse the nmi_watchdog parameter from the kernel command line. If
+ * selected by the user, use this implementation to detect hardlockups.
+ */
+static int __init hardlockup_detector_hpet_setup(char *str)
+{
+   if (!str)
+   return -EINVAL;
+
+   if (parse_option_str(str, "hpet"))
+   hardlockup_use_hpet = true;
+
+   if (!nmi_watchdog_user_enabled && hardlockup_use_hpet)
+   pr_warn("Selecting HPET NMI watchdog has no effect with NMI 
watchdog disabled\n");
+
+   return 0;
+}
+early_param("nmi_watchdog", hardlockup_detector_hpet_setup);
+
 /**
  * hardlockup_detector_hpet_init() - Initialize the hardlockup detector
  *
-- 
2.17.1
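
For illustration only (not part of the patch): the handler relies on
parse_option_str() to look for "hpet" among the comma-separated values of
nmi_watchdog=. The sketch below is a user-space approximation of that kind of
whole-item match, written from scratch; has_option() is a hypothetical name
and is not the kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if @option appears as a whole item in comma-separated @str. */
static bool has_option(const char *str, const char *option)
{
	size_t len = strlen(option);

	while (*str) {
		const char *end = strchr(str, ',');
		size_t item_len = end ? (size_t)(end - str) : strlen(str);

		/* Match only complete items, so "hpetx" does not match "hpet". */
		if (item_len == len && !strncmp(str, option, len))
			return true;

		if (!end)
			break;
		str = end + 1;
	}

	return false;
}
```

For example, "panic,1,hpet" and "hpet,0" select the HPET detector in this
sketch, while "1" and "hpetx" do not.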



[RFC PATCH v3 02/21] x86/hpet: Expose hpet_writel() in header

2019-05-14 Thread Ricardo Neri
In order to allow hpet_writel() to be used by other components (e.g.,
the HPET-based hardlockup detector) expose it in the HPET header file.

No empty definition is needed if CONFIG_HPET is not selected as all
existing callers select such config symbol.

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h | 1 +
 arch/x86/kernel/hpet.c  | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 67385d56d4f4..f132fbf984d4 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -72,6 +72,7 @@ extern int is_hpet_enabled(void);
 extern int hpet_enable(void);
 extern void hpet_disable(void);
 extern unsigned int hpet_readl(unsigned int a);
+extern void hpet_writel(unsigned int d, unsigned int a);
 extern void force_hpet_resume(void);
 
 struct irq_data;
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index fb32925a2e62..560fc28e1d13 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -61,7 +61,7 @@ inline unsigned int hpet_readl(unsigned int a)
return readl(hpet_virt_address + a);
 }
 
-static inline void hpet_writel(unsigned int d, unsigned int a)
+inline void hpet_writel(unsigned int d, unsigned int a)
 {
writel(d, hpet_virt_address + a);
 }
-- 
2.17.1



[RFC PATCH v3 01/21] x86/msi: Add definition for NMI delivery mode

2019-05-14 Thread Ricardo Neri
Until now, the delivery mode of MSI interrupts is set to the default
mode set in the APIC driver. However, there are no restrictions in hardware
to configure each interrupt with a different delivery mode. Specifying the
delivery mode per interrupt is useful when one is interested in changing
the delivery mode of a particular interrupt. For instance, this can be used
to deliver an interrupt as non-maskable.

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Joerg Roedel 
Cc: Juergen Gross 
Cc: Bjorn Helgaas 
Cc: Wincy Van 
Cc: Kate Stewart 
Cc: Philippe Ombredanne 
Cc: "Eric W. Biederman" 
Cc: Baoquan He 
Cc: Dou Liyang 
Cc: Jan Kiszka 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/msidef.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/msidef.h b/arch/x86/include/asm/msidef.h
index ee2f8ccc32d0..38ccfdc2d96e 100644
--- a/arch/x86/include/asm/msidef.h
+++ b/arch/x86/include/asm/msidef.h
@@ -18,6 +18,7 @@
 #define MSI_DATA_DELIVERY_MODE_SHIFT   8
 #define  MSI_DATA_DELIVERY_FIXED   (0 << MSI_DATA_DELIVERY_MODE_SHIFT)
 #define  MSI_DATA_DELIVERY_LOWPRI  (1 << MSI_DATA_DELIVERY_MODE_SHIFT)
+#define  MSI_DATA_DELIVERY_NMI (4 << MSI_DATA_DELIVERY_MODE_SHIFT)
 
 #define MSI_DATA_LEVEL_SHIFT   14
 #define MSI_DATA_LEVEL_DEASSERT (0 << MSI_DATA_LEVEL_SHIFT)
-- 
2.17.1
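
For illustration only (not part of the patch): the new definition places the
value 4 in the delivery mode field of the MSI data word, which starts at bit
8. The sketch below composes a data word from a delivery mode and a vector;
msi_data() is a hypothetical helper, and the macros are restated with
unsigned constants so the block builds on its own.

```c
#include <stdint.h>

#define MSI_DATA_DELIVERY_MODE_SHIFT 8
#define MSI_DATA_DELIVERY_FIXED   (0u << MSI_DATA_DELIVERY_MODE_SHIFT)
#define MSI_DATA_DELIVERY_LOWPRI  (1u << MSI_DATA_DELIVERY_MODE_SHIFT)
#define MSI_DATA_DELIVERY_NMI     (4u << MSI_DATA_DELIVERY_MODE_SHIFT)

/* Combine a delivery mode (already shifted) with an 8-bit vector. */
static uint32_t msi_data(uint32_t delivery_mode, uint8_t vector)
{
	return delivery_mode | vector;
}
```

Here msi_data(MSI_DATA_DELIVERY_NMI, 0) yields 0x400, whereas the same vector
with fixed delivery leaves bits 8-10 clear.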



[RFC PATCH v3 07/21] watchdog/hardlockup: Define a generic function to detect hardlockups

2019-05-14 Thread Ricardo Neri
The procedure to detect hardlockups is independent of the underlying
mechanism that generates the non-maskable interrupt used to drive the
detector. Thus, it can be put in a separate, generic function. In this
manner, it can be invoked by various implementations of the NMI watchdog.

For this purpose, move the bulk of watchdog_overflow_callback() to the
new function inspect_for_hardlockups(). This function can then be called
from the applicable NMI handlers.

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Babu Moger 
Cc: "David S. Miller" 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: "Luis R. Rodriguez" 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Ricardo Neri 
---
 include/linux/nmi.h   |  1 +
 kernel/watchdog_hld.c | 18 +++---
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 9003e29cde46..5a8b19749769 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -212,6 +212,7 @@ extern int proc_watchdog_thresh(struct ctl_table *, int ,
void __user *, size_t *, loff_t *);
 extern int proc_watchdog_cpumask(struct ctl_table *, int,
 void __user *, size_t *, loff_t *);
+void inspect_for_hardlockups(struct pt_regs *regs);
 
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
 #include 
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 247bf0b1582c..b352e507b17f 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -106,14 +106,8 @@ static struct perf_event_attr wd_hw_attr = {
.disabled   = 1,
 };
 
-/* Callback function for perf event subsystem */
-static void watchdog_overflow_callback(struct perf_event *event,
-  struct perf_sample_data *data,
-  struct pt_regs *regs)
+void inspect_for_hardlockups(struct pt_regs *regs)
 {
-   /* Ensure the watchdog never gets throttled */
-   event->hw.interrupts = 0;
-
if (__this_cpu_read(watchdog_nmi_touch) == true) {
__this_cpu_write(watchdog_nmi_touch, false);
return;
@@ -163,6 +157,16 @@ static void watchdog_overflow_callback(struct perf_event 
*event,
return;
 }
 
+/* Callback function for perf event subsystem */
+static void watchdog_overflow_callback(struct perf_event *event,
+  struct perf_sample_data *data,
+  struct pt_regs *regs)
+{
+   /* Ensure the watchdog never gets throttled */
+   event->hw.interrupts = 0;
+   inspect_for_hardlockups(regs);
+}
+
 static int hardlockup_detector_event_create(void)
 {
unsigned int cpu = smp_processor_id();
-- 
2.17.1
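
For illustration only (not part of the patch): the split leaves the
source-specific step in each NMI handler and moves the shared check into one
function. The user-space sketch below models that shape. All names are
hypothetical, and the lockup test (no progress in a per-CPU timer interrupt
count since the last check) is a simplified assumption about what the shared
function does after the watchdog_nmi_touch test shown in the diff.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state; the kernel keeps these in per-CPU variables. */
struct sketch_cpu_state {
	unsigned long hrtimer_interrupts;       /* bumped by a periodic timer */
	unsigned long hrtimer_interrupts_saved; /* value seen at the last check */
	bool nmi_touch;                         /* set to skip one check */
};

/* Shared check: returns true if the CPU appears to be locked up. */
static bool sketch_inspect_for_hardlockups(struct sketch_cpu_state *cs)
{
	if (cs->nmi_touch) {
		cs->nmi_touch = false;
		return false;
	}

	/* No timer interrupts since the last NMI: the CPU made no progress. */
	if (cs->hrtimer_interrupts == cs->hrtimer_interrupts_saved)
		return true;

	cs->hrtimer_interrupts_saved = cs->hrtimer_interrupts;
	return false;
}

/* perf-style source: reset the throttle counter, then run the shared check. */
static bool sketch_perf_callback(struct sketch_cpu_state *cs,
				 unsigned int *throttle_count)
{
	*throttle_count = 0;
	return sketch_inspect_for_hardlockups(cs);
}

/* Another NMI source (e.g., a timer) only needs to call the shared check. */
static bool sketch_timer_nmi_handler(struct sketch_cpu_state *cs)
{
	return sketch_inspect_for_hardlockups(cs);
}
```

Both handlers report the same result for the same state; only the
perf-specific throttling reset stays outside the shared function.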



[RFC PATCH v3 17/21] x86/tsc: Switch to perf-based hardlockup detector if TSC become unstable

2019-05-14 Thread Ricardo Neri
The HPET-based hardlockup detector relies on the TSC to determine whether an
observed NMI originated from the HPET timer. Hence, this detector can no
longer be used once the TSC becomes unstable.

In such case, permanently stop the HPET-based hardlockup detector and
start the perf-based detector.

Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h| 2 ++
 arch/x86/kernel/tsc.c  | 2 ++
 arch/x86/kernel/watchdog_hld.c | 7 +++
 3 files changed, 11 insertions(+)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index fd99f2390714..a82cbe17479d 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -128,6 +128,7 @@ extern int hardlockup_detector_hpet_init(void);
 extern void hardlockup_detector_hpet_stop(void);
 extern void hardlockup_detector_hpet_enable(unsigned int cpu);
 extern void hardlockup_detector_hpet_disable(unsigned int cpu);
+extern void hardlockup_detector_switch_to_perf(void);
 #else
 static inline struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void)
 { return NULL; }
@@ -136,6 +137,7 @@ static inline int hardlockup_detector_hpet_init(void)
 static inline void hardlockup_detector_hpet_stop(void) {}
 static inline void hardlockup_detector_hpet_enable(unsigned int cpu) {}
 static inline void hardlockup_detector_hpet_disable(unsigned int cpu) {}
+static inline void hardlockup_detector_switch_to_perf(void) {}
 #endif /* CONFIG_X86_HARDLOCKUP_DETECTOR_HPET */
 
 #else /* CONFIG_HPET_TIMER */
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 8f47c4862c56..5e4b6d219bec 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1157,6 +1157,8 @@ void mark_tsc_unstable(char *reason)
 
clocksource_mark_unstable(&clocksource_tsc_early);
clocksource_mark_unstable(&clocksource_tsc);
+
+   hardlockup_detector_switch_to_perf();
 }
 
 EXPORT_SYMBOL_GPL(mark_tsc_unstable);
diff --git a/arch/x86/kernel/watchdog_hld.c b/arch/x86/kernel/watchdog_hld.c
index c2512d4c79c5..c8547c227a41 100644
--- a/arch/x86/kernel/watchdog_hld.c
+++ b/arch/x86/kernel/watchdog_hld.c
@@ -76,3 +76,10 @@ void watchdog_nmi_stop(void)
if (detector_type == X86_HARDLOCKUP_DETECTOR_HPET)
hardlockup_detector_hpet_stop();
 }
+
+void hardlockup_detector_switch_to_perf(void)
+{
+   detector_type = X86_HARDLOCKUP_DETECTOR_PERF;
+   hardlockup_detector_hpet_stop();
+   hardlockup_start_all();
+}
-- 
2.17.1



[RFC PATCH v3 19/21] iommu/vt-d: Rework prepare_irte() to support per-irq delivery mode

2019-05-14 Thread Ricardo Neri
A recent change introduced a new member to struct irq_cfg to specify the
delivery mode of an interrupt. Supporting the configuration of the
delivery mode would require adding a third argument to prepare_irte().
Instead, simply take a pointer to an irq_cfg data structure as the only
argument.

Internally, configure the delivery mode of the Interrupt Remapping Table
Entry as specified in the irq_cfg data structure and not as the APIC
setting.

This does not change the existing behavior, as the delivery mode of the
APIC is used to configure the irq_cfg data structure.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Joerg Roedel 
Cc: Juergen Gross 
Cc: Bjorn Helgaas 
Cc: Wincy Van 
Cc: Kate Stewart 
Cc: Philippe Ombredanne 
Cc: "Eric W. Biederman" 
Cc: Baoquan He 
Cc: Dou Liyang 
Cc: Jan Kiszka 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 drivers/iommu/intel_irq_remapping.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/intel_irq_remapping.c 
b/drivers/iommu/intel_irq_remapping.c
index 2d74641b7f7b..4ebf3af76589 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -1073,7 +1073,7 @@ static int reenable_irq_remapping(int eim)
return -1;
 }
 
-static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
+static void prepare_irte(struct irte *irte, struct irq_cfg *irq_cfg)
 {
memset(irte, 0, sizeof(*irte));
 
@@ -1087,9 +1087,9 @@ static void prepare_irte(struct irte *irte, int vector, 
unsigned int dest)
 * irq migration in the presence of interrupt-remapping.
*/
irte->trigger_mode = 0;
-   irte->dlvry_mode = apic->irq_delivery_mode;
-   irte->vector = vector;
-   irte->dest_id = IRTE_DEST(dest);
+   irte->dlvry_mode = irq_cfg->delivery_mode;
+   irte->vector = irq_cfg->vector;
+   irte->dest_id = IRTE_DEST(irq_cfg->dest_apicid);
irte->redir_hint = 1;
 }
 
@@ -1266,7 +1266,7 @@ static void intel_irq_remapping_prepare_irte(struct 
intel_ir_data *data,
struct irte *irte = &data->irte_entry;
struct msi_msg *msg = &data->msi_entry;
 
-   prepare_irte(irte, irq_cfg->vector, irq_cfg->dest_apicid);
+   prepare_irte(irte, irq_cfg);
switch (info->type) {
case X86_IRQ_ALLOC_TYPE_IOAPIC:
/* Set source-id of interrupt request */
-- 
2.17.1



[RFC PATCH v3 00/21] Implement an HPET-based hardlockup detector

2019-05-14 Thread Ricardo Neri
. McKenney,
   Thomas Gleixner).
 * Removed use of struct cpumask in favor of a variable length array in
   conjunction with kzalloc (Peter Zijlstra).
 * Added CPU as argument hardlockup_detector_hpet_enable()/disable()
   (Thomas Gleixner).
 * Remove unnecessary export of function declarations, flags and bit
   fields (Thomas Gleixner).
 * Removed unnecessary check for FSB support when reserving timer for the
   detector (Thomas Gleixner).
 * Separated TSC code from HPET code in kick_timer() (Thomas Gleixner).
 * Reworked condition to check if the expected TSC value is within the
   error margin to avoid conditional (Peter Zijlstra).
 * Removed TSC error margin from struct hld_data; use global variable
   instead (Peter Zijlstra).
 * Removed previously introduced watchdog_get_allowed_cpumask*() and
   reworked hardlockup_detector_hpet_enable()/disable() to not need
   access to watchdog_allowed_mask (Thomas Gleixner).

Changes since v1:

 * Removed reads to HPET registers at every NMI. Instead use the time-stamp
   counter to infer the interrupt source (Thomas Gleixner, Andi Kleen).
 * Do not target CPUs in a round-robin manner. Instead, the HPET timer
   always targets the same CPU; other CPUs are monitored via an
   interprocessor interrupt.
 * Removed use of generic irq code to set interrupt affinity and NMI
   delivery. Instead, configure the interrupt directly in HPET registers
   (Thomas Gleixner).
 * Removed the proposed ops structure for NMI watchdogs. Instead, split
   the existing implementation into a generic library and perf-specific
   infrastructure (Thomas Gleixner, Nicholas Piggin).
 * Added an x86-specific shim hardlockup detector that selects between
   HPET and perf infrastructures as needed (Nicholas Piggin).
 * Removed locks taken in NMI and !NMI context. This was wrong and is no
   longer needed (Thomas Gleixner).
 * Fixed unconditional return NMI_HANDLED when the HPET timer is programmed
   for FSB/MSI delivery (Peter Zijlstra).
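
For illustration only (not code from the series): inferring the NMI source
from the time-stamp counter amounts to checking whether the current TSC value
falls inside a window around the value expected at the next timer expiration.
One way to do that with a single comparison is shown below; the function name
and the exact expression are assumptions of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if @tsc_now lies in [@tsc_expected - @margin,
 * @tsc_expected + @margin). Unsigned subtraction wraps around, so values
 * below the window become very large and fail the single comparison.
 * A @margin of 0 describes an empty window and always returns false.
 */
static bool tsc_within_margin(uint64_t tsc_now, uint64_t tsc_expected,
			      uint64_t margin)
{
	return (tsc_now - (tsc_expected - margin)) < 2 * margin;
}
```

An NMI that arrives with the TSC outside the window would be attributed to
some other source and left for the remaining NMI handlers.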

References:

[1]. https://lkml.org/lkml/2018/6/12/1027
[2]. https://lkml.org/lkml/2019/2/27/402


Ricardo Neri (21):
  x86/msi: Add definition for NMI delivery mode
  x86/hpet: Expose hpet_writel() in header
  x86/hpet: Calculate ticks-per-second in a separate function
  x86/hpet: Add hpet_set_comparator() for periodic and one-shot modes
  x86/hpet: Reserve timer for the HPET hardlockup detector
  x86/hpet: Configure the timer used by the hardlockup detector
  watchdog/hardlockup: Define a generic function to detect hardlockups
  watchdog/hardlockup: Decouple the hardlockup detector from perf
  x86/nmi: Add a NMI_WATCHDOG NMI handler category
  watchdog/hardlockup: Add function to enable NMI watchdog on all
allowed CPUs at once
  x86/watchdog/hardlockup: Add an HPET-based hardlockup detector
  watchdog/hardlockup/hpet: Adjust timer expiration on the number of
monitored CPUs
  x86/watchdog/hardlockup/hpet: Determine if HPET timer caused NMI
  watchdog/hardlockup: Use parse_option_str() to handle "nmi_watchdog"
  watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot
parameter
  x86/watchdog: Add a shim hardlockup detector
  x86/tsc: Switch to perf-based hardlockup detector if TSC become
unstable
  x86/apic: Add a parameter for the APIC delivery mode
  iommu/vt-d: Rework prepare_irte() to support per-irq delivery mode
  iommu/vt-d: hpet: Reserve an interrupt remampping table entry for
watchdog
  x86/watchdog/hardlockup/hpet: Support interrupt remapping

 .../admin-guide/kernel-parameters.txt |   8 +-
 arch/x86/Kconfig.debug|  15 +
 arch/x86/include/asm/hpet.h   |  47 ++
 arch/x86/include/asm/hw_irq.h |   5 +-
 arch/x86/include/asm/msidef.h |   4 +
 arch/x86/include/asm/nmi.h|   1 +
 arch/x86/kernel/Makefile  |   2 +
 arch/x86/kernel/apic/vector.c |  10 +
 arch/x86/kernel/hpet.c| 105 +++-
 arch/x86/kernel/nmi.c |  10 +
 arch/x86/kernel/tsc.c |   2 +
 arch/x86/kernel/watchdog_hld.c|  85 
 arch/x86/kernel/watchdog_hld_hpet.c   | 452 ++
 drivers/char/hpet.c   |  31 +-
 drivers/iommu/intel_irq_remapping.c   |  59 ++-
 include/linux/hpet.h  |   1 +
 include/linux/nmi.h   |   8 +-
 kernel/Makefile   |   2 +-
 kernel/watchdog.c |  23 +-
 kernel/watchdog_hld.c |  50 +-
 lib/Kconfig.debug |   4 +
 21 files changed, 867 insertions(+), 57 deletions(-)
 create mode 100644 arch/x86/kernel/watchdog_hld.c
 create mode 100644 arch/x86/kernel/watchdog_hld_hpet.c

-- 
2.17.1


[RFC PATCH v3 21/21] x86/watchdog/hardlockup/hpet: Support interrupt remapping

2019-05-14 Thread Ricardo Neri
When interrupt remapping is enabled in the system, the MSI interrupt
message must follow a special format the IOMMU can understand. Hence,
utilize the functionality provided by the IOMMU driver for such purpose.

The first step is to determine whether interrupt remapping is enabled
by looking for the existence of an interrupt remapping domain. If it
exists, let the IOMMU driver compose the MSI message for us. The hardlockup
detector is still responsible for writing the message to the
HPET FSB route register.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Joerg Roedel 
Cc: Juergen Gross 
Cc: Bjorn Helgaas 
Cc: Wincy Van 
Cc: Kate Stewart 
Cc: Philippe Ombredanne 
Cc: "Eric W. Biederman" 
Cc: Baoquan He 
Cc: Dou Liyang 
Cc: Jan Kiszka 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/kernel/watchdog_hld_hpet.c | 33 -
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/watchdog_hld_hpet.c 
b/arch/x86/kernel/watchdog_hld_hpet.c
index 90680a8cf9fc..2d59b8f0390e 100644
--- a/arch/x86/kernel/watchdog_hld_hpet.c
+++ b/arch/x86/kernel/watchdog_hld_hpet.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static struct hpet_hld_data *hld_data;
@@ -116,6 +117,25 @@ static bool is_hpet_wdt_interrupt(struct hpet_hld_data 
*hdata)
return false;
 }
 
+/** irq_remapping_enabled() - Detect if interrupt remapping is enabled
+ * @hdata: A data structure with the HPET block id
+ *
+ * Determine if the HPET block that the hardlockup detector uses is under
+ * the remapped interrupt domain.
+ *
+ * Returns: True if interrupt remapping is enabled. False otherwise.
+ */
+static bool irq_remapping_enabled(struct hpet_hld_data *hdata)
+{
+   struct irq_alloc_info info;
+
+   init_irq_alloc_info(&info, NULL);
+   info.type = X86_IRQ_ALLOC_TYPE_HPET;
+   info.hpet_id = hdata->blockid;
+
+   return !!irq_remapping_get_ir_irq_domain(&info);
+}
+
 /**
  * compose_msi_msg() - Populate address and data fields of an MSI message
  * @hdata: A data strucure with the message to populate
@@ -160,6 +180,9 @@ static int update_msi_destid(struct hpet_hld_data *hdata)
 {
u32 destid;
 
+   if (irq_remapping_enabled(hdata))
+   return hld_hpet_intremap_activate_irq(hdata);
+
hdata->msi_msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
destid = apic->calc_dest_apicid(hdata->handling_cpu);
hdata->msi_msg.address_lo |= MSI_ADDR_DEST_ID(destid);
@@ -216,9 +239,17 @@ static int hardlockup_detector_nmi_handler(unsigned int 
type,
  */
 static int setup_irq_msi_mode(struct hpet_hld_data *hdata)
 {
+   s32 ret;
u32 v;
 
-   compose_msi_msg(hdata);
+   if (irq_remapping_enabled(hdata)) {
+   ret = hld_hpet_intremap_alloc_irq(hdata);
+   if (ret)
+   return ret;
+   } else {
+   compose_msi_msg(hdata);
+   }
+
hpet_writel(hdata->msi_msg.data, HPET_Tn_ROUTE(hdata->num));
hpet_writel(hdata->msi_msg.address_lo, HPET_Tn_ROUTE(hdata->num) + 4);
 
-- 
2.17.1



[RFC PATCH v3 11/21] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector

2019-05-14 Thread Ricardo Neri
This is the initial implementation of a hardlockup detector driven by an
HPET timer. This initial implementation includes functions to control the
timer via its registers. It also requests such timer, installs an NMI
interrupt handler and performs the initial configuration of the timer.

The detector is not functional at this stage. A subsequent changeset will
invoke the interfaces provided by this detector and add functionality
to determine if the HPET timer caused the NMI.

In order to detect hardlockups in all the monitored CPUs, move the
interrupt to the next monitored CPU while handling the NMI interrupt; wrap
around when reaching the highest CPU in the mask. This rotation is
achieved by setting the affinity mask to only contain the next CPU to
monitor. A cpumask keeps track of all the CPUs that need to be monitored.
Such cpumask is updated when the watchdog is enabled or disabled in a
particular CPU.

This detector relies on an HPET timer that is capable of using Front Side
Bus interrupts. In order to avoid using the generic interrupt code,
program directly the MSI message register of the HPET timer.

HPET registers are only accessed to kick the timer after looking for
hardlockups. This happens every watchdog_thresh seconds. A subsequent
changeset will determine whether the HPET timer caused the interrupt based
on the value of the time-stamp counter. For now, just add a stub function.
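
For illustration only (not part of the patch): the rotation described in this
commit message, i.e. picking the next monitored CPU and wrapping around after
the highest one, can be sketched with a 64-bit mask. next_monitored_cpu() is
a hypothetical helper; the patch itself uses a variable-length bitmap in
struct hpet_hld_data.

```c
#include <stdint.h>

/*
 * Return the next CPU set in @mask after @cur (which must be below 64),
 * wrapping around to the lowest set bit. If @cur is the only CPU in the
 * mask, @cur is returned. Returns -1 if @mask is empty.
 */
static int next_monitored_cpu(uint64_t mask, unsigned int cur)
{
	for (unsigned int step = 1; step <= 64; step++) {
		unsigned int cpu = (cur + step) % 64;

		if (mask & (UINT64_C(1) << cpu))
			return (int)cpu;
	}

	return -1;
}
```

With CPUs 0, 1 and 3 monitored (mask 0xb), the interrupt would move
0 -> 1 -> 3 -> 0 on successive NMIs.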

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Peter Zijlstra 
Cc: Clemens Ladisch 
Cc: Arnd Bergmann 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: "Ravi V. Shankar" 
Cc: Mimi Zohar 
Cc: Jan Kiszka 
Cc: Nick Desaulniers 
Cc: Masahiro Yamada 
Cc: Nayna Jain 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/Kconfig.debug  |  11 +
 arch/x86/include/asm/hpet.h |  13 ++
 arch/x86/kernel/Makefile|   1 +
 arch/x86/kernel/hpet.c  |   3 +-
 arch/x86/kernel/watchdog_hld_hpet.c | 334 
 5 files changed, 361 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/watchdog_hld_hpet.c

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 15d0fbe27872..376a5db81aec 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -169,6 +169,17 @@ config IOMMU_LEAK
 config HAVE_MMIOTRACE_SUPPORT
def_bool y
 
+config X86_HARDLOCKUP_DETECTOR_HPET
+   bool "Use HPET Timer for Hard Lockup Detection"
+   select SOFTLOCKUP_DETECTOR
+   select HARDLOCKUP_DETECTOR
+   select HARDLOCKUP_DETECTOR_CORE
+   depends on HPET_TIMER && HPET && X86_64
+   help
+ Say y to enable a hardlockup detector that is driven by an High-
+ Precision Event Timer. This option is helpful to not use counters
+ from the Performance Monitoring Unit to drive the detector.
+
 config X86_DECODER_SELFTEST
bool "x86 instruction decoder selftest"
depends on DEBUG_KERNEL && KPROBES
diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 20abdaa5372d..31fc27508cf3 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -114,12 +114,25 @@ struct hpet_hld_data {
bool has_periodic;
u32 num;
u64 ticks_per_second;
+   u32 handling_cpu;
+   u32 enabled_cpus;
+   struct msi_msg  msi_msg;
+   unsigned long   cpu_monitored_mask[0];
 };
 
 extern struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void);
+extern int hardlockup_detector_hpet_init(void);
+extern void hardlockup_detector_hpet_stop(void);
+extern void hardlockup_detector_hpet_enable(unsigned int cpu);
+extern void hardlockup_detector_hpet_disable(unsigned int cpu);
 #else
 static inline struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void)
 { return NULL; }
+static inline int hardlockup_detector_hpet_init(void)
+{ return -ENODEV; }
+static inline void hardlockup_detector_hpet_stop(void) {}
+static inline void hardlockup_detector_hpet_enable(unsigned int cpu) {}
+static inline void hardlockup_detector_hpet_disable(unsigned int cpu) {}
 #endif /* CONFIG_X86_HARDLOCKUP_DETECTOR_HPET */
 
 #else /* CONFIG_HPET_TIMER */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 62e78a3fd31e..f9222769d84b 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -106,6 +106,7 @@ obj-$(CONFIG_VM86)  += vm86_32.o
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
 
 obj-$(CONFIG_HPET_TIMER)   += hpet.o
+obj-$(CONFIG_X86_HARDLOCKUP_DETECTOR_HPET) += watchdog_hld_hpet.o
 obj-$(CONFIG_APB_TIMER)+= apb_timer.o
 
 obj-$(CONFIG_AMD_NB)   += amd_nb.o
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 20a16a304f89..44459b36d333 100644
--- a/arch

[RFC PATCH v3 20/21] iommu/vt-d: hpet: Reserve an interrupt remampping table entry for watchdog

2019-05-14 Thread Ricardo Neri
When interrupt remapping is enabled, MSI interrupt messages must follow a
special format that the IOMMU can understand. Hence, when the HPET hard
lockup detector is used with interrupt remapping, it must also follow this
special format.

The IOMMU, given the information about a particular interrupt, already
knows how to populate the MSI message with this special format and the
corresponding entry in the interrupt remapping table. Given that this is a
special interrupt case, we want to avoid the interrupt subsystem. Add two
functions to create an entry for the HPET hard lockup detector. Perform
this process in two steps as described below.

When initializing the lockup detector, the function
hld_hpet_intremap_alloc_irq() permanently allocates a new entry in the
interrupt remapping table and populates it with the information the
IOMMU driver needs. In order to populate the table, the IOMMU needs to
know the HPET block ID as described in the ACPI table. Hence, add such
ID to the data of the hardlockup detector.

When the hardlockup detector is enabled, the function
hld_hpet_intremap_activate_irq() activates the recently created entry
in the interrupt remapping table via the modify_irte() function. While
doing this, it specifies which CPU the interrupt must target via its APIC
ID. This function can be called every time the destination ID of the
interrupt needs to be updated; there is no need to allocate or remove
entries in the interrupt remapping table.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Joerg Roedel 
Cc: Juergen Gross 
Cc: Bjorn Helgaas 
Cc: Wincy Van 
Cc: Kate Stewart 
Cc: Philippe Ombredanne 
Cc: "Eric W. Biederman" 
Cc: Baoquan He 
Cc: Dou Liyang 
Cc: Jan Kiszka 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h | 11 +++
 arch/x86/kernel/hpet.c  |  1 +
 drivers/iommu/intel_irq_remapping.c | 49 +
 3 files changed, 61 insertions(+)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index a82cbe17479d..811051fa7ade 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -119,6 +119,8 @@ struct hpet_hld_data {
u64 tsc_ticks_per_cpu;
u32 handling_cpu;
u32 enabled_cpus;
+   u8  blockid;
+   void *intremap_data;
struct msi_msg  msi_msg;
unsigned long   cpu_monitored_mask[0];
 };
@@ -129,6 +131,15 @@ extern void hardlockup_detector_hpet_stop(void);
 extern void hardlockup_detector_hpet_enable(unsigned int cpu);
 extern void hardlockup_detector_hpet_disable(unsigned int cpu);
 extern void hardlockup_detector_switch_to_perf(void);
+#ifdef CONFIG_IRQ_REMAP
+extern int hld_hpet_intremap_activate_irq(struct hpet_hld_data *hdata);
+extern int hld_hpet_intremap_alloc_irq(struct hpet_hld_data *hdata);
+#else
+static inline int hld_hpet_intremap_activate_irq(struct hpet_hld_data *hdata)
+{ return -ENODEV; }
+static inline int hld_hpet_intremap_alloc_irq(struct hpet_hld_data *hdata)
+{ return -ENODEV; }
+#endif /* CONFIG_IRQ_REMAP */
 #else
 static inline struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void)
 { return NULL; }
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 44459b36d333..d911a357e98f 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -191,6 +191,7 @@ struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void)
 
hdata->num = HPET_WD_TIMER_NR;
hdata->ticks_per_second = hpet_get_ticks_per_sec(hpet_readq(HPET_ID));
+   hdata->blockid = hpet_blockid;
 
return hdata;
 }
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 4ebf3af76589..bfa58ef5e85c 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "irq_remapping.h"
 
@@ -1517,3 +1518,51 @@ int dmar_ir_hotplug(struct dmar_drhd_unit *dmaru, bool insert)
 
return ret;
 }
+
+#ifdef CONFIG_X86_HARDLOCKUP_DETECTOR_HPET
+int hld_hpet_intremap_activate_irq(struct hpet_hld_data *hdata)
+{
+   u32 destid = apic->calc_dest_apicid(hdata->handling_cpu);
+   struct intel_ir_data *data;
+
+   data = (struct intel_ir_data *)hdata->intremap_data;
+   data->irte_entry.dest_id = IRTE_DEST(destid);
+   return modify_irte(&data->irq_2_iommu, &data->irte_entry);
+}
+
+int hld_hpet_intremap_alloc_irq(struct hpet_hld_data *hdata)
+{
+   struct intel_ir_data *data;
+   struct irq_alloc_info info;
+   struct intel_iommu *iommu;
+   struct irq_cfg irq_cfg;
+   int index;
+
+   iommu = map_hpet_to_ir(hdata->blockid);
+   if (!iommu)
+   return

[RFC PATCH v3 12/21] watchdog/hardlockup/hpet: Adjust timer expiration on the number of monitored CPUs

2019-05-14 Thread Ricardo Neri
Each CPU should be monitored for hardlockups every watchdog_thresh seconds.
Since all the CPUs in the system are monitored by the same timer and the
timer interrupt is rotated among the monitored CPUs, the timer must expire
every watchdog_thresh/N seconds; where N is the number of monitored CPUs.
Use the new member of struct hpet_hld_data, ticks_per_cpu, to store the
aforementioned quantity.

The ticks-per-CPU quantity is updated every time the number of monitored
CPUs changes: when the watchdog is enabled or disabled for a specific CPU.
If the timer is used in periodic mode, it needs to be adjusted to reflect
the new expected expiration.
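
The arithmetic described above can be sketched in userspace C as follows.
This is only an illustration under stated assumptions: the names
sketch_ticks_per_cpu() and sketch_timer_delta() are hypothetical and do not
exist in the patch, plain 64-bit division stands in for do_div(), and the
"keep the previous value when no CPU is monitored" behavior mirrors the early
return in update_ticks_per_cpu().

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for update_ticks_per_cpu(): split the timer's
 * ticks-per-second among the monitored CPUs. If no CPU is monitored,
 * keep the previous value, as the patch does by returning early.
 */
static uint64_t sketch_ticks_per_cpu(uint64_t ticks_per_second,
                                     uint32_t enabled_cpus,
                                     uint64_t previous)
{
    if (!enabled_cpus)
        return previous;

    return ticks_per_second / enabled_cpus;
}

/*
 * Hypothetical helper: number of HPET ticks until the next expiration,
 * i.e. watchdog_thresh seconds divided among the monitored CPUs.
 */
static uint64_t sketch_timer_delta(unsigned int watchdog_thresh,
                                   uint64_t ticks_per_cpu)
{
    return (uint64_t)watchdog_thresh * ticks_per_cpu;
}
```

For example, with a 10 MHz timer, four monitored CPUs and a threshold of 10
seconds, the timer would expire every 25,000,000 ticks (2.5 seconds), so each
CPU is still visited once every 10 seconds.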

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: "Rafael J. Wysocki" 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Alexei Starovoitov 
Cc: Babu Moger 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: Byungchul Park 
Cc: "Paul E. McKenney" 
Cc: "Luis R. Rodriguez" 
Cc: Waiman Long 
Cc: Josh Poimboeuf 
Cc: Randy Dunlap 
Cc: Davidlohr Bueso 
Cc: Marc Zyngier 
Cc: Kai-Heng Feng 
Cc: Konrad Rzeszutek Wilk 
Cc: David Rientjes 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h |  1 +
 arch/x86/kernel/watchdog_hld_hpet.c | 46 +++--
 2 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 31fc27508cf3..64acacce095d 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -114,6 +114,7 @@ struct hpet_hld_data {
bool has_periodic;
u32 num;
u64 ticks_per_second;
+   u64 ticks_per_cpu;
u32 handling_cpu;
u32 enabled_cpus;
struct msi_msg  msi_msg;
diff --git a/arch/x86/kernel/watchdog_hld_hpet.c b/arch/x86/kernel/watchdog_hld_hpet.c
index c20b378b8c0c..9a3431a54616 100644
--- a/arch/x86/kernel/watchdog_hld_hpet.c
+++ b/arch/x86/kernel/watchdog_hld_hpet.c
@@ -44,6 +44,13 @@ static void kick_timer(struct hpet_hld_data *hdata, bool force)
 * are able to update the comparator before the counter reaches such new
 * value.
 *
+* Each CPU must be monitored every watch_thresh seconds. Since the
+* timer targets one CPU at a time, it must expire every
+*
+*ticks_per_cpu = watch_thresh * ticks_per_second /enabled_cpus
+*
+* as computed in update_ticks_per_cpu().
+*
 * Let it wrap around if needed.
 */
 
@@ -51,10 +58,10 @@ static void kick_timer(struct hpet_hld_data *hdata, bool force)
return;
 
if (hdata->has_periodic)
-   period = watchdog_thresh * hdata->ticks_per_second;
+   period = watchdog_thresh * hdata->ticks_per_cpu;
 
count = hpet_readl(HPET_COUNTER);
-   new_compare = count + watchdog_thresh * hdata->ticks_per_second;
+   new_compare = count + watchdog_thresh * hdata->ticks_per_cpu;
hpet_set_comparator(hdata->num, (u32)new_compare, (u32)period);
 }
 
@@ -233,6 +240,27 @@ static int setup_hpet_irq(struct hpet_hld_data *hdata)
return ret;
 }
 
+/**
+ * update_ticks_per_cpu() - Update the number of HPET ticks per CPU
+ * @hdata: struct with the timer's ticks-per-second and CPU mask
+ *
+ * From the overall ticks-per-second of the timer, compute the number of ticks
+ * after which the timer should expire to monitor each CPU every watch_thresh
+ * seconds. The ticks-per-cpu quantity is computed using the number of CPUs that
+ * the watchdog currently monitors.
+ */
+static void update_ticks_per_cpu(struct hpet_hld_data *hdata)
+{
+   u64 temp = hdata->ticks_per_second;
+
+   /* Only update if there are monitored CPUs. */
+   if (!hdata->enabled_cpus)
+   return;
+
+   do_div(temp, hdata->enabled_cpus);
+   hdata->ticks_per_cpu = temp;
+}
+
 /**
  * hardlockup_detector_hpet_enable() - Enable the hardlockup detector
  * @cpu:   CPU Index in which the watchdog will be enabled.
@@ -245,13 +273,23 @@ void hardlockup_detector_hpet_enable(unsigned int cpu)
 {
cpumask_set_cpu(cpu, to_cpumask(hld_data->cpu_monitored_mask));
 
-   if (!hld_data->enabled_cpus++) {
+   hld_data->enabled_cpus++;
+   update_ticks_per_cpu(hld_data);
+
+   if (hld_data->enabled_cpus == 1) {
hld_data->handling_cpu = cpu;
update_msi_destid(hld_data);
/* Force timer kick when detector is just enabled */
kick_timer(hld_data, true);
enable_timer(hld_data);
  

[RFC PATCH v3 10/21] watchdog/hardlockup: Add function to enable NMI watchdog on all allowed CPUs at once

2019-05-14 Thread Ricardo Neri
When there is more than one implementation of the NMI watchdog, there may
be situations in which switching from one to another is needed (e.g., if
the time-stamp counter becomes unstable, the HPET-based NMI watchdog can
no longer be used).

The perf-based implementation of the hardlockup detector makes use of
various per-CPU variables which are accessed via this_cpu operations.
Hence, each CPU needs to enable its own NMI watchdog if using the perf
implementation.

Add functionality to switch from one NMI watchdog to another and do it
from each allowed CPU.

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: "Rafael J. Wysocki" 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Alexei Starovoitov 
Cc: Babu Moger 
Cc: "David S. Miller" 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: Byungchul Park 
Cc: "Paul E. McKenney" 
Cc: "Luis R. Rodriguez" 
Cc: Waiman Long 
Cc: Josh Poimboeuf 
Cc: Randy Dunlap 
Cc: Davidlohr Bueso 
Cc: Marc Zyngier 
Cc: Kai-Heng Feng 
Cc: Konrad Rzeszutek Wilk 
Cc: David Rientjes 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Ricardo Neri 
---
 include/linux/nmi.h |  2 ++
 kernel/watchdog.c   | 15 +++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index e5f1a86e20b7..6d828334348b 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -83,9 +83,11 @@ static inline void reset_hung_task_detector(void) { }
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR)
 extern void hardlockup_detector_disable(void);
+extern void hardlockup_start_all(void);
 extern unsigned int hardlockup_panic;
 #else
 static inline void hardlockup_detector_disable(void) {}
+static inline void hardlockup_start_all(void) {}
 #endif
 
 #if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 7f9e7b9306fe..be589001200a 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -566,6 +566,21 @@ int lockup_detector_offline_cpu(unsigned int cpu)
return 0;
 }
 
+static int hardlockup_start_fn(void *data)
+{
+   watchdog_nmi_enable(smp_processor_id());
+   return 0;
+}
+
+void hardlockup_start_all(void)
+{
+   int cpu;
+
+   cpumask_copy(&watchdog_allowed_mask, &watchdog_cpumask);
+   for_each_cpu(cpu, &watchdog_allowed_mask)
+   smp_call_on_cpu(cpu, hardlockup_start_fn, NULL, false);
+}
+
 static void lockup_detector_reconfigure(void)
 {
cpus_read_lock();
-- 
2.17.1



[RFC PATCH v3 08/21] watchdog/hardlockup: Decouple the hardlockup detector from perf

2019-05-14 Thread Ricardo Neri
The current default implementation of the hardlockup detector assumes that
it is implemented using perf events. However, the hardlockup detector can
be driven by other sources of non-maskable interrupts (e.g., a properly
configured timer).

Group and wrap in #ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF all the code
specific to perf: create and manage perf events, stop and start the
perf-based detector.

The generic portion of the detector (monitor the timers' thresholds, check
timestamps and detect hardlockups as well as the implementation of
arch_touch_nmi_watchdog()) is now selected with the new intermediate config
symbol CONFIG_HARDLOCKUP_DETECTOR_CORE.

The perf-based implementation of the detector selects the new intermediate
symbol. Other implementations should do the same.

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: "Rafael J. Wysocki" 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Alexei Starovoitov 
Cc: Babu Moger 
Cc: "David S. Miller" 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: Byungchul Park 
Cc: "Paul E. McKenney" 
Cc: "Luis R. Rodriguez" 
Cc: Waiman Long 
Cc: Josh Poimboeuf 
Cc: Randy Dunlap 
Cc: Davidlohr Bueso 
Cc: Marc Zyngier 
Cc: Kai-Heng Feng 
Cc: Konrad Rzeszutek Wilk 
Cc: David Rientjes 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Ricardo Neri 
---
 include/linux/nmi.h   |  5 -
 kernel/Makefile   |  2 +-
 kernel/watchdog_hld.c | 32 
 lib/Kconfig.debug |  4 
 4 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 5a8b19749769..e5f1a86e20b7 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -94,8 +94,11 @@ static inline void hardlockup_detector_disable(void) {}
 # define NMI_WATCHDOG_SYSCTL_PERM  0444
 #endif
 
-#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_CORE)
 extern void arch_touch_nmi_watchdog(void);
+#endif
+
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
 extern void hardlockup_detector_perf_stop(void);
 extern void hardlockup_detector_perf_restart(void);
 extern void hardlockup_detector_perf_disable(void);
diff --git a/kernel/Makefile b/kernel/Makefile
index 62471e75a2b0..e9bdbaa1ed50 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -82,7 +82,7 @@ obj-$(CONFIG_FAIL_FUNCTION) += fail_function.o
 obj-$(CONFIG_KGDB) += debug/
 obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
 obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
-obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
+obj-$(CONFIG_HARDLOCKUP_DETECTOR_CORE) += watchdog_hld.o
 obj-$(CONFIG_SECCOMP) += seccomp.o
 obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index b352e507b17f..bb6435978c46 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -22,12 +22,8 @@
 
 static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
-static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
-static DEFINE_PER_CPU(struct perf_event *, dead_event);
-static struct cpumask dead_events_mask;
 
 static unsigned long hardlockup_allcpu_dumped;
-static atomic_t watchdog_cpus = ATOMIC_INIT(0);
 
 notrace void arch_touch_nmi_watchdog(void)
 {
@@ -98,14 +94,6 @@ static inline bool watchdog_check_timestamp(void)
 }
 #endif
 
-static struct perf_event_attr wd_hw_attr = {
-   .type   = PERF_TYPE_HARDWARE,
-   .config = PERF_COUNT_HW_CPU_CYCLES,
-   .size   = sizeof(struct perf_event_attr),
-   .pinned = 1,
-   .disabled   = 1,
-};
-
 void inspect_for_hardlockups(struct pt_regs *regs)
 {
if (__this_cpu_read(watchdog_nmi_touch) == true) {
@@ -157,6 +145,24 @@ void inspect_for_hardlockups(struct pt_regs *regs)
return;
 }
 
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
+#undef pr_fmt
+#define pr_fmt(fmt) "NMI perf watchdog: " fmt
+
+static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
+static DEFINE_PER_CPU(struct perf_event *, dead_event);
+static struct cpumask dead_events_mask;
+
+static atomic_t watchdog_cpus = ATOMIC_INIT(0);
+
+static struct perf_event_attr wd_hw_attr = {
+   .type   = PERF_TYPE_HARDWARE,
+   .config = PERF_COUNT_HW_CPU_CYCLES,
+   .size   = sizeof(struct perf_event_attr),
+   .pinned = 1,
+   .disabled   = 1,
+};
+
 /* Callback function for perf event subsystem */
 static void watchdog_overflow_callback(struct perf_event *event,
   struct perf_sample_data *data,
@@ -298,3 +304,5 @@ i

[RFC PATCH v3 18/21] x86/apic: Add a parameter for the APIC delivery mode

2019-05-14 Thread Ricardo Neri
Until now, the delivery mode of APIC interrupts has been the default
mode set in the APIC driver. However, the hardware places no restrictions
on configuring each interrupt with a different delivery mode. Specifying the
delivery mode per interrupt is useful when one is interested in changing
the delivery mode of a particular interrupt. For instance, this can be used
to deliver an interrupt as non-maskable.

Add a new member, delivery_mode, to struct irq_cfg. This new member can
be used to update the configuration of the delivery mode in each interrupt
domain. Likewise, add equivalent macros to populate MSI messages.

Currently, all interrupt domains set the delivery mode of interrupts using
the APIC setting. Interrupt domains use an irq_cfg data structure to
configure their own data structures and hardware resources. Thus, in order
to keep the current behavior, set the delivery mode of the irq
configuration to that of the APIC setting. In this manner, irq domains can
obtain the delivery mode from the irq configuration data instead of the
APIC setting, if needed.
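
As an illustration of how a per-interrupt delivery mode could end up in the
data word of an MSI message, here is a minimal userspace sketch. It assumes
the bit layout implied by the macros in this patch (delivery mode in bits
10:8, with 0 = fixed, 1 = lowest priority, 4 = NMI) and a vector in bits 7:0.
All SKETCH_ and sketch_ names are hypothetical and are not part of the patch.

```c
#include <stdint.h>

/* Hypothetical constants mirroring the layout used by the patch's macros. */
#define SKETCH_MSI_DATA_VECTOR_MASK          0x000000ffu
#define SKETCH_MSI_DATA_DELIVERY_MODE_SHIFT  8
#define SKETCH_MSI_DATA_DELIVERY_MODE_MASK   0x0700u
#define SKETCH_MSI_DATA_DELIVERY_MODE(dm) \
    ((((uint32_t)(dm)) << SKETCH_MSI_DATA_DELIVERY_MODE_SHIFT) & \
     SKETCH_MSI_DATA_DELIVERY_MODE_MASK)

/* Hypothetical subset of delivery modes, using the values from the patch. */
enum sketch_delivery_mode {
    SKETCH_DELIVERY_FIXED  = 0,
    SKETCH_DELIVERY_LOWPRI = 1,
    SKETCH_DELIVERY_NMI    = 4,
};

/* Compose the vector and delivery-mode fields of an MSI data word. */
static uint32_t sketch_msi_data(uint8_t vector, enum sketch_delivery_mode dm)
{
    return SKETCH_MSI_DATA_DELIVERY_MODE(dm) |
           (vector & SKETCH_MSI_DATA_VECTOR_MASK);
}
```

With this layout, the delivery mode is taken from the per-interrupt
configuration rather than from a single driver-wide default, which is the
behavior the new irq_cfg member is meant to allow.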

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Joerg Roedel 
Cc: Juergen Gross 
Cc: Bjorn Helgaas 
Cc: Wincy Van 
Cc: Kate Stewart 
Cc: Philippe Ombredanne 
Cc: "Eric W. Biederman" 
Cc: Baoquan He 
Cc: Dou Liyang 
Cc: Jan Kiszka 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hw_irq.h |  5 +++--
 arch/x86/include/asm/msidef.h |  3 +++
 arch/x86/kernel/apic/vector.c | 10 ++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 32e666e1231e..c024e5976b78 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -117,8 +117,9 @@ struct irq_alloc_info {
 };
 
 struct irq_cfg {
-   unsigned int dest_apicid;
-   unsigned int vector;
+   unsigned int dest_apicid;
+   unsigned int vector;
+   enum ioapic_irq_destination_types   delivery_mode;
 };
 
 extern struct irq_cfg *irq_cfg(unsigned int irq);
diff --git a/arch/x86/include/asm/msidef.h b/arch/x86/include/asm/msidef.h
index 38ccfdc2d96e..6d666c90f057 100644
--- a/arch/x86/include/asm/msidef.h
+++ b/arch/x86/include/asm/msidef.h
@@ -16,6 +16,9 @@
 MSI_DATA_VECTOR_MASK)
 
 #define MSI_DATA_DELIVERY_MODE_SHIFT   8
+#define MSI_DATA_DELIVERY_MODE_MASK 0x0700
+#define MSI_DATA_DELIVERY_MODE(dm) (((dm) << MSI_DATA_DELIVERY_MODE_SHIFT) & \
+MSI_DATA_DELIVERY_MODE_MASK)
 #define  MSI_DATA_DELIVERY_FIXED   (0 << MSI_DATA_DELIVERY_MODE_SHIFT)
 #define  MSI_DATA_DELIVERY_LOWPRI  (1 << MSI_DATA_DELIVERY_MODE_SHIFT)
 #define  MSI_DATA_DELIVERY_NMI (4 << MSI_DATA_DELIVERY_MODE_SHIFT)
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 3173e07d3791..99436fe7e932 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -548,6 +548,16 @@ static int x86_vector_alloc_irqs(struct irq_domain *domain, unsigned int virq,
irqd->chip_data = apicd;
irqd->hwirq = virq + i;
irqd_set_single_target(irqd);
+
+   /*
+* Initialize the delivery mode of this irq to match the
+* default delivery mode of the APIC. This is useful for
+* children irq domains which want to take the delivery
+* mode from the individual irq configuration rather
+* than from the APIC.
+*/
+apicd->hw_irq_cfg.delivery_mode = apic->irq_delivery_mode;
+
/*
 * Legacy vectors are already assigned when the IOAPIC
 * takes them over. They stay on the same vector. This is
-- 
2.17.1



[RFC PATCH v3 03/21] x86/hpet: Calculate ticks-per-second in a separate function

2019-05-14 Thread Ricardo Neri
It is easier to compute the expiration times of an HPET timer by using
its frequency (i.e., the number of times it ticks in a second) than its
period, as given in the capabilities register.

In addition to the HPET char driver, the HPET-based hardlockup detector
will also need to know the timer's frequency. Thus, create a common
function that both can use.
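
The conversion from period to frequency can be sketched in plain C as below.
This is a userspace approximation, not the kernel function: the name
sketch_ticks_per_sec() is hypothetical, ordinary 64-bit division replaces
do_div(), the period is passed in already extracted from the capabilities
register, and the guard against a zero period is an addition for the sketch
only.

```c
#include <stdint.h>

/* 10^15 femtoseconds per second */
#define SKETCH_FEMTOSECONDS_PER_SECOND 1000000000000000ULL

/*
 * Hypothetical helper: convert an HPET counter period, given in
 * femtoseconds per tick, into a frequency in ticks per second.
 * Adding half the period before dividing rounds to the nearest integer.
 */
static uint64_t sketch_ticks_per_sec(uint64_t period_fs)
{
    /* Assumption for this sketch: treat a zero period as "unknown". */
    if (period_fs == 0)
        return 0;

    return (SKETCH_FEMTOSECONDS_PER_SECOND + (period_fs >> 1)) / period_fs;
}
```

For instance, a period of 69841279 fs yields 14318180 ticks per second, the
14.31818 MHz rate commonly seen on HPET hardware.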

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Clemens Ladisch 
Cc: Arnd Bergmann 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 drivers/char/hpet.c  | 31 ---
 include/linux/hpet.h |  1 +
 2 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c
index d0ad85900b79..bdcbecfdb858 100644
--- a/drivers/char/hpet.c
+++ b/drivers/char/hpet.c
@@ -836,6 +836,29 @@ static unsigned long hpet_calibrate(struct hpets *hpetp)
return ret;
 }
 
+u64 hpet_get_ticks_per_sec(u64 hpet_caps)
+{
+   u64 ticks_per_sec, period;
+
+   period = (hpet_caps & HPET_COUNTER_CLK_PERIOD_MASK) >>
+HPET_COUNTER_CLK_PERIOD_SHIFT; /* fs, 10^-15 */
+
+   /*
+* The frequency is the reciprocal of the period. The period is given
+* in femtoseconds per tick. Thus, prepare a dividend to obtain the
+* frequency in ticks per second.
+*/
+
+   /* 10^15 femtoseconds per second */
+   ticks_per_sec = 1000000000000000uLL;
+   ticks_per_sec += period >> 1; /* round */
+
+   /* The quotient is put in the dividend. We drop the remainder. */
+   do_div(ticks_per_sec, period);
+
+   return ticks_per_sec;
+}
+
 int hpet_alloc(struct hpet_data *hdp)
 {
u64 cap, mcfg;
@@ -844,7 +867,6 @@ int hpet_alloc(struct hpet_data *hdp)
struct hpets *hpetp;
struct hpet __iomem *hpet;
static struct hpets *last;
-   unsigned long period;
unsigned long long temp;
u32 remainder;
 
@@ -894,12 +916,7 @@ int hpet_alloc(struct hpet_data *hdp)
 
last = hpetp;
 
-   period = (cap & HPET_COUNTER_CLK_PERIOD_MASK) >>
-   HPET_COUNTER_CLK_PERIOD_SHIFT; /* fs, 10^-15 */
-   temp = 1000000000000000uLL; /* 10^15 femtoseconds per second */
-   temp += period >> 1; /* round */
-   do_div(temp, period);
-   hpetp->hp_tick_freq = temp; /* ticks per second */
+   hpetp->hp_tick_freq = hpet_get_ticks_per_sec(cap);
 
printk(KERN_INFO "hpet%d: at MMIO 0x%lx, IRQ%s",
hpetp->hp_which, hdp->hd_phys_address,
diff --git a/include/linux/hpet.h b/include/linux/hpet.h
index 8604564b985d..e7b36bcf4699 100644
--- a/include/linux/hpet.h
+++ b/include/linux/hpet.h
@@ -107,5 +107,6 @@ static inline void hpet_reserve_timer(struct hpet_data *hd, int timer)
 }
 
 int hpet_alloc(struct hpet_data *);
+u64 hpet_get_ticks_per_sec(u64 hpet_caps);
 
 #endif /* !__HPET__ */
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v3 09/21] x86/nmi: Add a NMI_WATCHDOG NMI handler category

2019-05-14 Thread Ricardo Neri
Add NMI_WATCHDOG as a new category of NMI handler. This new category
is to be used with the HPET-based hardlockup detector. This detector
does not have a direct way of checking if the HPET timer is the source of
the NMI. Instead, it indirectly estimates it using the time-stamp counter.

Therefore, we may have false positives in case another NMI occurs within
the estimated time window. For this reason, we want the handler of the
detector to be called after all the NMI_LOCAL handlers. A simple way
of achieving this is with a new NMI handler category.

Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/nmi.h |  1 +
 arch/x86/kernel/nmi.c  | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
index 75ded1d13d98..75aa98313cde 100644
--- a/arch/x86/include/asm/nmi.h
+++ b/arch/x86/include/asm/nmi.h
@@ -29,6 +29,7 @@ enum {
NMI_UNKNOWN,
NMI_SERR,
NMI_IO_CHECK,
+   NMI_WATCHDOG,
NMI_MAX
 };
 
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 3755d0310026..a43213f0ab26 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -62,6 +62,10 @@ static struct nmi_desc nmi_desc[NMI_MAX] =
.lock = __RAW_SPIN_LOCK_UNLOCKED(&nmi_desc[3].lock),
.head = LIST_HEAD_INIT(nmi_desc[3].head),
},
+   {
+   .lock = __RAW_SPIN_LOCK_UNLOCKED(&nmi_desc[4].lock),
+   .head = LIST_HEAD_INIT(nmi_desc[4].head),
+   },
 
 };
 
@@ -172,6 +176,8 @@ int __register_nmi_handler(unsigned int type, struct nmiaction *action)
 */
WARN_ON_ONCE(type == NMI_SERR && !list_empty(&desc->head));
WARN_ON_ONCE(type == NMI_IO_CHECK && !list_empty(&desc->head));
+   WARN_ON_ONCE(type == NMI_WATCHDOG && !list_empty(&desc->head));
+
 
/*
 * some handlers need to be executed first otherwise a fake
@@ -382,6 +388,10 @@ static void default_do_nmi(struct pt_regs *regs)
}
raw_spin_unlock(&nmi_reason_lock);
 
+   handled = nmi_handle(NMI_WATCHDOG, regs);
+   if (handled == NMI_HANDLED)
+   return;
+
/*
 * Only one NMI can be latched at a time.  To handle
 * this we may process multiple nmi handlers at once to
-- 
2.17.1



[RFC PATCH v3 14/21] watchdog/hardlockup: Use parse_option_str() to handle "nmi_watchdog"

2019-05-14 Thread Ricardo Neri
Prepare hardlockup_panic_setup() to handle a comma-separated list of
options. This is needed to pass options to specific implementations of the
hardlockup detector.
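
To illustrate why exact matching within a comma-separated list differs from
the prefix comparison done with strncmp(), here is a minimal userspace
sketch. The helper sketch_option_in_list() is hypothetical; it only
approximates the matching behavior this patch relies on and is not the
kernel's parse_option_str() implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if @option appears as a complete
 * element of the comma-separated list @str.
 */
static bool sketch_option_in_list(const char *str, const char *option)
{
    size_t optlen = strlen(option);

    while (*str) {
        const char *end = strchr(str, ',');
        size_t toklen = end ? (size_t)(end - str) : strlen(str);

        /* Match only whole elements, never prefixes. */
        if (toklen == optlen && strncmp(str, option, optlen) == 0)
            return true;

        if (!end)
            break;
        str = end + 1;
    }

    return false;
}
```

With whole-element matching, "panic" is found in a list such as "hpet,panic"
regardless of its position, while a prefix comparison like
strncmp(str, "panic", 5) would only match at the start of the string and
would also accept longer words that merely begin with "panic".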

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Peter Zijlstra 
Cc: Clemens Ladisch 
Cc: Arnd Bergmann 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: Mimi Zohar 
Cc: Jan Kiszka 
Cc: Nick Desaulniers 
Cc: Masahiro Yamada 
Cc: Nayna Jain 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 kernel/watchdog.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index be589001200a..fd50049449ec 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -70,13 +70,13 @@ void __init hardlockup_detector_disable(void)
 
 static int __init hardlockup_panic_setup(char *str)
 {
-   if (!strncmp(str, "panic", 5))
+   if (parse_option_str(str, "panic"))
hardlockup_panic = 1;
-   else if (!strncmp(str, "nopanic", 7))
+   else if (parse_option_str(str, "nopanic"))
hardlockup_panic = 0;
-   else if (!strncmp(str, "0", 1))
+   else if (parse_option_str(str, "0"))
nmi_watchdog_user_enabled = 0;
-   else if (!strncmp(str, "1", 1))
+   else if (parse_option_str(str, "1"))
nmi_watchdog_user_enabled = 1;
return 1;
 }
-- 
2.17.1



[RFC PATCH v3 13/21] x86/watchdog/hardlockup/hpet: Determine if HPET timer caused NMI

2019-05-14 Thread Ricardo Neri
The only direct method to determine whether an HPET timer caused an
interrupt is to read the Interrupt Status register. Unfortunately,
reading HPET registers is slow and, therefore, it is not recommended to
read them while in NMI context. Furthermore, status is not available if
the interrupt is generated via the Front Side Bus.

An indirect manner to infer if the non-maskable interrupt we see was
caused by the HPET timer is to use the time-stamp counter. Compute the
value that the time-stamp counter should have at the next interrupt of the
HPET timer. Since the hardlockup detector operates in seconds, high
precision is not needed. This implementation considers that the HPET
caused the NMI if the time-stamp counter reads the expected value +/- 1.5%.
This value is selected as it is approximately 1/64 and the division can be
performed using a bit shift operation. Experimentally, the error in the
estimation is consistently less than 1%.

The computation of the expected value of the time-stamp counter must be
performed in relation to watchdog_thresh divided by the number of
monitored CPUs. This quantity is stored in tsc_ticks_per_cpu and must be
updated whenever the number of monitored CPUs changes.
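
The window check described above can be sketched in userspace C as follows.
The function names are hypothetical; the sketch only reproduces the
arithmetic of the patch: the tolerance is the expected TSC delta shifted
right by 6 (1/64), and a single unsigned comparison tests whether the current
reading falls within the expected value plus or minus that tolerance.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: tolerance of 1/64 (about 1.5%) of the TSC delta. */
static uint64_t sketch_tsc_error(uint64_t tsc_delta)
{
    return tsc_delta >> 6;
}

/*
 * Hypothetical helper: true if tsc_curr lies in
 * [tsc_next - error, tsc_next + error).
 *
 * Unsigned wrap-around makes one comparison sufficient: readings below
 * tsc_next - error wrap to a very large value and fail the test.
 */
static bool sketch_tsc_in_window(uint64_t tsc_curr, uint64_t tsc_next,
                                 uint64_t error)
{
    return (tsc_curr - tsc_next) + error < 2 * error;
}
```

For example, with an expected delta of 6400 TSC ticks the tolerance is 100
ticks, so readings from 100 ticks before the expected value up to 99 ticks
after it are attributed to the HPET timer.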

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Peter Zijlstra 
Cc: Clemens Ladisch 
Cc: Arnd Bergmann 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: Mimi Zohar 
Cc: Jan Kiszka 
Cc: Nick Desaulniers 
Cc: Masahiro Yamada 
Cc: Nayna Jain 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Suggested-by: Andi Kleen 
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h |  2 ++
 arch/x86/kernel/watchdog_hld_hpet.c | 27 ++-
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 64acacce095d..fd99f2390714 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -115,6 +115,8 @@ struct hpet_hld_data {
u32 num;
u64 ticks_per_second;
u64 ticks_per_cpu;
+   u64 tsc_next;
+   u64 tsc_ticks_per_cpu;
u32 handling_cpu;
u32 enabled_cpus;
struct msi_msg  msi_msg;
diff --git a/arch/x86/kernel/watchdog_hld_hpet.c b/arch/x86/kernel/watchdog_hld_hpet.c
index 9a3431a54616..6f1f540cfee9 100644
--- a/arch/x86/kernel/watchdog_hld_hpet.c
+++ b/arch/x86/kernel/watchdog_hld_hpet.c
@@ -23,6 +23,7 @@
 
 static struct hpet_hld_data *hld_data;
 static bool hardlockup_use_hpet;
+static u64 tsc_next_error;
 
 /**
  * kick_timer() - Reprogram timer to expire in the future
@@ -32,11 +33,22 @@ static bool hardlockup_use_hpet;
  * Reprogram the timer to expire within watchdog_thresh seconds in the future.
  * If the timer supports periodic mode, it is not kicked unless @force is
  * true.
+ *
+ * Also, compute the expected value of the time-stamp counter at the time of
+ * expiration as well as a deviation from the expected value. The maximum
+ * deviation is of ~1.5%. This deviation can be easily computed by shifting
+ * by 6 positions the delta between the current and expected time-stamp values.
  */
 static void kick_timer(struct hpet_hld_data *hdata, bool force)
 {
+   u64 tsc_curr, tsc_delta, new_compare, count, period = 0;
bool kick_needed = force || !(hdata->has_periodic);
-   u64 new_compare, count, period = 0;
+
+   tsc_curr = rdtsc();
+
+   tsc_delta = (unsigned long)watchdog_thresh * hdata->tsc_ticks_per_cpu;
+   hdata->tsc_next = tsc_curr + tsc_delta;
+   tsc_next_error = tsc_delta >> 6;
 
/*
 * Update the comparator in increments of watch_thresh seconds relative
@@ -92,6 +104,15 @@ static void enable_timer(struct hpet_hld_data *hdata)
  */
 static bool is_hpet_wdt_interrupt(struct hpet_hld_data *hdata)
 {
+   if (smp_processor_id() == hdata->handling_cpu) {
+   u64 tsc_curr;
+
+   tsc_curr = rdtsc();
+
+   return (tsc_curr - hdata->tsc_next) + tsc_next_error <
+  2 * tsc_next_error;
+   }
+
return false;
 }
 
@@ -259,6 +280,10 @@ static void update_ticks_per_cpu(struct hpet_hld_data *hdata)
 
do_div(temp, hdata->enabled_cpus);
hdata->ticks_per_cpu = temp;
+
+   temp = (unsigned long)tsc_khz * 1000L;
+   do_div(temp, hdata->enabled_cpus);
+   hdata->tsc_ticks_per_cpu = temp;
 }
 
 /**
-- 
2.17.1



[RFC PATCH v3 16/21] x86/watchdog: Add a shim hardlockup detector

2019-05-14 Thread Ricardo Neri
The generic hardlockup detector is based on perf. It also provides a set
of weak stubs that CPU architectures can override. Add a shim hardlockup
detector for x86 that selects between perf and hpet implementations.

Specifically, this shim implementation is needed for the HPET-based
hardlockup detector; it can also be used for future implementations.

Cc: "H. Peter Anvin" 
Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Peter Zijlstra 
Cc: Clemens Ladisch 
Cc: Arnd Bergmann 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: Mimi Zohar 
Cc: Jan Kiszka 
Cc: Nick Desaulniers 
Cc: Masahiro Yamada 
Cc: Nayna Jain 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Suggested-by: Nicholas Piggin 
Signed-off-by: Ricardo Neri 
---
 arch/x86/Kconfig.debug |  4 ++
 arch/x86/kernel/Makefile   |  1 +
 arch/x86/kernel/watchdog_hld.c | 78 ++
 3 files changed, 83 insertions(+)
 create mode 100644 arch/x86/kernel/watchdog_hld.c

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 376a5db81aec..0d9e11eb070c 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -169,11 +169,15 @@ config IOMMU_LEAK
 config HAVE_MMIOTRACE_SUPPORT
def_bool y
 
+config X86_HARDLOCKUP_DETECTOR
+   bool
+
 config X86_HARDLOCKUP_DETECTOR_HPET
bool "Use HPET Timer for Hard Lockup Detection"
select SOFTLOCKUP_DETECTOR
select HARDLOCKUP_DETECTOR
select HARDLOCKUP_DETECTOR_CORE
+   select X86_HARDLOCKUP_DETECTOR
depends on HPET_TIMER && HPET && X86_64
help
  Say y to enable a hardlockup detector that is driven by an High-
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index f9222769d84b..f89a259931f7 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -106,6 +106,7 @@ obj-$(CONFIG_VM86)  += vm86_32.o
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
 
 obj-$(CONFIG_HPET_TIMER)   += hpet.o
+obj-$(CONFIG_X86_HARDLOCKUP_DETECTOR) += watchdog_hld.o
 obj-$(CONFIG_X86_HARDLOCKUP_DETECTOR_HPET) += watchdog_hld_hpet.o
 obj-$(CONFIG_APB_TIMER)+= apb_timer.o
 
diff --git a/arch/x86/kernel/watchdog_hld.c b/arch/x86/kernel/watchdog_hld.c
new file mode 100644
index ..c2512d4c79c5
--- /dev/null
+++ b/arch/x86/kernel/watchdog_hld.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * A shim hardlockup detector. It overrides the weak stubs of the generic
+ * implementation to select between the perf- or the hpet-based implementation.
+ *
+ * Copyright (C) Intel Corporation 2019
+ */
+
+#include 
+#include 
+
+enum x86_hardlockup_detector {
+   X86_HARDLOCKUP_DETECTOR_PERF,
+   X86_HARDLOCKUP_DETECTOR_HPET,
+};
+
+static enum x86_hardlockup_detector detector_type __read_mostly;
+
+int watchdog_nmi_enable(unsigned int cpu)
+{
+   if (detector_type == X86_HARDLOCKUP_DETECTOR_PERF) {
+   hardlockup_detector_perf_enable();
+   return 0;
+   }
+
+   if (detector_type == X86_HARDLOCKUP_DETECTOR_HPET) {
+   hardlockup_detector_hpet_enable(cpu);
+   return 0;
+   }
+
+   return -ENODEV;
+}
+
+void watchdog_nmi_disable(unsigned int cpu)
+{
+   if (detector_type == X86_HARDLOCKUP_DETECTOR_PERF) {
+   hardlockup_detector_perf_disable();
+   return;
+   }
+
+   if (detector_type == X86_HARDLOCKUP_DETECTOR_HPET) {
+   hardlockup_detector_hpet_disable(cpu);
+   return;
+   }
+}
+
+int __init watchdog_nmi_probe(void)
+{
+   int ret;
+
+   /*
+* Try first with the HPET hardlockup detector. It will only
+* succeed if selected at build time and the nmi_watchdog
+* command-line parameter is configured. This ensures that the
+* perf-based detector is used by default, if selected at
+* build time.
+*/
+   ret = hardlockup_detector_hpet_init();
+   if (!ret) {
+   detector_type = X86_HARDLOCKUP_DETECTOR_HPET;
+   return ret;
+   }
+
+   ret = hardlockup_detector_perf_init();
+   if (!ret) {
+   detector_type = X86_HARDLOCKUP_DETECTOR_PERF;
+   return ret;
+   }
+
+   return ret;
+}
+
+void watchdog_nmi_stop(void)
+{
+   /* Only the HPET lockup detector defines a stop function. */
+   if (detector_type == X86_HARDLOCKUP_DETECTOR_HPET)
+   hardlockup_detector_hpet_stop();
+}
-- 
2.17.1
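
The probe order used by watchdog_nmi_probe() above (try one backend, fall back to the next, remember which one succeeded) can be sketched outside the kernel. This is an illustrative model only: struct backend, probe_detector() and the stub init functions are hypothetical names, not kernel interfaces.

```c
#include <errno.h>
#include <stddef.h>

enum detector { DETECTOR_NONE, DETECTOR_HPET, DETECTOR_PERF };

/* Hypothetical init hook: returns 0 on success, a negative errno otherwise. */
typedef int (*detector_init_fn)(void);

struct backend {
	enum detector type;
	detector_init_fn init;
};

/*
 * Try each backend in order and record the first whose init succeeds.
 * On failure, return the error of the last backend tried, as the shim does.
 */
static int probe_detector(const struct backend *backends, size_t n,
			  enum detector *selected)
{
	int ret = -ENODEV;
	size_t i;

	*selected = DETECTOR_NONE;
	for (i = 0; i < n; i++) {
		ret = backends[i].init();
		if (!ret) {
			*selected = backends[i].type;
			return 0;
		}
	}
	return ret;
}

/* Stubs standing in for the real init functions. */
static int hpet_init_unavailable(void) { return -ENODEV; }
static int perf_init_ok(void) { return 0; }
static int init_fails(void) { return -EINVAL; }
```

Listing the HPET backend first mirrors the patch: its init only succeeds when explicitly requested, so the perf backend remains the default.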



Re: [RFC PATCH v4 05/21] x86/hpet: Reserve timer for the HPET hardlockup detector

2019-06-13 Thread Ricardo Neri
On Tue, Jun 11, 2019 at 09:54:25PM +0200, Thomas Gleixner wrote:
> On Thu, 23 May 2019, Ricardo Neri wrote:
> 
> > HPET timer 2 will be used to drive the HPET-based hardlockup detector.
> > Reserve such timer to ensure it cannot be used by user space programs or
> > for clock events.
> > 
> > When looking for MSI-capable timers for clock events, skip timer 2 if
> > the HPET hardlockup detector is selected.
> 
> Why? Both the changelog and the code change lack an explanation why this
> timer is actually touched after it got reserved for the platform. The
> reservation should make it inaccessible for other things.

hpet_reserve_platform_timers() will give the HPET char driver a data
structure which specifies which timers are reserved. In this manner,
they cannot be used by applications via file opens. The timer used by
the hardlockup detector should be marked as reserved.

Also, hpet_msi_capability_lookup() populates another data structure
which is used when obtaining an unused timer for an HPET clock event.
The timer used by the hardlockup detector should not be included in that
data structure.

Is this the explanation you would like to see? If yes, I will include it
in the changelog.

Thanks and BR,
Ricardo
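
The two effects described above (the timer is marked reserved, and the clock-event lookup never hands it out) can be illustrated with a small model. This is only a sketch under assumptions: struct timer_model and find_clockevent_timer() are hypothetical and do not correspond to the data structures in arch/x86/kernel/hpet.c.

```c
#include <stdbool.h>

/* Hypothetical per-timer state; the kernel keeps this in its own structures. */
struct timer_model {
	bool msi_capable;  /* timer can deliver MSI interrupts */
	bool reserved;     /* set aside for the platform, e.g. the watchdog */
	bool in_use;       /* already claimed by a clock event */
};

/*
 * Return the index of the first timer usable for a clock event,
 * or -1 if none is available. Reserved timers are skipped.
 */
static int find_clockevent_timer(const struct timer_model *timers, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		if (timers[i].msi_capable && !timers[i].reserved &&
		    !timers[i].in_use)
			return i;
	}
	return -1;
}
```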



Re: [RFC PATCH v4 20/21] iommu/vt-d: hpet: Reserve an interrupt remampping table entry for watchdog

2019-10-17 Thread Ricardo Neri
On Tue, Jun 18, 2019 at 01:08:06AM +0200, Thomas Gleixner wrote:
> Stephane,
> 
> On Mon, 17 Jun 2019, Stephane Eranian wrote:
> > On Mon, Jun 17, 2019 at 1:25 AM Thomas Gleixner  wrote:
> > > Great that there is no trace of any mail from Andi or Stephane about this
> > > on LKML. There is no problem with talking offlist about this stuff, but
> > > then you should at least provide a rationale for those who were not part
> > > of the private conversation.
> > >
> > Let me add some context to this whole patch series. The pressure on the
> > core PMU counters is increasing as more people want to use them to
> > measure always more events. When the PMU is overcommitted, i.e., more
> > events than counters for them, there is multiplexing. It comes with an
> > overhead that is too high for certain applications. One way to avoid this
> > is to lower the multiplexing frequency, which is by default 1ms, but that
> > comes with loss of accuracy. Another approach is to measure only a small
> > number of events at a time and use multiple runs, but then you lose
> > consistent event view. Another approach is to push for increasing the
> > number of counters. But getting new hardware counters takes time. Short
> > term, we can investigate what it would take to free one cycle-capable
> > counter which is commandeered by the hard lockup detector on all X86
> > processors today. The functionality of the watchdog, being able to get a
> > crash dump on kernel deadlocks, is important and we cannot simply disable
> > it. At scale, many bugs are exposed and thus machines
> > deadlock. Therefore, we want to investigate what it would take to move
> > the detector to another NMI-capable source, such as the HPET because the
> > detector does not need a high-granularity timer and interrupts only
> > every 2s.
> 
> I'm well aware about the reasons for this.
> 
> > Furthermore, recent Intel erratum, e.g., the TSX issue forcing the TFA
> > code in perf_events, have increased the pressure even more with only 3
> > generic counters left. Thus, it is time to look at alternative ways of
> > getting a hard lockup detector (NMI watchdog) from another NMI source
> > than the PMU. To that extent, I have been discussing about alternatives.
> >
> > Intel suggested using the HPET and Ricardo has been working on
> > producing this patch series. It is clear from your review
> > that the patches have issues, but I am hoping that they can be
> > resolved with constructive feedback knowing what the end goal is.
> 
> Well, I gave constructive feedback from the very first version on. But
> essential parts of that feedback have been ignored for whatever reasons.
> 
> > As for the round-robin changes, yes, we discussed this as an alternative
> > to avoid overloading CPU0 with handling all of the work to broadcasting
> > IPI to 100+ other CPUs.
> 
> I can understand the reason why you don't want to do that, but again, I
> said way before this was tried that changing affinity from NMI context with
> the IOMMU cannot work by just calling into the iommu code and it needs some
> deep investigation with the IOMMU wizards whether a preallocated entry can
> be used lockless (including the subsequently required flush).
> 
> The outcome is that the change was implemented by simply calling into
> functions which I told that they cannot be called from NMI context.
> 
> Unless this problem is solved, and I doubt it can be solved after
> talking to IOMMU people and studying manuals, the round robin mechanics in
> the current form are not going to happen. We'd need a SMI based lockup
> detector to debug the resulting livelock wreckage.
> 
> There are two possible options:
> 
>   1) Back to the IPI approach
> 
>  The problem with broadcast is that it sends IPIs one by one to each
>  online CPU, which sums up with a large number of CPUs.
> 
>  The interesting question is why the kernel does not utilize the all
>  excluding self destination shorthand for this. The SDM is not giving
>  any information.
> 
>  But there is a historic commit which is related and gives a hint:
> 
> commit e77deacb7b078156fcadf27b838a4ce1a65eda04
> Author: Keith Owens 
> Date:   Mon Jun 26 13:59:56 2006 +0200
> 
> [PATCH] x86_64: Avoid broadcasting NMI IPIs
> 
> On some i386/x86_64 systems, sending an NMI IPI as a broadcast will
>   reset the system.  This seems to be a BIOS bug which affects
>   machines where one or more cpus are not under OS control.  It
>   occurs on HT systems with a version of the OS that is not compiled
>   without HT support.  It also occurs when a system is booted with
>   max_cpus=n where 2 <= n < cpus known to the BIOS.  The fix is to
>   always send NMI IPI as a mask instead of as a broadcast.
> 
> I can see the issue with max_cpus and that'd be trivial to solve by
> disabling the HPET watchdog when maxcpus < num_present_cpus is on the
> command line 

Re: [RFC PATCH v5 5/7] iommu/vt-d: Fixup delivery mode of the HPET hardlockup interrupt

2021-05-13 Thread Ricardo Neri
On Wed, May 05, 2021 at 01:03:18AM +0200, Thomas Gleixner wrote:
> On Tue, May 04 2021 at 12:10, Ricardo Neri wrote:

Thank you very much for your feedback, Thomas. I am sorry it took me a
while to reply to your email. I needed to digest and research your
comments.

> > In x86 there is not an IRQF_NMI flag that can be used to indicate the
> 
> There exists no IRQF_NMI flag at all. No architecture provides that.

Thank you for the clarification. I think I meant to say that there is a
request_nmi() function but AFAIK it is only used in the ARM PMU and
would not work on x86.

> 
> > delivery mode when requesting an interrupt (via request_irq()). Thus,
> > there is no way for the interrupt remapping driver to know and set
> > the delivery mode.
> 
> There is no support for this today. So what?

Using request_irq() plus a HPET quirk looked to me like a reasonable
way to use the irqdomain hierarchy to allocate an interrupt with NMI as
the delivery mode.

> 
> > Hence, when allocating an interrupt, check if such interrupt belongs to
> > the HPET hardlockup detector and fixup the delivery mode accordingly.
> 
> What?
> 
> > +   /*
> > +* If we find the HPET hardlockup detector irq, fixup the
> > +* delivery mode.
> > +*/
> > +   if (is_hpet_irq_hardlockup_detector(info))
> > +   irq_cfg->delivery_mode = APIC_DELIVERY_MODE_NMI;
> 
> Again. We are not sticking some random device checks into that
> code. It's wrong and I explained it to you before.
> 
>   
> https://lore.kernel.org/lkml/alpine.deb.2.21.1906161042080.1...@nanos.tec.linutronix.de/
> 
> But I'm happy to repeat it again:
> 
>   "No. This is horrible hackery violating all the layering which we carefully
>put into place to avoid exactly this kind of sprinkling conditionals into
>all code paths.
> 
>With some thought the existing irqdomain hierarchy can be used to achieve
>the same thing without tons of extra functions and conditionals."
> 
> So the outcome of thought and using the irqdomain hierarchy is:
> 
>Replacing an hpet specific conditional in one place with an hpet
>specific conditional in a different place.
> 
> Impressive.

I am sorry Thomas, I did try to make the quirk less hacky but I did not
think of the solution you provide below.

> 
> hpet_assign_irq(, bool nmi)
>   init_info(info)
> ...
> if (nmi)
> info.flags |= X86_IRQ_ALLOC_AS_NMI;
>   
>irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, )
>  intel_irq_remapping_alloc(..., info)
>irq_domain_alloc_irq_parents(..., info)
>  x86_vector_alloc_irqs(..., info)
>  {   
>if (info->flags & X86_IRQ_ALLOC_AS_NMI && nr_irqs != 1)
>   return -EINVAL;
> 
>for (i = 0; i < nr_irqs; i++) {
>  
>  if (info->flags & X86_IRQ_ALLOC_AS_NMI) {
>  irq_cfg_setup_nmi(apicd);
>  continue;
>  }
>  ...
>  }
> 
> irq_cfg_setup_nmi() sets irq_cfg->delivery_mode and whatever is required
> and everything else just works. Of course this needs a few other minor
> tweaks but none of those introduces random hpet quirks all over the
> place. Not convoluted enough, right?

Thanks for the detailed demonstration! It does seem cleaner than what I
implemented.
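
The flow Thomas outlines can be modeled in user space to show why it avoids device-specific checks: the caller states its intent through a flag, and the allocation step acts on the flag alone. The sketch below is a simplified model, not kernel code. ALLOC_AS_NMI stands in for the proposed X86_IRQ_ALLOC_AS_NMI, and the structs, the function name and the vector numbering are assumptions made for illustration.

```c
#include <errno.h>

#define ALLOC_AS_NMI 0x1u  /* models the proposed X86_IRQ_ALLOC_AS_NMI flag */

enum delivery_mode { DELIVERY_FIXED, DELIVERY_NMI };

struct alloc_info {
	unsigned int flags;
};

struct irq_cfg_model {
	enum delivery_mode delivery_mode;
	unsigned int vector;
};

/*
 * Model of the vector-domain allocation step. The only input that
 * selects NMI delivery is the flag; nothing here knows about HPET.
 */
static int vector_alloc_irqs(struct irq_cfg_model *cfgs, unsigned int nr_irqs,
			     const struct alloc_info *info)
{
	unsigned int i;

	/* An NMI allocation is only accepted for a single interrupt. */
	if ((info->flags & ALLOC_AS_NMI) && nr_irqs != 1)
		return -EINVAL;

	for (i = 0; i < nr_irqs; i++) {
		if (info->flags & ALLOC_AS_NMI) {
			/* No regular vector is allocated in this model. */
			cfgs[i].delivery_mode = DELIVERY_NMI;
			cfgs[i].vector = 0;
			continue;
		}
		cfgs[i].delivery_mode = DELIVERY_FIXED;
		cfgs[i].vector = 32 + i;  /* placeholder numbering */
	}
	return 0;
}
```

Any child domain that later composes hardware state can read delivery_mode from the per-irq configuration, so no is_hpet_...() style conditional is needed further down the hierarchy.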

> 
> But that solves none of other problems. Let me summarize again which
> options or non-options we have:
> 
> 1) Selective IPIs from NMI context cannot work
> 
>As explained in the other thread.
> 
> 2) Shorthand IPI allbutself from NMI
> 
>This should work, but that obviously does not take the watchdog
>cpumask into account.
> 
>Also this only works when IPI shorthand mode is enabled. See
>apic_smt_update() for details.
> 
> 3) Sending the IPIs from irq_work
> 
>This would solve the problem, but if the CPU which is the NMI
>target is really stuck in an interrupt disabled region then the
>IPIs won't be sent.
> 
>OTOH, if that's the case then the CPU which was processing the
>NMI will continue to be stuck until the next NMI hits which
>will detect that the CPU is stuck which is a good enough
>reason to send a shorthand IPI to all CPUs ignoring the
>watchdog cpumask.
> 
>Same limitation vs. shorthand mode as #2
> 
> 4) Changing affinity of the HPET NMI from NMI
> 
>As we established two years ago that cannot work with interrupt
>remapping
> 
> 5) Changing affinity 

[RFC PATCH v5 7/7] x86/watchdog/hardlockup/hpet: Support interrupt remapping

2021-05-04 Thread Ricardo Neri
When interrupt remapping is enabled in the system, the MSI interrupt
address and data fields must follow a special format that the IOMMU
defines.

However, the HPET hardlockup detector must rely on the interrupt
subsystem to have the interrupt remapping drivers allocate, activate,
and set the affinity of the HPET timer interrupt. Hence, it must use
request_irq() to use such functionality.

In x86 there is not an IRQF_NMI flag to indicate to the interrupt
subsystem the delivery mode of the interrupt. A previous changeset added
functionality to detect the interrupt of the HPET hardlockup detector
and fixup the delivery mode accordingly.

Also, since request_irq() is used, a non-NMI interrupt handler must be
defined, even if it is not needed.

When interrupt remapping is enabled, use the new facility to ensure the
interrupt is plumbed properly to work with interrupt remapping.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Borislav Petkov 
Cc: David Woodhouse  (supporter:INTEL IOMMU (VT-d))
Cc: "Ravi V. Shankar" 
Cc: Ingo Molnar 
Cc: Jacob Pan 
Cc: Lu Baolu  (supporter:INTEL IOMMU (VT-d))
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: iommu@lists.linux-foundation.org (open list:INTEL IOMMU (VT-d))
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
Changes since v4:
 * Use request_irq() to obtain an IRTE for the HPET hardlockup detector
   instead of the custom interfaces previously implemented in the
   interrupt remapping drivers.
 * Simplified detection of interrupt remapping by checking the parent
   of the HPET irq domain.
 * Stopped using the HPET magic fields of struct irq_alloc_info. They
   were removed in commit 2bf1e7bcedb8 ("x86/msi: Consolidate HPET
   allocation")
 * Rephrased commit message for clarity. (Ashok)
 * Clarified error message of non-NMI handler. (Ashok)

Changes since v3:
 * None

Changes since v2:
 * None

Changes since v1:
 * Introduced this patch. Added custom functions in the Intel IOMMU driver
   to allocate an IRTE for the HPET hardlockup detector.
---
 arch/x86/include/asm/hpet.h |  2 ++
 arch/x86/kernel/hpet.c  |  3 ++
 arch/x86/kernel/watchdog_hld_hpet.c | 48 +
 3 files changed, 47 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 5bf675970d4b..d130285ddc96 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -109,6 +109,7 @@ extern void hpet_unregister_irq_handler(rtc_irq_handler 
handler);
  * @tsc_ticks_per_group:   TSC ticks that must elapse for each group of
  * monitored CPUs.
  * @irq:   IRQ number assigned to the HPET channel
+ * @intr_remap_enabled:    True if interrupt remapping is enabled
  * @handling_cpu:  CPU handling the HPET interrupt
  * @pkgs_per_group:Number of physical packages in a group of CPUs
  * receiving an IPI
@@ -133,6 +134,7 @@ struct hpet_hld_data {
u64 tsc_next;
u64 tsc_ticks_per_group;
int irq;
+   bool intr_remap_enabled;
u32 handling_cpu;
u32 pkgs_per_group;
u32 nr_groups;
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 3e43e0f348b8..ff4abdef5e15 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -1464,6 +1464,9 @@ struct hpet_hld_data *hpet_hld_get_timer(void)
if (!hpet_domain)
goto err;
 
+   if (hpet_domain->parent != x86_vector_domain)
+   hld_data->intr_remap_enabled = true;
+
hc->mode = HPET_MODE_NMI_WATCHDOG;
irq = hpet_assign_irq(hpet_domain, hc, hc->num);
if (irq <= 0)
diff --git a/arch/x86/kernel/watchdog_hld_hpet.c 
b/arch/x86/kernel/watchdog_hld_hpet.c
index 3fd2405b31fa..265641d001ac 100644
--- a/arch/x86/kernel/watchdog_hld_hpet.c
+++ b/arch/x86/kernel/watchdog_hld_hpet.c
@@ -176,6 +176,14 @@ static int update_msi_destid(struct hpet_hld_data *hdata)
 {
u32 destid;
 
+   if (hdata->intr_remap_enabled) {
+   int ret;
+
+   ret = irq_set_affinity(hdata->irq,
+  cpumask_of(hdata->handling_cpu));
+   return ret;
+   }
+
destid = apic->calc_dest_apicid(hdata->handling_cpu);
/*
 * HPET only supports a 32-bit MSI address register. Thus, only
@@ -393,26 +401,52 @@ static int hardlockup_detector_nmi_handler(unsigned int 
type,
return NMI_DONE;
 }
 
+/*
+ * When interrupt remapping is enabled, we request the irq for the detector
+ * using request_irq() and then we fixup the delivery mode to NMI using
+ * is_hpet_irq_hardlockup_detector(). If the latter fails, we will see a non-
+ * NMI interrupt.
+ *
+ */
+static irqreturn_t hardlockup_detector_irq_handler(int irq, void *data)
+{
+   pr_err_once("Received a n

[RFC PATCH v5 1/7] x86/apic: Add irq_cfg::delivery_mode

2021-05-04 Thread Ricardo Neri
Until now, the delivery mode of APIC interrupts has been set to the default
mode of the APIC driver. However, the hardware does not prevent configuring
each interrupt with a different delivery mode. Specifying the
delivery mode per interrupt is useful when one is interested in changing
the delivery mode of a particular interrupt. For instance, this can be used
to deliver an interrupt as non-maskable.

Add a new member, delivery_mode, to struct irq_cfg. This new member can
be used to update the configuration of the delivery mode in each interrupt
domain.

Currently, all interrupt domains set the delivery mode of interrupts using
the APIC setting. Interrupt domains use an irq_cfg data structure to
configure their own data structures and hardware resources. Thus, in order
to keep the current behavior, set the delivery mode of the irq
configuration to the APIC setting. In this manner, irq domains can
obtain the delivery mode from the irq configuration data instead of the
APIC setting, if needed.

Cc: Andi Kleen 
Cc: Borislav Petkov 
Cc: David Woodhouse  (supporter:INTEL IOMMU (VT-d))
Cc: "Ravi V. Shankar" 
Cc: Ingo Molnar 
Cc: Jacob Pan 
Cc: Lu Baolu  (supporter:INTEL IOMMU (VT-d))
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: iommu@lists.linux-foundation.org (open list:INTEL IOMMU (VT-d))
Cc: x86@kernel.org
Reviewed-by: Ashok Raj 
Signed-off-by: Ricardo Neri 
---
Changes since v4:
 * Rebased to use new enumeration apic_delivery_modes.

Changes since v3:
 * None

Changes since v2:
 * Reduced scope to only add the interrupt delivery mode in
   struct irq_alloc_info.

Changes since v1:
 * Introduced this patch.
---
 arch/x86/include/asm/hw_irq.h |  1 +
 arch/x86/kernel/apic/vector.c | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index d465ece58151..370f4db0372b 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -90,6 +90,7 @@ struct irq_alloc_info {
 struct irq_cfg {
unsigned int dest_apicid;
unsigned int vector;
+   enum apic_delivery_modes delivery_mode;
 };
 
 extern struct irq_cfg *irq_cfg(unsigned int irq);
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 6dbdc7c22bb7..d47ed07a56a4 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -567,6 +567,16 @@ static int x86_vector_alloc_irqs(struct irq_domain 
*domain, unsigned int virq,
irqd->chip_data = apicd;
irqd->hwirq = virq + i;
irqd_set_single_target(irqd);
+
+   /*
+* Initialize the delivery mode of this irq to match the
+* default delivery mode of the APIC. This is useful for
+* children irq domains which want to take the delivery
+* mode from the individual irq configuration rather
+* than from the APIC.
+*/
+apicd->hw_irq_cfg.delivery_mode = apic->delivery_mode;
+
/*
 * Prevent that any of these interrupts is invoked in
 * non interrupt context via e.g. generic_handle_irq()
-- 
2.17.1
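
To illustrate what the commit message above describes, here is a minimal user-space sketch of a per-interrupt delivery mode that defaults to the APIC-wide setting and is then consumed when composing an MSI data word. The struct and function names are hypothetical. The bit layout (vector in bits 7:0, delivery mode in bits 10:8, with 000b for Fixed and 100b for NMI) follows the x86 MSI data register format; other fields of the register are left out.

```c
#include <stdint.h>

/* Delivery mode encodings as used in the x86 MSI data register. */
enum delivery_mode_model {
	MODE_FIXED = 0,
	MODE_NMI   = 4,
};

/* Hypothetical stand-ins for the APIC driver and the per-irq config. */
struct apic_model {
	enum delivery_mode_model delivery_mode;
};

struct irq_cfg_model {
	uint8_t vector;
	enum delivery_mode_model delivery_mode;
};

/* Initialize the per-irq config; the mode defaults to the APIC-wide one. */
static void irq_cfg_init(struct irq_cfg_model *cfg,
			 const struct apic_model *apic, uint8_t vector)
{
	cfg->vector = vector;
	cfg->delivery_mode = apic->delivery_mode;
}

/* Compose the low bits of an MSI data word from the per-irq config. */
static uint32_t msi_data_compose(const struct irq_cfg_model *cfg)
{
	return ((uint32_t)cfg->delivery_mode << 8) | cfg->vector;
}
```

Because the composing function reads the mode from the per-irq configuration rather than from the APIC driver, changing one interrupt to NMI does not affect any other interrupt.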

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v5 6/7] iommu/amd: Fixup delivery mode of the HPET hardlockup interrupt

2021-05-04 Thread Ricardo Neri
The HPET hardlockup detector requires that the HPET timer delivers the
interrupt as NMI. When interrupt remapping is disabled, this can be
done by programming the HPET MSI registers directly. With interrupt
remapping, it is necessary to populate an entry in the interrupt
remapping table.

In x86 there is not an IRQF_NMI flag that can be used to indicate the
delivery mode when requesting an interrupt (via request_irq()). Thus,
there is no way for the interrupt remapping driver to know and set
the delivery mode.

Hence, when allocating an interrupt, check if such interrupt belongs to
the HPET hardlockup detector and fixup the delivery mode accordingly.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Borislav Petkov 
Cc: David Woodhouse  (supporter:INTEL IOMMU (VT-d))
Cc: "Ravi V. Shankar" 
Cc: Ingo Molnar 
Cc: Jacob Pan 
Cc: Lu Baolu  (supporter:INTEL IOMMU (VT-d))
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: iommu@lists.linux-foundation.org (open list:INTEL IOMMU (VT-d))
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
Changes since v4:
 * Introduced this patch.

Changes since v3:
 * N/A

Changes since v2:
 * N/A

Changes since v1:
 * N/A
---
 drivers/iommu/amd/iommu.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index e8d9fae0c766..758e08ba42e6 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -3254,6 +3255,14 @@ static int irq_remapping_alloc(struct irq_domain 
*domain, unsigned int virq,
irq_data->hwirq = (devid << 16) + i;
irq_data->chip_data = data;
irq_data->chip = _ir_chip;
+
+   /*
+* If we find the HPET hardlockup detector irq, fixup the
+* delivery mode.
+*/
+   if (is_hpet_irq_hardlockup_detector(info))
+   cfg->delivery_mode = APIC_DELIVERY_MODE_NMI;
+
irq_remapping_prepare_irte(data, cfg, info, devid, index, i);
irq_set_status_flags(virq + i, IRQ_MOVE_PCNTXT);
}
-- 
2.17.1



[RFC PATCH v5 2/7] x86/hpet: Introduce function to identify HPET hardlockup detector irq

2021-05-04 Thread Ricardo Neri
The HPET hardlockup detector needs to deliver its interrupt as NMI.
In x86 there is not an IRQF_NMI flag that can be used in the irq plumbing
code to tell interrupt remapping drivers to set the interrupt delivery
mode accordingly. Hence, they must fixup the delivery mode internally.

Implement a method to determine if the interrupt being allocated belongs
to the HPET hardlockup detector.

Cc: Andi Kleen 
Cc: Borislav Petkov 
Cc: David Woodhouse  (supporter:INTEL IOMMU (VT-d))
Cc: "Ravi V. Shankar" 
Cc: Ingo Molnar 
Cc: Jacob Pan 
Cc: Lu Baolu  (supporter:INTEL IOMMU (VT-d))
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: iommu@lists.linux-foundation.org (open list:INTEL IOMMU (VT-d))
Cc: x...@kernel.org
Reviewed-by: Ashok Raj 
Signed-off-by: Ricardo Neri 
---
Changes since v4:
 * Introduced this patch. Previous versions had special functions to
   allocate and set the affinity of a remapped NMI interrupt.

Changes since v3:
 * N/A

Changes since v2:
 * N/A

Changes since v1:
 * N/A
---
 arch/x86/include/asm/hpet.h |  3 +++
 arch/x86/kernel/hpet.c  | 33 +
 2 files changed, 36 insertions(+)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index df11c7d4af44..5bf675970d4b 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -149,6 +149,7 @@ extern void hardlockup_detector_hpet_stop(void);
 extern void hardlockup_detector_hpet_enable(unsigned int cpu);
 extern void hardlockup_detector_hpet_disable(unsigned int cpu);
 extern void hardlockup_detector_switch_to_perf(void);
+extern bool is_hpet_irq_hardlockup_detector(struct irq_alloc_info *info);
 #else
 static inline int hardlockup_detector_hpet_init(void)
 { return -ENODEV; }
@@ -156,6 +157,8 @@ static inline void hardlockup_detector_hpet_stop(void) {}
 static inline void hardlockup_detector_hpet_enable(unsigned int cpu) {}
 static inline void hardlockup_detector_hpet_disable(unsigned int cpu) {}
 static inline void hardlockup_detector_switch_to_perf(void) {}
+static inline bool is_hpet_irq_hardlockup_detector(struct irq_alloc_info *info)
+{ return false; }
 #endif /* CONFIG_X86_HARDLOCKUP_DETECTOR_HPET */
 
 #else /* CONFIG_HPET_TIMER */
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 5012590dc1b8..3e43e0f348b8 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -1479,6 +1479,39 @@ struct hpet_hld_data *hpet_hld_get_timer(void)
hld_data = NULL;
return NULL;
 }
+
+/**
+ * is_hpet_irq_hardlockup_detector() - Identify the HPET hld interrupt info
+ * @info:  Interrupt allocation info, with private HPET channel data
+ *
+ * The HPET hardlockup detector is special as it needs its interrupts delivered
+ * as NMI. However, for interrupt remapping we use the existing irq subsystem
+ * to configure and route the HPET interrupt. Unfortunately, there is not a
+ * IRQF_NMI flag for x86. Instead, identify whether the interrupt being
+ * allocated for the HPET channel belongs to the hardlockup detector.
+ *
+ * Returns: True if @info indicates that it belongs to the HPET hardlockup
+ * detector. False otherwise.
+ */
+bool is_hpet_irq_hardlockup_detector(struct irq_alloc_info *info)
+{
+   struct hpet_channel *hc;
+
+   if (!info)
+   return false;
+
+   if (info->type != X86_IRQ_ALLOC_TYPE_HPET)
+   return false;
+
+   hc = info->data;
+   if (!hc)
+   return false;
+
+   if (hc->mode == HPET_MODE_NMI_WATCHDOG)
+   return true;
+
+   return false;
+}
 #endif /* CONFIG_X86_HARDLOCKUP_DETECTOR_HPET */
 
 #endif
-- 
2.17.1


