Re: [RFC PATCH 22/23] watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot parameter

2018-06-12 Thread Randy Dunlap
On 06/12/2018 05:57 PM, Ricardo Neri wrote:
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index f2040d4..a8833c7 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2577,7 +2577,7 @@
>   Format: [state][,regs][,debounce][,die]
>  
>   nmi_watchdog=   [KNL,BUGS=X86] Debugging features for SMP kernels
> - Format: [panic,][nopanic,][num]
> + Format: [panic,][nopanic,][num,][hpet]
>   Valid num: 0 or 1
>   0 - turn hardlockup detector in nmi_watchdog off
>   1 - turn hardlockup detector in nmi_watchdog on

This says that I can use "nmi_watchdog=hpet" without using 0 or 1.
Is that correct?

> @@ -2587,6 +2587,9 @@
>   please see 'nowatchdog'.
>   This is useful when you use a panic=... timeout and
>   need the box quickly up again.
> + When hpet is specified, the NMI watchdog will be driven
> + by an HPET timer, if available in the system. Otherwise,
> + the perf-based implementation will be used.
>  
>   These settings can be accessed at runtime via
>   the nmi_watchdog and hardlockup_panic sysctls.


thanks,
-- 
~Randy
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 16/23] watchdog/hardlockup: Add an HPET-based hardlockup detector

2018-06-12 Thread Randy Dunlap
Hi,

On 06/12/2018 05:57 PM, Ricardo Neri wrote:
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index c40c7b7..6e79833 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -828,6 +828,16 @@ config HARDLOCKUP_DETECTOR_PERF
>   bool
>   select SOFTLOCKUP_DETECTOR
>  
> +config HARDLOCKUP_DETECTOR_HPET
> + bool "Use HPET Timer for Hard Lockup Detection"
> + select SOFTLOCKUP_DETECTOR
> + select HARDLOCKUP_DETECTOR
> + depends on HPET_TIMER && HPET
> + help
> +   Say y to enable a hardlockup detector that is driven by an High-Precision
> +   Event Timer. In addition to selecting this option, the command-line
> +   parameter nmi_watchdog option. See Documentation/admin-guide/kernel-parameters.rst

The "In addition ..." thing is a broken (incomplete) sentence.

> +
>  #
>  # Enables a timestamp based low pass filter to compensate for perf based
>  # hard lockup detection which runs too fast due to turbo modes.


-- 
~Randy


[RFC PATCH 20/23] watchdog/hardlockup/hpet: Rotate interrupt among all monitored CPUs

2018-06-12 Thread Ricardo Neri
In order to detect hardlockups in all the monitored CPUs, move the
interrupt to the next monitored CPU when handling the NMI interrupt; wrap
around when reaching the highest CPU in the mask. This rotation is achieved
by setting the affinity mask to only contain the next CPU to monitor.

To prevent the interrupt from being reassigned to another CPU, flag it as
IRQF_NOBALANCING.

The cpumask monitored_mask keeps track of the CPUs that the watchdog
should monitor. This structure is updated when the NMI watchdog is
enabled or disabled in a specific CPU. As this mask can change
concurrently as CPUs are put online or offline and the watchdog is
disabled or enabled, a lock is required to protect the monitored_mask.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: "Rafael J. Wysocki" 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Alexei Starovoitov 
Cc: Babu Moger 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: Byungchul Park 
Cc: "Paul E. McKenney" 
Cc: "Luis R. Rodriguez" 
Cc: Waiman Long 
Cc: Josh Poimboeuf 
Cc: Randy Dunlap 
Cc: Davidlohr Bueso 
Cc: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Kai-Heng Feng 
Cc: Konrad Rzeszutek Wilk 
Cc: David Rientjes 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 kernel/watchdog_hld_hpet.c | 28 
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/kernel/watchdog_hld_hpet.c b/kernel/watchdog_hld_hpet.c
index 857e051..c40acfd 100644
--- a/kernel/watchdog_hld_hpet.c
+++ b/kernel/watchdog_hld_hpet.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #undef pr_fmt
@@ -199,8 +200,8 @@ static irqreturn_t hardlockup_detector_irq_handler(int irq, void *data)
  * @regs:  Register values as seen when the NMI was asserted
  *
  * When an NMI is issued, look for hardlockups. If the timer is not periodic,
- * kick it. The interrupt is always handled when delivered via the
- * Front-Side Bus.
+ * kick it. Move the interrupt to the next monitored CPU. The interrupt is
+ * always handled when delivered via the Front-Side Bus.
  *
  * Returns:
  *
@@ -211,7 +212,7 @@ static int hardlockup_detector_nmi_handler(unsigned int val,
   struct pt_regs *regs)
 {
struct hpet_hld_data *hdata = hld_data;
-   unsigned int use_fsb;
+   unsigned int use_fsb, cpu;
 
/*
 * If FSB delivery mode is used, the timer interrupt is programmed as
@@ -222,8 +223,27 @@ static int hardlockup_detector_nmi_handler(unsigned int val,
if (!use_fsb && !is_hpet_wdt_interrupt(hdata))
return NMI_DONE;
 
+   /* There are no CPUs to monitor. */
+   if (!cpumask_weight(&hdata->monitored_mask))
+   return NMI_HANDLED;
+
inspect_for_hardlockups(regs);
 
+   /*
+* Target a new CPU. Keep trying until we find a monitored CPU. CPUs
+* are added to and removed from this mask at cpu_up() and cpu_down(),
+* respectively. Thus, the interrupt should be able to be moved to
+* the next monitored CPU.
+*/
+   spin_lock(&hld_data->lock);
+   for_each_cpu_wrap(cpu, &hdata->monitored_mask, smp_processor_id() + 1) {
+   if (!irq_set_affinity(hld_data->irq, cpumask_of(cpu)))
+   break;
+   pr_err("Could not assign interrupt to CPU %d. Trying with next present CPU.\n",
+  cpu);
+   }
+   spin_unlock(&hld_data->lock);
+
if (!(hdata->flags & HPET_DEV_PERI_CAP))
kick_timer(hdata);
 
@@ -336,7 +356,7 @@ static int setup_hpet_irq(struct hpet_hld_data *hdata)
 * Request an interrupt to activate the irq in all the needed domains.
 */
ret = request_irq(hwirq, hardlockup_detector_irq_handler,
- IRQF_TIMER | IRQF_DELIVER_AS_NMI,
+ IRQF_TIMER | IRQF_DELIVER_AS_NMI | IRQF_NOBALANCING,
  "hpet_hld", hdata);
if (ret)
unregister_nmi_handler(NMI_LOCAL, "hpet_hld");
-- 
2.7.4



[RFC PATCH 23/23] watchdog/hardlockup: Activate the HPET-based lockup detector

2018-06-12 Thread Ricardo Neri
Now that the implementation of the HPET-based hardlockup detector is
complete, enable it. It will be used only if it can be initialized
successfully. Otherwise, the perf-based detector will be used.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: "Rafael J. Wysocki" 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Alexei Starovoitov 
Cc: Babu Moger 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: Byungchul Park 
Cc: "Paul E. McKenney" 
Cc: "Luis R. Rodriguez" 
Cc: Waiman Long 
Cc: Josh Poimboeuf 
Cc: Randy Dunlap 
Cc: Davidlohr Bueso 
Cc: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Kai-Heng Feng 
Cc: Konrad Rzeszutek Wilk 
Cc: David Rientjes 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 kernel/watchdog.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index b5ce6e4..e2cc6c0 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -149,6 +149,21 @@ int __weak __init watchdog_nmi_probe(void)
 {
int ret = -ENODEV;
 
+   /*
+* Try first with the HPET hardlockup detector. It will only
+* succeed if selected at build time and the nmi_watchdog
+* command-line parameter is configured. This ensures that the
+* perf-based detector is used by default, if selected at
+* build time.
+*/
+   if (IS_ENABLED(CONFIG_HARDLOCKUP_DETECTOR_HPET))
+   ret = hardlockup_detector_hpet_ops.init();
+
+   if (!ret) {
+   nmi_wd_ops = &hardlockup_detector_hpet_ops;
+   return ret;
+   }
+
if (IS_ENABLED(CONFIG_HARDLOCKUP_DETECTOR_PERF))
ret = hardlockup_detector_perf_ops.init();
 
-- 
2.7.4



[RFC PATCH 22/23] watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot parameter

2018-06-12 Thread Ricardo Neri
Keep the HPET-based hardlockup detector disabled unless explicitly enabled
via a command-line argument. If the parameter is not given, the hardlockup
detector will fall back to the perf-based implementation.

The function hardlockup_panic_setup() is updated to return 0 in order to
allow __setup functions of specific hardlockup detectors (in this case
hardlockup_detector_hpet_setup()) to inspect the nmi_watchdog boot
parameter.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: "Rafael J. Wysocki" 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Alexei Starovoitov 
Cc: Babu Moger 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: Byungchul Park 
Cc: "Paul E. McKenney" 
Cc: "Luis R. Rodriguez" 
Cc: Waiman Long 
Cc: Josh Poimboeuf 
Cc: Randy Dunlap 
Cc: Davidlohr Bueso 
Cc: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Kai-Heng Feng 
Cc: Konrad Rzeszutek Wilk 
Cc: David Rientjes 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
--
checkpatch gives the following warning:

CHECK: __setup appears un-documented -- check Documentation/admin-guide/kernel-parameters.rst
+__setup("nmi_watchdog=", hardlockup_detector_hpet_setup);

This is a false-positive as the option nmi_watchdog is already
documented. The option is re-evaluated in this file as well.
---
 Documentation/admin-guide/kernel-parameters.txt |  5 -
 kernel/watchdog.c   |  2 +-
 kernel/watchdog_hld_hpet.c  | 13 +
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f2040d4..a8833c7 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2577,7 +2577,7 @@
Format: [state][,regs][,debounce][,die]
 
nmi_watchdog=   [KNL,BUGS=X86] Debugging features for SMP kernels
-   Format: [panic,][nopanic,][num]
+   Format: [panic,][nopanic,][num,][hpet]
Valid num: 0 or 1
0 - turn hardlockup detector in nmi_watchdog off
1 - turn hardlockup detector in nmi_watchdog on
@@ -2587,6 +2587,9 @@
please see 'nowatchdog'.
This is useful when you use a panic=... timeout and
need the box quickly up again.
+   When hpet is specified, the NMI watchdog will be driven
+   by an HPET timer, if available in the system. Otherwise,
+   the perf-based implementation will be used.
 
These settings can be accessed at runtime via
the nmi_watchdog and hardlockup_panic sysctls.
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index b94bbe3..b5ce6e4 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -84,7 +84,7 @@ static int __init hardlockup_panic_setup(char *str)
nmi_watchdog_user_enabled = 0;
else if (!strncmp(str, "1", 1))
nmi_watchdog_user_enabled = 1;
-   return 1;
+   return 0;
 }
 __setup("nmi_watchdog=", hardlockup_panic_setup);
 
diff --git a/kernel/watchdog_hld_hpet.c b/kernel/watchdog_hld_hpet.c
index ebb820d..12e5937 100644
--- a/kernel/watchdog_hld_hpet.c
+++ b/kernel/watchdog_hld_hpet.c
@@ -17,6 +17,7 @@
 #define pr_fmt(fmt) "NMI hpet watchdog: " fmt
 
 static struct hpet_hld_data *hld_data;
+static bool hardlockup_use_hpet;
 
 /**
  * get_count() - Get the current count of the HPET timer
@@ -488,6 +489,15 @@ static void hardlockup_detector_hpet_stop(void)
spin_unlock(&hld_data->lock);
 }
 
+static int __init hardlockup_detector_hpet_setup(char *str)
+{
+   if (strstr(str, "hpet"))
+   hardlockup_use_hpet = true;
+
+   return 0;
+}
+__setup("nmi_watchdog=", hardlockup_detector_hpet_setup);
+
 /**
  * hardlockup_detector_hpet_init() - Initialize the hardlockup detector
  *
@@ -502,6 +512,9 @@ static int __init hardlockup_detector_hpet_init(void)
 {
int ret;
 
+   if (!hardlockup_use_hpet)
+   return -EINVAL;
+
if (!is_hpet_enabled())
return -ENODEV;
 
-- 
2.7.4



[RFC PATCH 21/23] watchdog/hardlockup/hpet: Adjust timer expiration on the number of monitored CPUs

2018-06-12 Thread Ricardo Neri
Each CPU should be monitored for hardlockups every watchdog_thresh seconds.
Since all the CPUs in the system are monitored by the same timer and the
timer interrupt is rotated among the monitored CPUs, the timer must expire
every watchdog_thresh/N seconds, where N is the number of monitored CPUs.

A new member is added to struct hpet_hld_data to hold the per-CPU ticks
per second. This quantity is used to program the comparator of the timer.

The ticks-per-CPU quantity is updated every time the number of monitored
CPUs changes: when the watchdog is enabled or disabled for a specific CPU.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: "Rafael J. Wysocki" 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Alexei Starovoitov 
Cc: Babu Moger 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: Byungchul Park 
Cc: "Paul E. McKenney" 
Cc: "Luis R. Rodriguez" 
Cc: Waiman Long 
Cc: Josh Poimboeuf 
Cc: Randy Dunlap 
Cc: Davidlohr Bueso 
Cc: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Kai-Heng Feng 
Cc: Konrad Rzeszutek Wilk 
Cc: David Rientjes 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h |  1 +
 kernel/watchdog_hld_hpet.c  | 41 -
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 6ace2d1..e67818d 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -124,6 +124,7 @@ struct hpet_hld_data {
u32 irq;
u32 flags;
u64 ticks_per_second;
+   u64 ticks_per_cpu;
struct cpumask  monitored_mask;
spinlock_t  lock; /* serialized access to monitored_mask */
 };
diff --git a/kernel/watchdog_hld_hpet.c b/kernel/watchdog_hld_hpet.c
index c40acfd..ebb820d 100644
--- a/kernel/watchdog_hld_hpet.c
+++ b/kernel/watchdog_hld_hpet.c
@@ -65,11 +65,21 @@ static void kick_timer(struct hpet_hld_data *hdata)
 * are able to update the comparator before the counter reaches such new
 * value.
 *
+* The timer must monitor each CPU every watchdog_thresh seconds. Hence,
+* in order to monitor all the online CPUs, the timer expiration must be:
+*
+*watchdog_thresh/N
+*
+* where N is the number of monitored CPUs. ticks_per_cpu gives the
+* number of ticks needed to meet the condition above.
+*
 * Let it wrap around if needed.
 */
count = get_count();
 
-   new_compare = count + watchdog_thresh * hdata->ticks_per_second;
+   new_compare = count + watchdog_thresh * hdata->ticks_per_cpu;
 
set_comparator(hdata, new_compare);
 }
@@ -160,6 +170,33 @@ static bool is_hpet_wdt_interrupt(struct hpet_hld_data *hdata)
 }
 
 /**
+ * update_ticks_per_cpu() - Update the number of HPET ticks per CPU
+ * @hdata: struct with the timer's ticks-per-second and CPU mask
+ *
+ * From the overall ticks-per-second of the timer, compute the number of ticks
+ * after which the timer should expire to monitor each CPU every
+ * watchdog_thresh seconds. The ticks-per-cpu quantity is computed using the
+ * number of CPUs that the watchdog currently monitors.
+ *
+ * Returns:
+ *
+ * None
+ *
+ */
+static void update_ticks_per_cpu(struct hpet_hld_data *hdata)
+{
+   unsigned int num_cpus = cpumask_weight(&hdata->monitored_mask);
+   unsigned long long temp = hdata->ticks_per_second;
+
+   /* Only update if there are monitored CPUs. */
+   if (!num_cpus)
+   return;
+
+   do_div(temp, num_cpus);
+   hdata->ticks_per_cpu = temp;
+}
+
+/**
  * hardlockup_detector_irq_handler() - Interrupt handler
  * @irq:   Interrupt number
  * @data:  Data associated with the interrupt
@@ -390,6 +427,7 @@ static void hardlockup_detector_hpet_enable(void)
spin_lock(&hld_data->lock);

cpumask_set_cpu(cpu, &hld_data->monitored_mask);
+   update_ticks_per_cpu(hld_data);
 
/*
 * If this is the first CPU to be monitored, set everything in motion:
@@ -425,6 +463,7 @@ static void hardlockup_detector_hpet_disable(void)
spin_lock(&hld_data->lock);

cpumask_clear_cpu(smp_processor_id(), &hld_data->monitored_mask);
+   update_ticks_per_cpu(hld_data);

/* Only disable the timer if there are no more CPUs to monitor. */
if (!cpumask_weight(&hld_data->monitored_mask))
-- 
2.7.4



[RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI

2018-06-12 Thread Ricardo Neri
In order to detect hardlockups, it is necessary to have the ability to
receive interrupts even when disabled: a non-maskable interrupt is
required. Add the flag IRQF_DELIVER_AS_NMI to the arguments of
request_irq() for this purpose.

Note that the timer, when programmed to deliver interrupts via the IO APIC,
is configured as level-triggered. This provides an indication that the NMI
comes from the HPET timer, as reflected in the General Interrupt Status
Register. However, NMIs are always edge-triggered; thus, a GSI
edge-triggered interrupt is now requested.

An NMI handler is also implemented. The handler looks for hardlockups and
kicks the timer.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: "Rafael J. Wysocki" 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Alexei Starovoitov 
Cc: Babu Moger 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: Byungchul Park 
Cc: "Paul E. McKenney" 
Cc: "Luis R. Rodriguez" 
Cc: Waiman Long 
Cc: Josh Poimboeuf 
Cc: Randy Dunlap 
Cc: Davidlohr Bueso 
Cc: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Kai-Heng Feng 
Cc: Konrad Rzeszutek Wilk 
Cc: David Rientjes 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/kernel/hpet.c |  2 +-
 kernel/watchdog_hld_hpet.c | 55 +-
 2 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index fda6e19..5ca1953 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -205,7 +205,7 @@ int hpet_hardlockup_detector_assign_legacy_irq(struct hpet_hld_data *hdata)
break;
}
 
-   gsi = acpi_register_gsi(NULL, hwirq, ACPI_LEVEL_SENSITIVE,
+   gsi = acpi_register_gsi(NULL, hwirq, ACPI_EDGE_SENSITIVE,
ACPI_ACTIVE_LOW);
if (gsi > 0)
break;
diff --git a/kernel/watchdog_hld_hpet.c b/kernel/watchdog_hld_hpet.c
index 8fa4e55..3bedffa 100644
--- a/kernel/watchdog_hld_hpet.c
+++ b/kernel/watchdog_hld_hpet.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #undef pr_fmt
 #define pr_fmt(fmt) "NMI hpet watchdog: " fmt
@@ -183,6 +184,8 @@ static irqreturn_t hardlockup_detector_irq_handler(int irq, void *data)
if (!(hdata->flags & HPET_DEV_PERI_CAP))
kick_timer(hdata);
 
+   pr_err("This interrupt should not have happened. Ensure delivery mode is NMI.\n");
+
/* Acknowledge interrupt if in level-triggered mode */
if (!use_fsb)
hpet_writel(BIT(hdata->num), HPET_STATUS);
@@ -191,6 +194,47 @@ static irqreturn_t hardlockup_detector_irq_handler(int irq, void *data)
 }
 
 /**
+ * hardlockup_detector_nmi_handler() - NMI Interrupt handler
+ * @val:   Attribute associated with the NMI. Not used.
+ * @regs:  Register values as seen when the NMI was asserted
+ *
+ * When an NMI is issued, look for hardlockups. If the timer is not periodic,
+ * kick it. The interrupt is always handled when delivered via the
+ * Front-Side Bus.
+ *
+ * Returns:
+ *
+ * NMI_DONE if the HPET timer did not cause the interrupt. NMI_HANDLED
+ * otherwise.
+ */
+static int hardlockup_detector_nmi_handler(unsigned int val,
+  struct pt_regs *regs)
+{
+   struct hpet_hld_data *hdata = hld_data;
+   unsigned int use_fsb;
+
+   /*
+* If FSB delivery mode is used, the timer interrupt is programmed as
+* edge-triggered and there is no need to check the ISR register.
+*/
+   use_fsb = hdata->flags & HPET_DEV_FSB_CAP;
+
+   if (!use_fsb && !is_hpet_wdt_interrupt(hdata))
+   return NMI_DONE;
+
+   inspect_for_hardlockups(regs);
+
+   if (!(hdata->flags & HPET_DEV_PERI_CAP))
+   kick_timer(hdata);
+
+   /* Acknowledge interrupt if in level-triggered mode */
+   if (!use_fsb)
+   hpet_writel(BIT(hdata->num), HPET_STATUS);
+
+   return NMI_HANDLED;
+}
+
+/**
  * setup_irq_msi_mode() - Configure the timer to deliver an MSI interrupt
  * @data:  Data associated with the instance of the HPET timer to configure
  *
@@ -282,11 +326,20 @@ static int setup_hpet_irq(struct hpet_hld_data *hdata)
if (ret)
return ret;
 
+   /* Register the NMI handler, which will be the actual handler we use. */
+   ret = register_nmi_handler(NMI_LOCAL, hardlockup_detector_nmi_handler,
+  0, "hpet_hld");
+   if (ret)
+   return ret;
+
/*
 * Request an interrupt to activate the irq in all the needed domains.
 */
ret = request_irq(hwirq, hardlockup_detector_irq_handler,
- IRQF_TIMER, "hpet_hld", 

[RFC PATCH 19/23] watchdog/hardlockup: Make arch_touch_nmi_watchdog() available to the hpet-based implementation

2018-06-12 Thread Ricardo Neri
CPU architectures that have an NMI watchdog use arch_touch_nmi_watchdog()
to briefly ignore the hardlockup detector. If the architecture does not
have an NMI watchdog, one can be constructed using a source of non-
maskable interrupts. In this case, arch_touch_nmi_watchdog() is common
to any underlying hardware resource used to drive the detector and needs
to be available to other kernel subsystems when hardware other than perf
drives the detector.

There exist perf-based and HPET-based implementations. Make the function
available to the latter.

For clarity, wrap this function in a separate preprocessor conditional
from functions which are truly specific to the perf-based implementation.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: "Rafael J. Wysocki" 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Alexei Starovoitov 
Cc: Babu Moger 
Cc: "David S. Miller" 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: Byungchul Park 
Cc: "Paul E. McKenney" 
Cc: "Luis R. Rodriguez" 
Cc: Waiman Long 
Cc: Josh Poimboeuf 
Cc: Randy Dunlap 
Cc: Davidlohr Bueso 
Cc: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Kai-Heng Feng 
Cc: Konrad Rzeszutek Wilk 
Cc: David Rientjes 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 include/linux/nmi.h | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 23e20d2..8b6b814 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -89,16 +89,22 @@ static inline void hardlockup_detector_disable(void) {}
 # define NMI_WATCHDOG_SYSCTL_PERM  0444
 #endif
 
-#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) || \
+defined(CONFIG_HARDLOCKUP_DETECTOR_HPET)
 extern void arch_touch_nmi_watchdog(void);
+#else
+# if !defined(CONFIG_HAVE_NMI_WATCHDOG)
+static inline void arch_touch_nmi_watchdog(void) {}
+# endif
+#endif
+
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
 extern void hardlockup_detector_perf_stop(void);
 extern void hardlockup_detector_perf_restart(void);
 #else
 static inline void hardlockup_detector_perf_stop(void) { }
 static inline void hardlockup_detector_perf_restart(void) { }
-# if !defined(CONFIG_HAVE_NMI_WATCHDOG)
-static inline void arch_touch_nmi_watchdog(void) {}
-# endif
+
 #endif
 
 /**
-- 
2.7.4



[RFC PATCH 18/23] watchdog/hardlockup/hpet: Add the NMI watchdog operations

2018-06-12 Thread Ricardo Neri
Implement the start, stop and disable operations of the HPET-based NMI
watchdog. Given that a single timer is used to monitor all the CPUs in
the system, it is necessary to define a cpumask that keeps track of the
CPUs that can be monitored. This cpumask is protected with a spin lock.

As individual CPUs are put online and offline, this cpumask is updated.
CPUs are unconditionally cleared from the mask when going offline. When
going online, the CPU is set in the mask only if it is one of the CPUs
allowed to be monitored by the watchdog.

It is not necessary to implement a start function. The NMI watchdog will
be enabled when there is at least one CPU to monitor.

The disable function clears the CPU mask and disables the timer.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: "Rafael J. Wysocki" 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Alexei Starovoitov 
Cc: Babu Moger 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: Byungchul Park 
Cc: "Paul E. McKenney" 
Cc: "Luis R. Rodriguez" 
Cc: Waiman Long 
Cc: Josh Poimboeuf 
Cc: Randy Dunlap 
Cc: Davidlohr Bueso 
Cc: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Kai-Heng Feng 
Cc: Konrad Rzeszutek Wilk 
Cc: David Rientjes 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h |  2 +
 include/linux/nmi.h |  1 +
 kernel/watchdog_hld_hpet.c  | 98 +
 3 files changed, 101 insertions(+)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 33309b7..6ace2d1 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -124,6 +124,8 @@ struct hpet_hld_data {
u32 irq;
u32 flags;
u64 ticks_per_second;
+   struct cpumask  monitored_mask;
+   spinlock_t  lock; /* serialized access to monitored_mask */
 };
 
 extern struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void);
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index e608762..23e20d2 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -129,6 +129,7 @@ struct nmi_watchdog_ops {
 };
 
 extern struct nmi_watchdog_ops hardlockup_detector_perf_ops;
+extern struct nmi_watchdog_ops hardlockup_detector_hpet_ops;
 
 void watchdog_nmi_stop(void);
 void watchdog_nmi_start(void);
diff --git a/kernel/watchdog_hld_hpet.c b/kernel/watchdog_hld_hpet.c
index 3bedffa..857e051 100644
--- a/kernel/watchdog_hld_hpet.c
+++ b/kernel/watchdog_hld_hpet.c
@@ -345,6 +345,91 @@ static int setup_hpet_irq(struct hpet_hld_data *hdata)
 }
 
 /**
+ * hardlockup_detector_hpet_enable() - Enable the hardlockup detector
+ *
+ * The hardlockup detector is enabled for the CPU that executes the
+ * function. It is only enabled if such CPU is allowed to be monitored
+ * by the lockup detector.
+ *
+ * Returns:
+ *
+ * None
+ *
+ */
+static void hardlockup_detector_hpet_enable(void)
+{
+   struct cpumask *allowed = watchdog_get_allowed_cpumask();
+   unsigned int cpu = smp_processor_id();
+
+   if (!hld_data)
+   return;
+
+   if (!cpumask_test_cpu(cpu, allowed))
+   return;
+
+   spin_lock(&hld_data->lock);
+
+   cpumask_set_cpu(cpu, &hld_data->monitored_mask);
+
+   /*
+* If this is the first CPU to be monitored, set everything in motion:
+* move the interrupt to this CPU, kick and enable the timer.
+*/
+   if (cpumask_weight(&hld_data->monitored_mask) == 1) {
+   if (irq_set_affinity(hld_data->irq, cpumask_of(cpu))) {
+   spin_unlock(&hld_data->lock);
+   pr_err("Unable to enable on CPU %d!\n", cpu);
+   return;
+   }
+
+   kick_timer(hld_data);
+   enable(hld_data);
+   }
+
+   spin_unlock(&hld_data->lock);
+}
+
+/**
+ * hardlockup_detector_hpet_disable() - Disable the hardlockup detector
+ *
+ * The hardlockup detector is disabled for the CPU that executes the
+ * function.
+ *
+ * Returns:
+ *
+ * None
+ */
+static void hardlockup_detector_hpet_disable(void)
+{
+   if (!hld_data)
+   return;
+
+   spin_lock(&hld_data->lock);
+
+   cpumask_clear_cpu(smp_processor_id(), &hld_data->monitored_mask);
+
+   /* Only disable the timer if there are no more CPUs to monitor. */
+   if (!cpumask_weight(&hld_data->monitored_mask))
+   disable(hld_data);
+
+   spin_unlock(&hld_data->lock);
+}
+
+/**
+ * hardlockup_detector_hpet_stop() - Stop the NMI watchdog on all CPUs
+ *
+ * Returns:
+ *
+ * None
+ */
+static void hardlockup_detector_hpet_stop(void)
+{
+   disable(hld_data);
+
+   spin_lock(&hld_data->lock);
+   cpumask_clear(&hld_data->monitored_mask);
+   spin_unlock(&hld_data->lock);
+}
+
+/**
  * hardlockup_detector_hpet_init() - 

[RFC PATCH 16/23] watchdog/hardlockup: Add an HPET-based hardlockup detector

2018-06-12 Thread Ricardo Neri
This is the initial implementation of a hardlockup detector driven by an
HPET timer. This initial implementation includes functions to control
the timer via its registers. It also requests such timer, installs
a minimal interrupt handler and performs the initial configuration of
the timer.

The detector is not functional at this stage. Subsequent changesets will
populate the NMI watchdog operations and register it with the lockup
detector.

This detector depends on HPET_TIMER since platform code performs the
initialization of the timer and maps its registers to memory. It depends
on HPET to compute the ticks per second of the timer.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: "Rafael J. Wysocki" 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Alexei Starovoitov 
Cc: Babu Moger 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: Byungchul Park 
Cc: "Paul E. McKenney" 
Cc: "Luis R. Rodriguez" 
Cc: Waiman Long 
Cc: Josh Poimboeuf 
Cc: Randy Dunlap 
Cc: Davidlohr Bueso 
Cc: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Kai-Heng Feng 
Cc: Konrad Rzeszutek Wilk 
Cc: David Rientjes 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 kernel/Makefile|   1 +
 kernel/watchdog_hld_hpet.c | 334 +
 lib/Kconfig.debug  |  10 ++
 3 files changed, 345 insertions(+)
 create mode 100644 kernel/watchdog_hld_hpet.c

diff --git a/kernel/Makefile b/kernel/Makefile
index 0a0d86d..73c79b2 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -86,6 +86,7 @@ obj-$(CONFIG_KGDB) += debug/
 obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
 obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
 obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o watchdog_hld_perf.o
+obj-$(CONFIG_HARDLOCKUP_DETECTOR_HPET) += watchdog_hld.o watchdog_hld_hpet.o
 obj-$(CONFIG_SECCOMP) += seccomp.o
 obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
diff --git a/kernel/watchdog_hld_hpet.c b/kernel/watchdog_hld_hpet.c
new file mode 100644
index 000..8fa4e55
--- /dev/null
+++ b/kernel/watchdog_hld_hpet.c
@@ -0,0 +1,334 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * A hardlockup detector driven by an HPET timer.
+ *
+ * Copyright (C) Intel Corporation 2018
+ */
+
+#define pr_fmt(fmt) "NMI hpet watchdog: " fmt
+
+#include 
+#include 
+#include 
+
+#undef pr_fmt
+#define pr_fmt(fmt) "NMI hpet watchdog: " fmt
+
+static struct hpet_hld_data *hld_data;
+
+/**
+ * get_count() - Get the current count of the HPET timer
+ *
+ * Returns:
+ *
+ * Value of the main counter of the HPET timer
+ */
+static inline unsigned long get_count(void)
+{
+   return hpet_readq(HPET_COUNTER);
+}
+
+/**
+ * set_comparator() - Update the comparator in an HPET timer instance
+ * @hdata: A data structure with the timer instance to update
+ * @cmp:   The value to write to the comparator register
+ *
+ * Returns:
+ *
+ * None
+ */
+static inline void set_comparator(struct hpet_hld_data *hdata,
+ unsigned long cmp)
+{
+   hpet_writeq(cmp, HPET_Tn_CMP(hdata->num));
+}
+
+/**
+ * kick_timer() - Reprogram timer to expire in the future
+ * @hdata: A data structure with the timer instance to update
+ *
+ * Reprogram the timer to expire within watchdog_thresh seconds in the future.
+ *
+ * Returns:
+ *
+ * None
+ */
+static void kick_timer(struct hpet_hld_data *hdata)
+{
+   unsigned long new_compare, count;
+
+   /*
+    * Reprogram the comparator to expire watchdog_thresh seconds from the
+    * current count. Since watchdog_thresh is given in seconds, there is
+    * ample time to update the comparator before the counter reaches the
+    * new value.
+    *
+    * Let it wrap around if needed.
+    */
+   count = get_count();
+
+   new_compare = count + watchdog_thresh * hdata->ticks_per_second;
+
+   set_comparator(hdata, new_compare);
+}
+
+/**
+ * disable() - Disable an HPET timer instance
+ * @hdata: A data structure with the timer instance to disable
+ *
+ * Returns:
+ *
+ * None
+ */
+static void disable(struct hpet_hld_data *hdata)
+{
+   unsigned int v;
+
+   v = hpet_readl(HPET_Tn_CFG(hdata->num));
+   v &= ~HPET_TN_ENABLE;
+   hpet_writel(v, HPET_Tn_CFG(hdata->num));
+}
+
+/**
+ * enable() - Enable an HPET timer instance
+ * @hdata: A data structure with the timer instance to enable
+ *
+ * Returns:
+ *
+ * None
+ */
+static void enable(struct hpet_hld_data *hdata)
+{
+   unsigned long v;
+
+   /* Clear any previously active interrupt. */
+   hpet_writel(BIT(hdata->num), HPET_STATUS);
+
+   v = hpet_readl(HPET_Tn_CFG(hdata->num));
+   v |= HPET_TN_ENABLE;
+   hpet_writel(v, HPET_Tn_CFG(hdata->num));
+}
+
+/**
+ * set_periodic() - Set 

[RFC PATCH 14/23] watchdog/hardlockup: Decouple the hardlockup detector from perf

2018-06-12 Thread Ricardo Neri
The current default implementation of the hardlockup detector assumes that
it is implemented using perf events. However, the hardlockup detector can
be driven by other sources of non-maskable interrupts (e.g., a properly
configured timer).

Put in a separate file all the code that is specific to perf: create and
manage events, stop and start the detector. This perf-specific code is put
in the new file watchdog_hld_perf.c

The generic code used to monitor the timers' thresholds, check timestamps
and detect hardlockups remains in watchdog_hld.c

Functions and variables are simply relocated to a new file. No functional
changes were made.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Babu Moger 
Cc: "David S. Miller" 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: "Luis R. Rodriguez" 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 kernel/Makefile|   2 +-
 kernel/watchdog_hld.c  | 162 
 kernel/watchdog_hld_perf.c | 182 +
 3 files changed, 183 insertions(+), 163 deletions(-)
 create mode 100644 kernel/watchdog_hld_perf.c

diff --git a/kernel/Makefile b/kernel/Makefile
index f85ae5d..0a0d86d 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -85,7 +85,7 @@ obj-$(CONFIG_FAIL_FUNCTION) += fail_function.o
 obj-$(CONFIG_KGDB) += debug/
 obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
 obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
-obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
+obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o watchdog_hld_perf.o
 obj-$(CONFIG_SECCOMP) += seccomp.o
 obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 28a00c3..96615a2 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -22,12 +22,8 @@
 
 static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
-static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
-static DEFINE_PER_CPU(struct perf_event *, dead_event);
-static struct cpumask dead_events_mask;
 
 static unsigned long hardlockup_allcpu_dumped;
-static atomic_t watchdog_cpus = ATOMIC_INIT(0);
 
 void arch_touch_nmi_watchdog(void)
 {
@@ -98,14 +94,6 @@ static inline bool watchdog_check_timestamp(void)
 }
 #endif
 
-static struct perf_event_attr wd_hw_attr = {
-   .type   = PERF_TYPE_HARDWARE,
-   .config = PERF_COUNT_HW_CPU_CYCLES,
-   .size   = sizeof(struct perf_event_attr),
-   .pinned = 1,
-   .disabled   = 1,
-};
-
 void inspect_for_hardlockups(struct pt_regs *regs)
 {
if (__this_cpu_read(watchdog_nmi_touch) == true) {
@@ -155,153 +143,3 @@ void inspect_for_hardlockups(struct pt_regs *regs)
__this_cpu_write(hard_watchdog_warn, false);
return;
 }
-
-/* Callback function for perf event subsystem */
-static void watchdog_overflow_callback(struct perf_event *event,
-  struct perf_sample_data *data,
-  struct pt_regs *regs)
-{
-   /* Ensure the watchdog never gets throttled */
-   event->hw.interrupts = 0;
-   inspect_for_hardlockups(regs);
-}
-
-static int hardlockup_detector_event_create(void)
-{
-   unsigned int cpu = smp_processor_id();
-   struct perf_event_attr *wd_attr;
-   struct perf_event *evt;
-
-   wd_attr = &wd_hw_attr;
-   wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
-
-   /* Try to register using hardware perf events */
-   evt = perf_event_create_kernel_counter(wd_attr, cpu, NULL,
-  watchdog_overflow_callback, NULL);
-   if (IS_ERR(evt)) {
-   pr_info("Perf event create on CPU %d failed with %ld\n", cpu,
-   PTR_ERR(evt));
-   return PTR_ERR(evt);
-   }
-   this_cpu_write(watchdog_ev, evt);
-   return 0;
-}
-
-/**
- * hardlockup_detector_perf_enable - Enable the local event
- */
-static void hardlockup_detector_perf_enable(void)
-{
-   if (hardlockup_detector_event_create())
-   return;
-
-   /* use original value for check */
-   if (!atomic_fetch_inc(&watchdog_cpus))
-   pr_info("Enabled. Permanently consumes one hw-PMU counter.\n");
-
-   perf_event_enable(this_cpu_read(watchdog_ev));
-}
-
-/**
- * hardlockup_detector_perf_disable - Disable the local event
- */
-static void hardlockup_detector_perf_disable(void)
-{
-   struct perf_event *event = this_cpu_read(watchdog_ev);
-
-  

[RFC PATCH 15/23] kernel/watchdog: Add a function to obtain the watchdog_allowed_mask

2018-06-12 Thread Ricardo Neri
Implementations of NMI watchdogs that use a single piece of hardware to
monitor all the CPUs in the system (as opposed to per-CPU implementations
such as perf) need to know which CPUs the watchdog is allowed to monitor.
In this manner, non-maskable interrupts are directed only to the monitored
CPUs.
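A watchdog built on a single piece of hardware must rotate its NMI across only the monitored CPUs. The sketch below is an illustrative, user-space stand-in (not kernel code; all names are hypothetical) that walks an allowed-CPU bitmask the way such a watchdog could pick the next NMI target:

```c
#include <stdint.h>

/*
 * next_monitored_cpu() - pick the next CPU allowed to receive the NMI.
 * @allowed_mask: bitmask of monitored CPUs (stand-in for a cpumask)
 * @prev_cpu:     CPU that received the previous NMI
 * @nr_cpus:      total number of CPUs in the system
 *
 * Returns the next monitored CPU after @prev_cpu, wrapping around,
 * or -1 if no CPU is monitored.
 */
static inline int next_monitored_cpu(uint64_t allowed_mask, int prev_cpu,
				     int nr_cpus)
{
	int cpu = prev_cpu;
	int i;

	for (i = 0; i < nr_cpus; i++) {
		cpu = (cpu + 1) % nr_cpus;	/* wrap around at the end */
		if (allowed_mask & (1ULL << cpu))
			return cpu;		/* next CPU to target */
	}
	return -1;				/* no CPU is monitored */
}
```

In the real implementation the allowed set would come from watchdog_get_allowed_cpumask() rather than a plain integer mask.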

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: "Rafael J. Wysocki" 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Alexei Starovoitov 
Cc: Babu Moger 
Cc: Paul Mackerras 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: Byungchul Park 
Cc: "Paul E. McKenney" 
Cc: "Luis R. Rodriguez" 
Cc: Waiman Long 
Cc: Josh Poimboeuf 
Cc: Randy Dunlap 
Cc: Davidlohr Bueso 
Cc: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Kai-Heng Feng 
Cc: Konrad Rzeszutek Wilk 
Cc: David Rientjes 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: "David S. Miller" 
Cc: Benjamin Herrenschmidt 
Cc: iommu@lists.linux-foundation.org
Cc: sparcli...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Ricardo Neri 
---
 include/linux/nmi.h | 1 +
 kernel/watchdog.c   | 7 ++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index e61b441..e608762 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -77,6 +77,7 @@ static inline void reset_hung_task_detector(void) { }
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR)
 extern void hardlockup_detector_disable(void);
+extern struct cpumask *watchdog_get_allowed_cpumask(void);
 extern unsigned int hardlockup_panic;
 #else
 static inline void hardlockup_detector_disable(void) {}
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 5057376..b94bbe3 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -50,7 +50,7 @@ int __read_mostly nmi_watchdog_available;
 
 static struct nmi_watchdog_ops *nmi_wd_ops;
 
-struct cpumask watchdog_allowed_mask __read_mostly;
+static struct cpumask watchdog_allowed_mask __read_mostly;
 
 struct cpumask watchdog_cpumask __read_mostly;
unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
@@ -98,6 +98,11 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
 }
 __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
 # endif /* CONFIG_SMP */
+
+struct cpumask *watchdog_get_allowed_cpumask(void)
+{
+   return &watchdog_allowed_mask;
+}
 #endif /* CONFIG_HARDLOCKUP_DETECTOR */
 
 /*
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 11/23] x86/hpet: Configure the timer used by the hardlockup detector

2018-06-12 Thread Ricardo Neri
Implement the initial configuration of the timer to be used by the
hardlockup detector. The main focus of this configuration is to provide an
interrupt for the timer.

Two types of interrupt can be assigned to the timer. First, attempt to
assign a message-signaled interrupt. This implies creating the HPET MSI
domain, but only if it was not already created when HPET timers were set
up as event timers. The data structures needed to allocate the MSI
interrupt in the domain are also created.

If message-signaled interrupts cannot be used, assign a legacy IO APIC
interrupt via the ACPI Global System Interrupts.

The resulting interrupt configuration, along with the timer instance, and
frequency are then made available to the hardlockup detector in a struct
via the new function hpet_hardlockup_detector_assign_timer().
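As context for the legacy-interrupt path, each timer's configuration register advertises (in Tn_INT_ROUTE_CAP, bits 32-63) which IO APIC input pins it can assert, and the patch masks off the legacy IRQs before picking one. The following is a user-space sketch of that interpretation; the helper name is hypothetical and the mask values follow the patch:

```c
#include <stdint.h>

/*
 * pick_ioapic_pin() - choose an IO APIC pin usable by an HPET timer.
 * @tn_cfg:   full 64-bit Tn_CFG register value
 * @pic_mode: nonzero when the system uses the legacy PIC
 *
 * Bits 32-63 of Tn_CFG (Tn_INT_ROUTE_CAP) advertise the pins the
 * timer can assert. Legacy IRQs are skipped before picking a pin.
 */
static int pick_ioapic_pin(uint64_t tn_cfg, int pic_mode)
{
	uint64_t caps = tn_cfg >> 32;	/* Tn_INT_ROUTE_CAP field */
	int pin;

	if (pic_mode)
		caps &= ~0xf3dfULL;	/* skip IRQ0-4, 6-9, 12-15 */
	else
		caps &= ~0xffffULL;	/* skip all 16 legacy IRQs */

	for (pin = 0; pin < 32; pin++)
		if (caps & (1ULL << pin))
			return pin;	/* first usable pin */

	return -1;			/* no usable pin advertised */
}
```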

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Clemens Ladisch 
Cc: Arnd Bergmann 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h |  16 +++
 arch/x86/kernel/hpet.c  | 112 +++-
 2 files changed, 127 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 9fd112a..33309b7 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -118,6 +118,22 @@ extern void hpet_unregister_irq_handler(rtc_irq_handler handler);
 
 #endif /* CONFIG_HPET_EMULATE_RTC */
 
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_HPET
+struct hpet_hld_data {
+   u32 num;
+   u32 irq;
+   u32 flags;
+   u64 ticks_per_second;
+};
+
+extern struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void);
+#else
+static inline struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void)
+{
+   return NULL;
+}
+#endif /* CONFIG_HARDLOCKUP_DETECTOR_HPET */
+
 #else /* CONFIG_HPET_TIMER */
 
 static inline int hpet_enable(void) { return 0; }
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 99d4972..fda6e19 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -36,6 +37,7 @@ bool  hpet_msi_disable;
 
 #ifdef CONFIG_PCI_MSI
 static unsigned inthpet_num_timers;
+static struct irq_domain   *hpet_domain;
 #endif
 static void __iomem*hpet_virt_address;
 
@@ -177,6 +179,115 @@ do { \
	_hpet_print_config(__func__, __LINE__); \
 } while (0)
 
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_HPET
+static
+int hpet_hardlockup_detector_assign_legacy_irq(struct hpet_hld_data *hdata)
+{
+   unsigned long v;
+   int gsi, hwirq;
+
+   /* Obtain interrupt pins that can be used by this timer. */
+   v = hpet_readq(HPET_Tn_CFG(HPET_WD_TIMER_NR));
+   v = (v & Tn_INT_ROUTE_CAP_MASK) >> Tn_INT_ROUTE_CAP_SHIFT;
+
+   /*
+    * In PIC mode, skip IRQ0-4, IRQ6-9 and IRQ12-15, which are always
+    * used by legacy devices. In IO APIC mode, skip all the legacy IRQs.
+    */
+   if (acpi_irq_model == ACPI_IRQ_MODEL_PIC)
+   v &= ~0xf3df;
+   else
v &= ~0xffff;
+
+   for_each_set_bit(hwirq, &v, HPET_MAX_IRQ) {
+   if (hwirq >= NR_IRQS) {
+   hwirq = HPET_MAX_IRQ;
+   break;
+   }
+
+   gsi = acpi_register_gsi(NULL, hwirq, ACPI_LEVEL_SENSITIVE,
+   ACPI_ACTIVE_LOW);
+   if (gsi > 0)
+   break;
+   }
+
+   if (hwirq >= HPET_MAX_IRQ)
+   return -ENODEV;
+
+   hdata->irq = hwirq;
+   return 0;
+}
+
+static int hpet_hardlockup_detector_assign_msi_irq(struct hpet_hld_data *hdata)
+{
+   struct hpet_dev *hdev;
+   int hwirq;
+
+   if (hpet_msi_disable)
+   return -ENODEV;
+
+   hdev = kzalloc(sizeof(*hdev), GFP_KERNEL);
+   if (!hdev)
+   return -ENOMEM;
+
+   hdev->flags |= HPET_DEV_FSB_CAP;
+   hdev->num = hdata->num;
+   sprintf(hdev->name, "hpet_hld");
+
+   /* Domain may exist if CPU does not have Always-Running APIC Timers. */
+   if (!hpet_domain) {
+   hpet_domain = hpet_create_irq_domain(hpet_blockid);
+   if (!hpet_domain)
+   return -EPERM;
+   }
+
+   hwirq = hpet_assign_irq(hpet_domain, hdev, hdev->num);
+   if (hwirq <= 0) {
+   kfree(hdev);
+   return -ENODEV;
+   }
+
+   hdata->irq = hwirq;
+   hdata->flags |= HPET_DEV_FSB_CAP;
+
+   hdev->irq = hwirq;
+
+   return 0;
+}
+
+struct hpet_hld_data 

[RFC PATCH 13/23] watchdog/hardlockup: Define a generic function to detect hardlockups

2018-06-12 Thread Ricardo Neri
The procedure to detect hardlockups is independent of the underlying
mechanism that generated the non-maskable interrupt used to drive the
detector. Thus, it can be put in a separate, generic function. In this
manner, it can be invoked by various implementations of the NMI watchdog.

For this purpose, move the bulk of watchdog_overflow_callback() to the
new function inspect_for_hardlockups(). This function can then be called
from the applicable NMI handlers.
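The intended split can be pictured with user-space stubs (not kernel code; struct pt_regs and the handler below are stand-ins): the source-specific NMI handler only acknowledges its hardware, then defers to the generic check.

```c
/*
 * Schematic sketch of the refactoring: one generic hardlockup check,
 * callable from any NMI source's handler.
 */
struct pt_regs;				/* opaque stand-in */

static int hardlockup_checks;		/* counts generic inspections */

/* Generic check, shared by all NMI watchdog implementations. */
static void inspect_for_hardlockups(struct pt_regs *regs)
{
	(void)regs;
	hardlockup_checks++;		/* real code compares timestamps */
}

/* A source-specific handler: ack the hardware, then run the check. */
static int watchdog_nmi_handler(struct pt_regs *regs)
{
	/* e.g., clear the timer's interrupt status bit here */
	inspect_for_hardlockups(regs);
	return 1;			/* NMI handled */
}
```

A perf-based handler and an HPET-based handler would differ only in the hardware-acknowledgment step before the shared call.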

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Babu Moger 
Cc: "David S. Miller" 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: "Luis R. Rodriguez" 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 include/linux/nmi.h   |  1 +
 kernel/watchdog_hld.c | 18 +++---
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index d3f5d55f..e61b441 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -223,6 +223,7 @@ extern int proc_watchdog_thresh(struct ctl_table *, int ,
void __user *, size_t *, loff_t *);
 extern int proc_watchdog_cpumask(struct ctl_table *, int,
 void __user *, size_t *, loff_t *);
+void inspect_for_hardlockups(struct pt_regs *regs);
 
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
 #include 
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 036cb0a..28a00c3 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -106,14 +106,8 @@ static struct perf_event_attr wd_hw_attr = {
.disabled   = 1,
 };
 
-/* Callback function for perf event subsystem */
-static void watchdog_overflow_callback(struct perf_event *event,
-  struct perf_sample_data *data,
-  struct pt_regs *regs)
+void inspect_for_hardlockups(struct pt_regs *regs)
 {
-   /* Ensure the watchdog never gets throttled */
-   event->hw.interrupts = 0;
-
if (__this_cpu_read(watchdog_nmi_touch) == true) {
__this_cpu_write(watchdog_nmi_touch, false);
return;
@@ -162,6 +156,16 @@ static void watchdog_overflow_callback(struct perf_event *event,
return;
 }
 
+/* Callback function for perf event subsystem */
+static void watchdog_overflow_callback(struct perf_event *event,
+  struct perf_sample_data *data,
+  struct pt_regs *regs)
+{
+   /* Ensure the watchdog never gets throttled */
+   event->hw.interrupts = 0;
+   inspect_for_hardlockups(regs);
+}
+
 static int hardlockup_detector_event_create(void)
 {
unsigned int cpu = smp_processor_id();
-- 
2.7.4



[RFC PATCH 12/23] kernel/watchdog: Introduce a struct for NMI watchdog operations

2018-06-12 Thread Ricardo Neri
Instead of exposing individual functions for the operations of the NMI
watchdog, define a common interface that can be used across multiple
implementations.

The struct nmi_watchdog_ops is defined for such operations. These initial
definitions include the enable, disable, start, stop, and cleanup
operations.

Only a single NMI watchdog can be used in the system. The operations of
this NMI watchdog are accessed via the new variable nmi_wd_ops. This
variable is set to point to the operations of the first NMI watchdog that
initializes successfully. At this moment, the only available NMI watchdog
is the perf-based hardlockup detector, but more implementations can be
added in the future.

While introducing this new struct for the NMI watchdog operations, convert
the perf-based NMI watchdog to use these operations.

The functions hardlockup_detector_perf_restart() and
hardlockup_detector_perf_stop() are special. They are not regular watchdog
operations; they are used to work around hardware bugs. Thus, they are not
used for the start and stop operations. Furthermore, the perf-based NMI
watchdog does not need to implement such operations. They are intended to
globally start and stop the NMI watchdog; the perf-based NMI
watchdog is implemented on a per-CPU basis.
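The selection logic described above can be sketched in user space (a stand-in, not the patch's code: the struct members mirror the patch, but the harness and helper names are hypothetical): candidates are tried in order, and nmi_wd_ops ends up pointing at the first one whose init() succeeds, falling back to no-op operations.

```c
#include <stddef.h>

struct nmi_watchdog_ops {
	int  (*init)(void);
	void (*enable)(void);
	void (*disable)(void);
};

static int failing_init(void) { return -1; }	/* e.g., no PMU available */
static int working_init(void) { return 0; }
static void noop(void) { }

/* Dummy operations used when no watchdog initializes successfully. */
static struct nmi_watchdog_ops hardlockup_detector_noop = {
	.init = NULL, .enable = noop, .disable = noop,
};

/* Return the ops of the first candidate whose init() succeeds. */
static struct nmi_watchdog_ops *
select_nmi_watchdog(struct nmi_watchdog_ops **candidates, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (candidates[i]->init && !candidates[i]->init())
			return candidates[i];	/* first successful init */

	return &hardlockup_detector_noop;	/* dummy operations */
}
```

This mirrors why a single global nmi_wd_ops pointer suffices: only one watchdog can win the selection.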

Currently, when the perf-based hardlockup detector is not selected at
build time, a dummy hardlockup_detector_perf_init() is used. The return
value of this function depends on CONFIG_HAVE_NMI_WATCHDOG. This behavior
is preserved by defining the NMI watchdog operations structure
hardlockup_detector_noop. These dummy operations are used when no
hardlockup detector is used or when it fails to initialize.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Babu Moger 
Cc: "David S. Miller" 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: "Luis R. Rodriguez" 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 include/linux/nmi.h   | 39 +++--
 kernel/watchdog.c | 54 +--
 kernel/watchdog_hld.c | 16 +++
 3 files changed, 89 insertions(+), 20 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index b8d868d..d3f5d55f 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -92,24 +92,43 @@ static inline void hardlockup_detector_disable(void) {}
 extern void arch_touch_nmi_watchdog(void);
 extern void hardlockup_detector_perf_stop(void);
 extern void hardlockup_detector_perf_restart(void);
-extern void hardlockup_detector_perf_disable(void);
-extern void hardlockup_detector_perf_enable(void);
-extern void hardlockup_detector_perf_cleanup(void);
-extern int hardlockup_detector_perf_init(void);
 #else
 static inline void hardlockup_detector_perf_stop(void) { }
 static inline void hardlockup_detector_perf_restart(void) { }
-static inline void hardlockup_detector_perf_disable(void) { }
-static inline void hardlockup_detector_perf_enable(void) { }
-static inline void hardlockup_detector_perf_cleanup(void) { }
 # if !defined(CONFIG_HAVE_NMI_WATCHDOG)
-static inline int hardlockup_detector_perf_init(void) { return -ENODEV; }
 static inline void arch_touch_nmi_watchdog(void) {}
-# else
-static inline int hardlockup_detector_perf_init(void) { return 0; }
 # endif
 #endif
 
+/**
+ * struct nmi_watchdog_ops - Operations performed by NMI watchdogs
+ * @init:  Initialize and configure the hardware resources of the
+ * NMI watchdog.
+ * @enable:    Enable (i.e., monitor for hardlockups) the NMI watchdog
+ *             on the CPU on which the function is executed.
+ * @disable:   Disable (i.e., do not monitor for hardlockups) the NMI
+ *             watchdog on the CPU on which the function is executed.
+ * @start:     Start the NMI watchdog on all CPUs. Used after the
+ *             parameters of the watchdog are updated. Optional if such
+ *             updates do not impact the operation of the NMI watchdog.
+ * @stop:      Stop the NMI watchdog on all CPUs. Used before the
+ *             parameters of the watchdog are updated. Optional if such
+ *             updates do not impact the operation of the NMI watchdog.
+ * @cleanup:   Clean up data structures that the NMI watchdog no longer
+ *             needs. Used after updating the parameters of the watchdog.
+ *             Optional if no cleanup is needed.
+ */
+struct nmi_watchdog_ops {
+   int (*init)(void);
+   void(*enable)(void);
+   void(*disable)(void);
+   void(*start)(void);
+  

[RFC PATCH 10/23] x86/hpet: Relocate flag definitions to a header file

2018-06-12 Thread Ricardo Neri
Users of HPET timers (such as the hardlockup detector) need the definitions
of these flags to interpret the configuration of a timer as passed by
platform code.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Clemens Ladisch 
Cc: Arnd Bergmann 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h | 6 ++
 arch/x86/kernel/hpet.c  | 6 --
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 3266796..9fd112a 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -64,6 +64,12 @@
 /* Timer used for the hardlockup detector */
 #define HPET_WD_TIMER_NR 2
 
+#define HPET_DEV_USED_BIT  2
+#define HPET_DEV_USED  (1 << HPET_DEV_USED_BIT)
+#define HPET_DEV_VALID 0x8
+#define HPET_DEV_FSB_CAP   0x1000
+#define HPET_DEV_PERI_CAP  0x2000
+
 /* hpet memory map physical address */
 extern unsigned long hpet_address;
 extern unsigned long force_hpet_address;
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index b03faee..99d4972 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -24,12 +24,6 @@
NSEC = 10^-9 */
 #define FSEC_PER_NSEC  1000000L
 
-#define HPET_DEV_USED_BIT  2
-#define HPET_DEV_USED  (1 << HPET_DEV_USED_BIT)
-#define HPET_DEV_VALID 0x8
-#define HPET_DEV_FSB_CAP   0x1000
-#define HPET_DEV_PERI_CAP  0x2000
-
 #define HPET_MIN_CYCLES        128
 #define HPET_MIN_PROG_DELTA    (HPET_MIN_CYCLES + (HPET_MIN_CYCLES >> 1))
 
-- 
2.7.4



[RFC PATCH 09/23] x86/hpet: Reserve timer for the HPET hardlockup detector

2018-06-12 Thread Ricardo Neri
HPET timer 2 will be used to drive the HPET-based hardlockup detector.
Reserve such timer to ensure it cannot be used by user space programs or
clock events.

When looking for MSI-capable timers for clock events, skip timer 2 if
the HPET hardlockup detector is selected.

Also, do not assign an IO APIC pin to timer 2 of the HPET. A subsequent
changeset will handle the interrupt setup of the timer used for the
hardlockup detector.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Clemens Ladisch 
Cc: Arnd Bergmann 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h |  3 +++
 arch/x86/kernel/hpet.c  | 19 ---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 9e0afde..3266796 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -61,6 +61,9 @@
  */
 #define HPET_MIN_PERIOD10UL
 
+/* Timer used for the hardlockup detector */
+#define HPET_WD_TIMER_NR 2
+
 /* hpet memory map physical address */
 extern unsigned long hpet_address;
 extern unsigned long force_hpet_address;
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 3fa1d3f..b03faee 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -185,7 +186,8 @@ do { \
 
 /*
  * When the hpet driver (/dev/hpet) is enabled, we need to reserve
- * timer 0 and timer 1 in case of RTC emulation.
+ * timer 0 and timer 1 in case of RTC emulation. Timer 2 is reserved in case
+ * the HPET-based hardlockup detector is used.
  */
 #ifdef CONFIG_HPET
 
@@ -195,7 +196,7 @@ static void hpet_reserve_platform_timers(unsigned int id)
 {
struct hpet __iomem *hpet = hpet_virt_address;
struct hpet_timer __iomem *timer = &hpet->hpet_timers[2];
-   unsigned int nrtimers, i;
+   unsigned int nrtimers, i, start_timer;
struct hpet_data hd;
 
nrtimers = ((id & HPET_ID_NUMBER) >> HPET_ID_NUMBER_SHIFT) + 1;
@@ -210,6 +211,13 @@ static void hpet_reserve_platform_timers(unsigned int id)
hpet_reserve_timer(&hd, 1);
 #endif
 
+   if (IS_ENABLED(CONFIG_HARDLOCKUP_DETECTOR_HPET)) {
+   hpet_reserve_timer(&hd, HPET_WD_TIMER_NR);
+   start_timer = HPET_WD_TIMER_NR + 1;
+   } else {
+   start_timer = HPET_WD_TIMER_NR;
+   }
+
/*
 * NOTE that hd_irq[] reflects IOAPIC input pins (LEGACY_8254
 * is wrong for i8259!) not the output IRQ.  Many BIOS writers
@@ -218,7 +226,7 @@ static void hpet_reserve_platform_timers(unsigned int id)
hd.hd_irq[0] = HPET_LEGACY_8254;
hd.hd_irq[1] = HPET_LEGACY_RTC;
 
-   for (i = 2; i < nrtimers; timer++, i++) {
+   for (i = start_timer; i < nrtimers; timer++, i++) {
hd.hd_irq[i] = (readl(&timer->hpet_config) &
Tn_INT_ROUTE_CNF_MASK) >> Tn_INT_ROUTE_CNF_SHIFT;
}
@@ -630,6 +638,11 @@ static void hpet_msi_capability_lookup(unsigned int start_timer)
struct hpet_dev *hdev = &hpet_devs[num_timers_used];
unsigned int cfg = hpet_readl(HPET_Tn_CFG(i));
 
+   /* Do not use timer reserved for the HPET watchdog. */
+   if (IS_ENABLED(CONFIG_HARDLOCKUP_DETECTOR_HPET) &&
+   i == HPET_WD_TIMER_NR)
+   continue;
+
/* Only consider HPET timer with MSI support */
if (!(cfg & HPET_TN_FSB_CAP))
continue;
-- 
2.7.4



[RFC PATCH 08/23] x86/hpet: Calculate ticks-per-second in a separate function

2018-06-12 Thread Ricardo Neri
It is easier to compute the expiration times of an HPET timer by using
its frequency (i.e., the number of times it ticks in a second) than its
period, as given in the capabilities register.

In addition to the HPET char driver, the HPET-based hardlockup detector
will also need to know the timer's frequency. Thus, create a common
function that both can use.
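The conversion itself is a rounded integer division: the capabilities register reports the counter period in femtoseconds, and the frequency is 10^15 divided by that period. A user-space rendition of the same arithmetic (helper name hypothetical):

```c
#include <stdint.h>

/*
 * ticks_per_sec_from_period() - convert an HPET period to a frequency.
 * @period_fs: counter period in femtoseconds, as reported in the
 *             general capabilities register.
 *
 * Returns ticks per second, rounded to the nearest tick.
 */
static uint64_t ticks_per_sec_from_period(uint64_t period_fs)
{
	uint64_t dividend = 1000000000000000ULL; /* 10^15 fs per second */

	dividend += period_fs / 2;	/* round to nearest */
	return dividend / period_fs;
}
```

For the common 14.31818 MHz HPET, whose reported period is 69841279 fs, this yields 14318180 ticks per second.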

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Clemens Ladisch 
Cc: Arnd Bergmann 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 drivers/char/hpet.c  | 31 +--
 include/linux/hpet.h |  1 +
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c
index be426eb..1c9584a 100644
--- a/drivers/char/hpet.c
+++ b/drivers/char/hpet.c
@@ -838,6 +838,29 @@ static unsigned long hpet_calibrate(struct hpets *hpetp)
return ret;
 }
 
+u64 hpet_get_ticks_per_sec(u64 hpet_caps)
+{
+   u64 ticks_per_sec, period;
+
+   period = (hpet_caps & HPET_COUNTER_CLK_PERIOD_MASK) >>
+HPET_COUNTER_CLK_PERIOD_SHIFT; /* fs, 10^-15 */
+
+   /*
+* The frequency is the reciprocal of the period. The period is given
+* femtoseconds per second. Thus, prepare a dividend to obtain the
+* frequency in ticks per second.
+*/
+
+   /* 10^15 femtoseconds per second */
+   ticks_per_sec = 1000000000000000uLL;
+   ticks_per_sec += period >> 1; /* round */
+
+   /* The quotient is put in the dividend. We drop the remainder. */
+   do_div(ticks_per_sec, period);
+
+   return ticks_per_sec;
+}
+
 int hpet_alloc(struct hpet_data *hdp)
 {
u64 cap, mcfg;
@@ -847,7 +870,6 @@ int hpet_alloc(struct hpet_data *hdp)
size_t siz;
struct hpet __iomem *hpet;
static struct hpets *last;
-   unsigned long period;
unsigned long long temp;
u32 remainder;
 
@@ -883,6 +905,8 @@ int hpet_alloc(struct hpet_data *hdp)
 
cap = readq(&hpet->hpet_cap);
 
+   temp = hpet_get_ticks_per_sec(cap);
+
ntimer = ((cap & HPET_NUM_TIM_CAP_MASK) >> HPET_NUM_TIM_CAP_SHIFT) + 1;
 
if (hpetp->hp_ntimer != ntimer) {
@@ -899,11 +923,6 @@ int hpet_alloc(struct hpet_data *hdp)
 
last = hpetp;
 
-   period = (cap & HPET_COUNTER_CLK_PERIOD_MASK) >>
-   HPET_COUNTER_CLK_PERIOD_SHIFT; /* fs, 10^-15 */
-   temp = 1000000000000000uLL; /* 10^15 femtoseconds per second */
-   temp += period >> 1; /* round */
-   do_div(temp, period);
hpetp->hp_tick_freq = temp; /* ticks per second */
 
printk(KERN_INFO "hpet%d: at MMIO 0x%lx, IRQ%s",
diff --git a/include/linux/hpet.h b/include/linux/hpet.h
index 8604564..e7b36bcf4 100644
--- a/include/linux/hpet.h
+++ b/include/linux/hpet.h
@@ -107,5 +107,6 @@ static inline void hpet_reserve_timer(struct hpet_data *hd, int timer)
 }
 
 int hpet_alloc(struct hpet_data *);
+u64 hpet_get_ticks_per_sec(u64 hpet_caps);
 
 #endif /* !__HPET__ */
-- 
2.7.4



[RFC PATCH 07/23] x86/hpet: Expose more functions to read and write registers

2018-06-12 Thread Ricardo Neri
Some of the registers in the HPET hardware have a width of 64 bits. 64-bit
access functions are needed mostly to read the counter and write the
comparator in a single read or write. Also, 64-bit accesses can be used to
read parameters located in the higher bits of some registers (such as
the timer period and the IO APIC pins that can be asserted by the timer)
without the need of masking and shifting the register values.

64-bit read and write functions are added. These functions, along with the
existing hpet_writel(), are exposed via the HPET header to be used by other
kernel subsystems.

Thus far, the only consumer of these functions will be the HPET-based
hardlockup detector, which will only be available in 64-bit builds. Thus,
the 64-bit access functions are wrapped in CONFIG_X86_64.
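The convenience of a single 64-bit access can be seen with the capabilities register, where the counter period lives in the high half: with a readq-style access the field is one shift away, while 32-bit accessors require two reads to be recombined first. A user-space sketch (the register value below is a made-up sample, not real hardware data):

```c
#include <stdint.h>

/* Extract the counter period from one 64-bit read of HPET_COUNTER_CLK_PERIOD. */
static uint32_t period_from_readq(uint64_t cap)
{
	return (uint32_t)(cap >> 32);	/* period is in bits 32-63 */
}

/* Same field when only 32-bit accessors exist: recombine, then shift. */
static uint32_t period_from_readl(uint32_t cap_lo, uint32_t cap_hi)
{
	uint64_t cap = ((uint64_t)cap_hi << 32) | cap_lo;

	return (uint32_t)(cap >> 32);	/* same result, more plumbing */
}
```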

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Philippe Ombredanne 
Cc: Kate Stewart 
Cc: "Rafael J. Wysocki" 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h | 10 ++
 arch/x86/kernel/hpet.c  | 12 +++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 67385d5..9e0afde 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -72,6 +72,11 @@ extern int is_hpet_enabled(void);
 extern int hpet_enable(void);
 extern void hpet_disable(void);
 extern unsigned int hpet_readl(unsigned int a);
+extern void hpet_writel(unsigned int d, unsigned int a);
+#ifdef CONFIG_X86_64
+extern unsigned long hpet_readq(unsigned int a);
+extern void hpet_writeq(unsigned long d, unsigned int a);
+#endif
 extern void force_hpet_resume(void);
 
 struct irq_data;
@@ -109,6 +114,11 @@ extern void hpet_unregister_irq_handler(rtc_irq_handler handler);
 static inline int hpet_enable(void) { return 0; }
 static inline int is_hpet_enabled(void) { return 0; }
 #define hpet_readl(a) 0
+#define hpet_writel(d, a)
+#ifdef CONFIG_X86_64
+#define hpet_readq(a) 0
+#define hpet_writeq(d, a)
+#endif
 #define default_setup_hpet_msi NULL
 
 #endif
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 8ce4212..3fa1d3f 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -64,12 +64,22 @@ inline unsigned int hpet_readl(unsigned int a)
return readl(hpet_virt_address + a);
 }
 
-static inline void hpet_writel(unsigned int d, unsigned int a)
+inline void hpet_writel(unsigned int d, unsigned int a)
 {
writel(d, hpet_virt_address + a);
 }
 
 #ifdef CONFIG_X86_64
+inline unsigned long hpet_readq(unsigned int a)
+{
+   return readq(hpet_virt_address + a);
+}
+
+inline void hpet_writeq(unsigned long d, unsigned int a)
+{
+   writeq(d, hpet_virt_address + a);
+}
+
 #include 
 #endif
 
-- 
2.7.4


[RFC PATCH 03/23] genirq: Introduce IRQF_DELIVER_AS_NMI

2018-06-12 Thread Ricardo Neri
Certain interrupt controllers (such as APIC) are capable of delivering
interrupts as non-maskable. Likewise, drivers or subsystems (e.g., the
hardlockup detector) might be interested in requesting a non-maskable
interrupt. The new flag IRQF_DELIVER_AS_NMI serves this purpose.

When setting up an interrupt, non-maskable delivery will be set in the
interrupt state data only if supported by the underlying interrupt
controller chips.

Interrupt controller chips can declare that they support non-maskable
delivery by using the new flag IRQCHIP_CAN_DELIVER_AS_NMI.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Daniel Lezcano 
Cc: Andrew Morton 
Cc: "Levin, Alexander (Sasha Levin)" 
Cc: Randy Dunlap 
Cc: Masami Hiramatsu 
Cc: Marc Zyngier 
Cc: Bartosz Golaszewski 
Cc: Doug Berger 
Cc: Palmer Dabbelt 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 include/linux/interrupt.h |  3 +++
 include/linux/irq.h   |  3 +++
 kernel/irq/manage.c   | 22 +-
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 5426627..dbc5e02 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -61,6 +61,8 @@
  *interrupt handler after suspending interrupts. For system
  *wakeup devices users need to implement wakeup detection in
  *their interrupt handlers.
+ * IRQF_DELIVER_AS_NMI - Configure interrupt to be delivered as non-maskable,
+ *                if supported by the chip.
  */
 #define IRQF_SHARED		0x00000080
 #define IRQF_PROBE_SHARED	0x00000100
@@ -74,6 +76,7 @@
 #define IRQF_NO_THREAD		0x00010000
 #define IRQF_EARLY_RESUME	0x00020000
 #define IRQF_COND_SUSPEND	0x00040000
+#define IRQF_DELIVER_AS_NMI	0x00080000
 
 #define IRQF_TIMER		(__IRQF_TIMER | IRQF_NO_SUSPEND | IRQF_NO_THREAD)
 
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 7271a2c..d2520ae 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -515,6 +515,8 @@ struct irq_chip {
  * IRQCHIP_SKIP_SET_WAKE:  Skip chip.irq_set_wake(), for this irq chip
  * IRQCHIP_ONESHOT_SAFE:   One shot does not require mask/unmask
  * IRQCHIP_EOI_THREADED:   Chip requires eoi() on unmask in threaded mode
+ * IRQCHIP_CAN_DELIVER_AS_NMI  Chip can deliver interrupts it receives as
+ *                             non-maskable.
  */
 enum {
IRQCHIP_SET_TYPE_MASKED = (1 <<  0),
@@ -524,6 +526,7 @@ enum {
IRQCHIP_SKIP_SET_WAKE   = (1 <<  4),
IRQCHIP_ONESHOT_SAFE= (1 <<  5),
IRQCHIP_EOI_THREADED= (1 <<  6),
+   IRQCHIP_CAN_DELIVER_AS_NMI  = (1 <<  7),
 };
 
 #include 
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index e3336d9..d058aa8 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1137,7 +1137,7 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 {
struct irqaction *old, **old_ptr;
unsigned long flags, thread_mask = 0;
-   int ret, nested, shared = 0;
+   int ret, nested, shared = 0, deliver_as_nmi = 0;
 
if (!desc)
return -EINVAL;
@@ -1156,6 +1156,16 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
if (!(new->flags & IRQF_TRIGGER_MASK))
 		new->flags |= irqd_get_trigger_type(&desc->irq_data);
 
+   /* Only deliver as non-maskable interrupt if supported by chip. */
+   if (new->flags & IRQF_DELIVER_AS_NMI) {
+   if (desc->irq_data.chip->flags & IRQCHIP_CAN_DELIVER_AS_NMI) {
+   irqd_set_deliver_as_nmi(&desc->irq_data);
+   deliver_as_nmi = 1;
+   } else {
+   return -EINVAL;
+   }
+   }
+
/*
 * Check whether the interrupt nests into another interrupt
 * thread.
@@ -1166,6 +1176,13 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
ret = -EINVAL;
goto out_mput;
}
+
+   /* Don't allow nesting if interrupt will be delivered as NMI. */
+   if (deliver_as_nmi) {
+   ret = -EINVAL;
+   goto out_mput;
+   }
+
/*
 * Replace the primary handler which was provided from
 * the driver for non nested interrupt handling by the
@@ -1186,6 +1203,9 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 * thread.
 */
if (new->thread_fn && !nested) {
+   if (deliver_as_nmi)
+   goto out_mput;
+
ret = setup_irq_thread(new, irq, false);
if (ret)
goto out_mput;
-- 
2.7.4


[RFC PATCH 06/23] x86/ioapic: Add support for IRQCHIP_CAN_DELIVER_AS_NMI with interrupt remapping

2018-06-12 Thread Ricardo Neri
Even though there is a delivery mode field in the entries of an IO APIC's
redirection table, the documentation of the majority of the IO APICs
explicitly states that interrupt delivery as non-maskable is not supported.

However, when using an IO APIC in combination with the Intel VT-d interrupt
remapping functionality, the delivery of the interrupt to the CPU is
handled by the remapping hardware. In such a case, the interrupt can be
delivered as non-maskable.

Thus, add the IRQCHIP_CAN_DELIVER_AS_NMI flag only when used in combination
with interrupt remapping.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Juergen Gross 
Cc: Baoquan He 
Cc: "Eric W. Biederman" 
Cc: Dou Liyang 
Cc: Jan Kiszka 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/kernel/apic/io_apic.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 10a20f8..39de91b 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1911,7 +1911,8 @@ static struct irq_chip ioapic_ir_chip __read_mostly = {
.irq_eoi= ioapic_ir_ack_level,
.irq_set_affinity   = ioapic_set_affinity,
.irq_retrigger  = irq_chip_retrigger_hierarchy,
-   .flags  = IRQCHIP_SKIP_SET_WAKE,
+   .flags  = IRQCHIP_SKIP_SET_WAKE |
+ IRQCHIP_CAN_DELIVER_AS_NMI,
 };
 
 static inline void init_IO_APIC_traps(void)
-- 
2.7.4


[RFC PATCH 05/23] x86/msi: Add support for IRQCHIP_CAN_DELIVER_AS_NMI

2018-06-12 Thread Ricardo Neri
As per the Intel 64 and IA-32 Architectures Software Developer's Manual
Volume 3 Section 10.11.2, the delivery mode field of the interrupt message
can be set to configure the interrupt as non-maskable. Declare support to
deliver non-maskable interrupts by adding IRQCHIP_CAN_DELIVER_AS_NMI.

When composing the interrupt message, the delivery mode is obtained from
the configuration of the interrupt data.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Dou Liyang 
Cc: Juergen Gross 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/kernel/apic/msi.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 12202ac..68b6a04 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -29,6 +29,9 @@ static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
 {
struct irq_cfg *cfg = irqd_cfg(data);
 
+   if (irqd_deliver_as_nmi(data))
+   cfg->delivery_mode = dest_NMI;
+
msg->address_hi = MSI_ADDR_BASE_HI;
 
if (x2apic_enabled())
@@ -297,7 +300,7 @@ static struct irq_chip hpet_msi_controller __ro_after_init = {
.irq_retrigger = irq_chip_retrigger_hierarchy,
.irq_compose_msi_msg = irq_msi_compose_msg,
.irq_write_msi_msg = hpet_msi_write_msg,
-   .flags = IRQCHIP_SKIP_SET_WAKE,
+   .flags = IRQCHIP_SKIP_SET_WAKE | IRQCHIP_CAN_DELIVER_AS_NMI,
 };
 
 static irq_hw_number_t hpet_msi_get_hwirq(struct msi_domain_info *info,
-- 
2.7.4


[RFC PATCH 04/23] iommu/vt-d/irq_remapping: Add support for IRQCHIP_CAN_DELIVER_AS_NMI

2018-06-12 Thread Ricardo Neri
The Intel IOMMU is capable of delivering remapped interrupts as non-
maskable. Add the IRQCHIP_CAN_DELIVER_AS_NMI flag to its irq_chip
structure to declare this capability. The delivery mode of each interrupt
can be set separately.

By default, the delivery mode is taken from the configuration field of the
interrupt data. If non-maskable delivery is requested in the interrupt
state flags, the respective entry in the remapping table is updated.

When remapping an interrupt from an IO APIC, modify the delivery
field in the interrupt remapping table entry. When remapping an MSI
interrupt, simply update the delivery mode when composing the message.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Joerg Roedel 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 drivers/iommu/intel_irq_remapping.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 9f3a04d..b6cf7c4 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -1128,10 +1128,14 @@ static void intel_ir_reconfigure_irte(struct irq_data *irqd, bool force)
 	struct irte *irte = &ir_data->irte_entry;
struct irq_cfg *cfg = irqd_cfg(irqd);
 
+   if (irqd_deliver_as_nmi(irqd))
+   cfg->delivery_mode = dest_NMI;
+
/*
 * Atomically updates the IRTE with the new destination, vector
 * and flushes the interrupt entry cache.
 */
+   irte->dlvry_mode = cfg->delivery_mode;
irte->vector = cfg->vector;
irte->dest_id = IRTE_DEST(cfg->dest_apicid);
 
@@ -1182,6 +1186,9 @@ static void intel_ir_compose_msi_msg(struct irq_data *irq_data,
 {
struct intel_ir_data *ir_data = irq_data->chip_data;
 
+   if (irqd_deliver_as_nmi(irq_data))
+   ir_data->irte_entry.dlvry_mode = dest_NMI;
+
*msg = ir_data->msi_entry;
 }
 
@@ -1227,6 +1234,7 @@ static struct irq_chip intel_ir_chip = {
.irq_set_affinity   = intel_ir_set_affinity,
.irq_compose_msi_msg= intel_ir_compose_msi_msg,
.irq_set_vcpu_affinity  = intel_ir_set_vcpu_affinity,
+   .flags  = IRQCHIP_CAN_DELIVER_AS_NMI,
 };
 
 static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
-- 
2.7.4


[RFC PATCH 01/23] x86/apic: Add a parameter for the APIC delivery mode

2018-06-12 Thread Ricardo Neri
Until now, the delivery mode of APIC interrupts has been set to the default
mode configured in the APIC driver. However, there are no restrictions in hardware
to configure each interrupt with a different delivery mode. Specifying the
delivery mode per interrupt is useful when one is interested in changing
the delivery mode of a particular interrupt. For instance, this can be used
to deliver an interrupt as non-maskable.

Add a new member, delivery_mode, to struct irq_cfg. Also, update the
configuration of the delivery mode in the IO APIC, the MSI APIC and the
Intel interrupt remapping driver to use this new per-interrupt member to
configure their respective interrupt tables.

In order to keep the current behavior, initialize the delivery mode of
each interrupt with the delivery mode of the APIC driver in use when the
interrupt data is allocated.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Joerg Roedel 
Cc: Juergen Gross 
Cc: Bjorn Helgaas 
Cc: Wincy Van 
Cc: Kate Stewart 
Cc: Philippe Ombredanne 
Cc: "Eric W. Biederman" 
Cc: Baoquan He 
Cc: Dou Liyang 
Cc: Jan Kiszka 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org

Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hw_irq.h   |  5 +++--
 arch/x86/include/asm/msidef.h   |  3 +++
 arch/x86/kernel/apic/io_apic.c  |  2 +-
 arch/x86/kernel/apic/msi.c  |  2 +-
 arch/x86/kernel/apic/vector.c   |  8 
 arch/x86/platform/uv/uv_irq.c   |  2 +-
 drivers/iommu/intel_irq_remapping.c | 10 +-
 7 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 32e666e..c024e59 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -117,8 +117,9 @@ struct irq_alloc_info {
 };
 
 struct irq_cfg {
-	unsigned int		dest_apicid;
-	unsigned int		vector;
+	unsigned int			dest_apicid;
+	unsigned int			vector;
+	enum ioapic_irq_destination_types	delivery_mode;
 };
 
 extern struct irq_cfg *irq_cfg(unsigned int irq);
diff --git a/arch/x86/include/asm/msidef.h b/arch/x86/include/asm/msidef.h
index ee2f8cc..6aef434 100644
--- a/arch/x86/include/asm/msidef.h
+++ b/arch/x86/include/asm/msidef.h
@@ -16,6 +16,9 @@
 MSI_DATA_VECTOR_MASK)
 
 #define MSI_DATA_DELIVERY_MODE_SHIFT   8
+#define MSI_DATA_DELIVERY_MODE_MASK	0x0700
+#define MSI_DATA_DELIVERY_MODE(dm)	(((dm) << MSI_DATA_DELIVERY_MODE_SHIFT) & \
+					 MSI_DATA_DELIVERY_MODE_MASK)
 #define  MSI_DATA_DELIVERY_FIXED   (0 << MSI_DATA_DELIVERY_MODE_SHIFT)
 #define  MSI_DATA_DELIVERY_LOWPRI  (1 << MSI_DATA_DELIVERY_MODE_SHIFT)
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 7553819..10a20f8 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2887,8 +2887,8 @@ static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
   struct IO_APIC_route_entry *entry)
 {
memset(entry, 0, sizeof(*entry));
-   entry->delivery_mode = apic->irq_delivery_mode;
entry->dest_mode = apic->irq_dest_mode;
+   entry->delivery_mode = cfg->delivery_mode;
entry->dest  = cfg->dest_apicid;
entry->vector= cfg->vector;
entry->trigger   = data->trigger;
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index ce503c9..12202ac 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -45,7 +45,7 @@ static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
msg->data =
MSI_DATA_TRIGGER_EDGE |
MSI_DATA_LEVEL_ASSERT |
-   MSI_DATA_DELIVERY_FIXED |
+   MSI_DATA_DELIVERY_MODE(cfg->delivery_mode) |
MSI_DATA_VECTOR(cfg->vector);
 }
 
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index bb6f7a2..dfe0a2a 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -547,6 +547,14 @@ static int x86_vector_alloc_irqs(struct irq_domain *domain, unsigned int virq,
irqd->chip_data = apicd;
irqd->hwirq = virq + i;
irqd_set_single_target(irqd);
+
+   /*
+* Initialize the delivery mode of this irq to match
+* the default delivery mode of the APIC. This could be
+* changed later when the interrupt is activated.
+*/
+   apicd->hw_irq_cfg.delivery_mode = apic->irq_delivery_mode;
+
/*
 * Legacy vectors are already assigned when the IOAPIC
 * takes them over. They stay on the same vector. This is
diff --git a/arch/x86/platform/uv/uv_irq.c 

[RFC PATCH 02/23] genirq: Introduce IRQD_DELIVER_AS_NMI

2018-06-12 Thread Ricardo Neri
Certain interrupt controllers (e.g., APIC) are capable of delivering
interrupts to the CPU as non-maskable. Add the new IRQD_DELIVER_AS_NMI
interrupt state flag. The purpose of this flag is to communicate to the
underlying irqchip whether the interrupt must be delivered in this manner.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: Marc Zyngier 
Cc: Bartosz Golaszewski 
Cc: Doug Berger 
Cc: Palmer Dabbelt 
Cc: Randy Dunlap 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org

Signed-off-by: Ricardo Neri 
---
 include/linux/irq.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 65916a3..7271a2c 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -208,6 +208,7 @@ struct irq_data {
  * IRQD_SINGLE_TARGET  - IRQ allows only a single affinity target
  * IRQD_DEFAULT_TRIGGER_SET- Expected trigger already been set
  * IRQD_CAN_RESERVE- Can use reservation mode
+ * IRQD_DELIVER_AS_NMI - Deliver this interrupt as non-maskable
  */
 enum {
IRQD_TRIGGER_MASK   = 0xf,
@@ -230,6 +231,7 @@ enum {
IRQD_SINGLE_TARGET  = (1 << 24),
IRQD_DEFAULT_TRIGGER_SET= (1 << 25),
IRQD_CAN_RESERVE= (1 << 26),
+   IRQD_DELIVER_AS_NMI = (1 << 27),
 };
 
 #define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors)
@@ -389,6 +391,16 @@ static inline bool irqd_can_reserve(struct irq_data *d)
return __irqd_to_state(d) & IRQD_CAN_RESERVE;
 }
 
+static inline void irqd_set_deliver_as_nmi(struct irq_data *d)
+{
+   __irqd_to_state(d) |= IRQD_DELIVER_AS_NMI;
+}
+
+static inline bool irqd_deliver_as_nmi(struct irq_data *d)
+{
+   return __irqd_to_state(d) & IRQD_DELIVER_AS_NMI;
+}
+
 #undef __irqd_to_state
 
 static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
-- 
2.7.4


[PATCH v9 2/2] iommu/amd: Add basic debugfs infrastructure for AMD IOMMU

2018-06-12 Thread Gary R Hook
Implement a skeleton framework for debugfs support in the AMD
IOMMU.  Add an AMD-specific Kconfig boolean that depends upon
general enablement of DebugFS in the IOMMU.

Signed-off-by: Gary R Hook 
---
 drivers/iommu/Kconfig |   12 
 drivers/iommu/Makefile|1 +
 drivers/iommu/amd_iommu_debugfs.c |   33 +
 drivers/iommu/amd_iommu_init.c|6 --
 drivers/iommu/amd_iommu_proto.h   |6 ++
 drivers/iommu/amd_iommu_types.h   |5 +
 6 files changed, 61 insertions(+), 2 deletions(-)
 create mode 100644 drivers/iommu/amd_iommu_debugfs.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index f9af25ac409f..5a9cef113763 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -146,6 +146,18 @@ config AMD_IOMMU_V2
  hardware. Select this option if you want to use devices that support
  the PCI PRI and PASID interface.
 
+config AMD_IOMMU_DEBUGFS
+   bool "Enable AMD IOMMU internals in DebugFS"
+   depends on AMD_IOMMU && IOMMU_DEBUGFS
+   ---help---
+ !!!WARNING!!!  !!!WARNING!!!  !!!WARNING!!!  !!!WARNING!!!
+
+	  DO NOT ENABLE THIS OPTION UNLESS YOU REALLY, -REALLY- KNOW WHAT YOU ARE DOING!!!
+ Exposes AMD IOMMU device internals in DebugFS.
+
+ This option is -NOT- intended for production environments, and should
+ not generally be enabled.
+
 # Intel IOMMU support
 config DMAR_TABLE
bool
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 74cfbc392862..47fd6ea9de2d 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -11,6 +11,7 @@ obj-$(CONFIG_IOMMU_IOVA) += iova.o
 obj-$(CONFIG_OF_IOMMU) += of_iommu.o
 obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
 obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o
+obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm-smmu.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
diff --git a/drivers/iommu/amd_iommu_debugfs.c b/drivers/iommu/amd_iommu_debugfs.c
new file mode 100644
index ..c6a5c737ef09
--- /dev/null
+++ b/drivers/iommu/amd_iommu_debugfs.c
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD IOMMU driver
+ *
+ * Copyright (C) 2018 Advanced Micro Devices, Inc.
+ *
+ * Author: Gary R Hook 
+ */
+
+#include 
+#include 
+#include 
+#include "amd_iommu_proto.h"
+#include "amd_iommu_types.h"
+
+static struct dentry *amd_iommu_debugfs;
+static DEFINE_MUTEX(amd_iommu_debugfs_lock);
+
+#define	MAX_NAME_LEN	20
+
+void amd_iommu_debugfs_setup(struct amd_iommu *iommu)
+{
+   char name[MAX_NAME_LEN + 1];
+
+   mutex_lock(&amd_iommu_debugfs_lock);
+   if (!amd_iommu_debugfs)
+   amd_iommu_debugfs = debugfs_create_dir("amd",
+  iommu_debugfs_dir);
+   mutex_unlock(&amd_iommu_debugfs_lock);
+
+   snprintf(name, MAX_NAME_LEN, "iommu%02d", iommu->index);
+   iommu->debugfs = debugfs_create_dir(name, amd_iommu_debugfs);
+}
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 904c575d1677..031e6dbb8345 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -2721,6 +2721,7 @@ int __init amd_iommu_enable_faulting(void)
  */
 static int __init amd_iommu_init(void)
 {
+   struct amd_iommu *iommu;
int ret;
 
ret = iommu_go_to_state(IOMMU_INITIALIZED);
@@ -2730,14 +2731,15 @@ static int __init amd_iommu_init(void)
disable_iommus();
free_iommu_resources();
} else {
-   struct amd_iommu *iommu;
-
uninit_device_table_dma();
for_each_iommu(iommu)
iommu_flush_all_caches(iommu);
}
}
 
+   for_each_iommu(iommu)
+   amd_iommu_debugfs_setup(iommu);
+
return ret;
 }
 
diff --git a/drivers/iommu/amd_iommu_proto.h b/drivers/iommu/amd_iommu_proto.h
index 640c286a0ab9..a8cd0296fb16 100644
--- a/drivers/iommu/amd_iommu_proto.h
+++ b/drivers/iommu/amd_iommu_proto.h
@@ -33,6 +33,12 @@ extern void amd_iommu_uninit_devices(void);
 extern void amd_iommu_init_notifier(void);
 extern int amd_iommu_init_api(void);
 
+#ifdef CONFIG_AMD_IOMMU_DEBUGFS
+void amd_iommu_debugfs_setup(struct amd_iommu *iommu);
+#else
+static inline void amd_iommu_debugfs_setup(struct amd_iommu *iommu) {}
+#endif
+
 /* Needed for interrupt remapping */
 extern int amd_iommu_prepare(void);
 extern int amd_iommu_enable(void);
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 986cbe0cc189..cfac9d842b0f 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -594,6 +594,11 @@ struct amd_iommu {
 
u32 flags;
volatile u64 __aligned(8) cmd_sem;
+
+#ifdef CONFIG_AMD_IOMMU_DEBUGFS
+   

[PATCH v9 1/2] iommu - Enable debugfs exposure of IOMMU driver internals

2018-06-12 Thread Gary R Hook
Provide base enablement for using debugfs to expose internal data of an
IOMMU driver. When called, create the /sys/kernel/debug/iommu directory.

Emit a strong warning at boot time to indicate that this feature is
enabled.

This function is called from iommu_init, and creates the initial DebugFS
directory. Drivers may then call iommu_debugfs_new_driver_dir() to
instantiate a device-specific directory to expose internal data.
It will return a pointer to the new dentry structure created in
/sys/kernel/debug/iommu, or NULL in the event of a failure.

Since the IOMMU driver can not be removed from the running system, there
is no need for an "off" function.

Signed-off-by: Gary R Hook 
---
 drivers/iommu/Kconfig |   10 ++
 drivers/iommu/Makefile|1 +
 drivers/iommu/iommu-debugfs.c |   66 +
 drivers/iommu/iommu.c |2 +
 include/linux/iommu.h |7 
 5 files changed, 86 insertions(+)
 create mode 100644 drivers/iommu/iommu-debugfs.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index c76157e57f6b..f9af25ac409f 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -60,6 +60,16 @@ config IOMMU_IO_PGTABLE_ARMV7S_SELFTEST
 
 endmenu
 
+config IOMMU_DEBUGFS
+   bool "Export IOMMU internals in DebugFS"
+   depends on DEBUG_FS
+   help
+ Allows exposure of IOMMU device internals. This option enables
+ the use of debugfs by IOMMU drivers as required. Devices can,
+ at initialization time, cause the IOMMU code to create a top-level
+ debug/iommu directory, and then populate a subdirectory with
+ entries as required.
+
 config IOMMU_IOVA
tristate
 
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 1fb695854809..74cfbc392862 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -2,6 +2,7 @@
 obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
+obj-$(CONFIG_IOMMU_DEBUGFS) += iommu-debugfs.o
 obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
diff --git a/drivers/iommu/iommu-debugfs.c b/drivers/iommu/iommu-debugfs.c
new file mode 100644
index ..3b1bf88fd1b0
--- /dev/null
+++ b/drivers/iommu/iommu-debugfs.c
@@ -0,0 +1,66 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * IOMMU debugfs core infrastructure
+ *
+ * Copyright (C) 2018 Advanced Micro Devices, Inc.
+ *
+ * Author: Gary R Hook 
+ */
+
+#include 
+#include 
+#include 
+
+struct dentry *iommu_debugfs_dir;
+
+/**
+ * iommu_debugfs_setup - create the top-level iommu directory in debugfs
+ *
+ * Provide base enablement for using debugfs to expose internal data of an
+ * IOMMU driver. When called, this function creates the
+ * /sys/kernel/debug/iommu directory.
+ *
+ * Emit a strong warning at boot time to indicate that this feature is
+ * enabled.
+ *
+ * This function is called from iommu_init; drivers may then call
+ * iommu_debugfs_new_driver_dir() to instantiate a vendor-specific
+ * directory to be used to expose internal data.
+ */
+void iommu_debugfs_setup(void)
+{
+   if (!iommu_debugfs_dir) {
+   iommu_debugfs_dir = debugfs_create_dir("iommu", NULL);
+   pr_warn("\n");
+		pr_warn("*************************************************************\n");
+		pr_warn("**     NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE    **\n");
+		pr_warn("**                                                         **\n");
+		pr_warn("**  IOMMU DebugFS SUPPORT HAS BEEN ENABLED IN THIS KERNEL  **\n");
+		pr_warn("**                                                         **\n");
+		pr_warn("** This means that this kernel is built to expose internal **\n");
+		pr_warn("** IOMMU data structures, which may compromise security on **\n");
+		pr_warn("** your system.                                            **\n");
+		pr_warn("**                                                         **\n");
+		pr_warn("** If you see this message and you are not debugging the   **\n");
+		pr_warn("** kernel, report this immediately to your vendor!         **\n");
+		pr_warn("**                                                         **\n");
+		pr_warn("**     NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE    **\n");
+		pr_warn("*************************************************************\n");
+	}
+}
+
+/**
+ * iommu_debugfs_new_driver_dir - create a vendor directory under debugfs/iommu
+ * @vendor: name of the vendor-specific subdirectory to create
+ *
+ * This function is called by an IOMMU driver to create the top-level debugfs
+ * directory for that driver.
+ *
+ * Return: upon success, a pointer to 

[PATCH v9 0/2] Base enablement of IOMMU debugfs support

2018-06-12 Thread Gary R Hook
These patches create a top-level function, called at IOMMU
initialization, to create a debugfs directory for the IOMMU.
Under this directory drivers may create and populate
vendor-specific directories for their device internals.

Patch 1: general IOMMU enablement
Patch 2: basic AMD enablement to demonstrate linkage with patch 1

Introduce a new Kconfig parameter IOMMU_DEBUGFS to globally allow/disallow 
debugfs code to be built.

The Makefile structure is intended to allow the use of a single switch for 
turning on DebugFS.

Changes since v8:
 - Change Kconfig to use both an IOMMU boolean and a vendor-specific
   boolean; ensure big warning messages
 - Make the function that creates a vendor directory more concise

Changes since v7:
 - Change the Kconfig approach to use a hidden boolean for a
   specific device
 - Change some #ifdefs to reference the new boolean

Changes since v6:
 - Rely on default Kconfig value for a bool
 - comment/doc fixes
 - use const where appropriate
 - fix inline declaration

Changes since v5:
 - Added parameters names in declarations/definitions
 - Reformatted an inline definition

Changes since v4:
 - Guard vendor-specific debugfs files in the Makefile
 - Call top-level routine from iommu_init()
 - Add function for instantiating a driver-specific directory
 - Change AMD driver code to use this new format

Changes since v3:
 - Remove superfluous calls to debugfs_initialized()
 - Emit a warning exactly one time
 - Change the Kconfig name to IOMMU_DEBUGFS
 - Change the way debugfs modules are made

Changes since v2:
 - Move a declaration to outside an ifdef
 - Remove a spurious blank line

Changes since v1:
 - Remove debug cruft
 - Remove cruft produced by design change
 - Change the lock to a mutex
 - Coding style fixes
 - Add a comment to document the framework

---

Gary R Hook (2):
  iommu - Enable debugfs exposure of IOMMU driver internals
  iommu/amd: Add basic debugfs infrastructure for AMD IOMMU


 drivers/iommu/Kconfig |   22 
 drivers/iommu/Makefile|2 +
 drivers/iommu/amd_iommu_debugfs.c |   33 +++
 drivers/iommu/amd_iommu_init.c|6 ++-
 drivers/iommu/amd_iommu_proto.h   |6 +++
 drivers/iommu/amd_iommu_types.h   |5 +++
 drivers/iommu/iommu-debugfs.c |   66 +
 drivers/iommu/iommu.c |2 +
 include/linux/iommu.h |7 
 9 files changed, 147 insertions(+), 2 deletions(-)
 create mode 100644 drivers/iommu/amd_iommu_debugfs.c
 create mode 100644 drivers/iommu/iommu-debugfs.c

--

Re: [PATCH v8 2/2] iommu/amd: Add basic debugfs infrastructure for AMD IOMMU

2018-06-12 Thread Gary R Hook

On 06/04/2018 08:23 PM, Randy Dunlap wrote:

On 05/29/2018 11:39 AM, Greg KH wrote:

On Tue, May 29, 2018 at 01:23:23PM -0500, Gary R Hook wrote:

Implement a skeleton framework for debugfs support in the
AMD IOMMU. Add a hidden boolean to Kconfig that is defined
for the AMD IOMMU when general IOMMU DebugFS support is
enabled.

Signed-off-by: Gary R Hook 
---
  drivers/iommu/Kconfig |4 
  drivers/iommu/Makefile|1 +
  drivers/iommu/amd_iommu_debugfs.c |   39 +
  drivers/iommu/amd_iommu_init.c|6 --
  drivers/iommu/amd_iommu_proto.h   |6 ++
  drivers/iommu/amd_iommu_types.h   |5 +
  6 files changed, 59 insertions(+), 2 deletions(-)
  create mode 100644 drivers/iommu/amd_iommu_debugfs.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index f9af25ac409f..ec223f6f4ad4 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -137,6 +137,10 @@ config AMD_IOMMU
  your BIOS for an option to enable it or if you have an IVRS ACPI
  table.
  
+config AMD_IOMMU_DEBUGFS

+   def_bool y


Why default y?  Can you not boot a box without this?  If not, it should
not be Y.


+   depends on AMD_IOMMU && IOMMU_DEBUGFS
+
  config AMD_IOMMU_V2
tristate "AMD IOMMU Version 2 driver"
depends on AMD_IOMMU


Gary,

By far, most driver-debugfs additions are optional and include a user Kconfig 
prompt
so that user's can choose whether to enable it or not.

I suggest that the way forward is to fix Greg's debugfs_() api comments
and to add a prompt string to AMD_IOMMU_DEBUGFS.


Roger. I think we have that all worked out. Will send another version soon.

Thanks.

Re: [PATCH v8 1/2] iommu - Enable debugfs exposure of IOMMU driver internals

2018-06-12 Thread Gary R Hook

On 06/05/2018 12:08 PM, Greg KH wrote:

On Tue, Jun 05, 2018 at 12:01:41PM -0500, Gary R Hook wrote:

+/**
+ * iommu_debugfs_new_driver_dir - create a vendor directory under debugfs/iommu
+ * @vendor: name of the vendor-specific subdirectory to create
+ *
+ * This function is called by an IOMMU driver to create the top-level debugfs
+ * directory for that driver.
+ *
+ * Return: upon success, a pointer to the dentry for the new directory.
+ * NULL in case of failure.
+ */
+struct dentry *iommu_debugfs_new_driver_dir(const char *vendor)
+{
+   struct dentry *d_new;
+
+   d_new = debugfs_create_dir(vendor, iommu_debugfs_dir);
+
+   return d_new;
+}
+EXPORT_SYMBOL_GPL(iommu_debugfs_new_driver_dir);


Why are you wrapping a debugfs call?  Why not just export
iommu_debugfs_dir instead?


It was a choice, as I stated in my other post. It is not a requirement.
If you resolutely reject this approach, that's fine. I'll change it, no
worries.


Either is fine, but if it stays, it should stay a single line function
:)

thanks,

greg k-h



Then I shall leave it as a black-box function. Single line, of course.

Thank you.


Re: [PATCH v8 2/2] iommu/amd: Add basic debugfs infrastructure for AMD IOMMU

2018-06-12 Thread Gary R Hook

On 06/05/2018 12:06 PM, Greg KH wrote:

On Tue, Jun 05, 2018 at 11:58:13AM -0500, Gary R Hook wrote:

On 05/29/2018 01:39 PM, Greg KH wrote:

On Tue, May 29, 2018 at 01:23:23PM -0500, Gary R Hook wrote:

Implement a skeleton framework for debugfs support in the
AMD IOMMU. Add a hidden boolean to Kconfig that is defined
for the AMD IOMMU when general IOMMU DebugFS support is
enabled.

Signed-off-by: Gary R Hook 
---
   drivers/iommu/Kconfig |4 
   drivers/iommu/Makefile|1 +
   drivers/iommu/amd_iommu_debugfs.c |   39 +
   drivers/iommu/amd_iommu_init.c|6 --
   drivers/iommu/amd_iommu_proto.h   |6 ++
   drivers/iommu/amd_iommu_types.h   |5 +
   6 files changed, 59 insertions(+), 2 deletions(-)
   create mode 100644 drivers/iommu/amd_iommu_debugfs.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index f9af25ac409f..ec223f6f4ad4 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -137,6 +137,10 @@ config AMD_IOMMU
  your BIOS for an option to enable it or if you have an IVRS ACPI
  table.
+config AMD_IOMMU_DEBUGFS
+   def_bool y


Why default y?  Can you not boot a box without this?  If not, it should
not be Y.


Again, apologies for not seeing this sooner.

Yes, the system can boot without this. The idea of a hidden option was
surfaced by Robin, and after my first approach was shot down, I tried this.

Logic: If the over-arching IOMMU debugfs option is enabled, then
AMD_IOMMU_DEBUGFS gets defined, and AMD IOMMU code gets included.

This issue was discussed a few weeks ago. No single approach appears to
satisfy everyone. I like this because it depends upon one switch: Do you
want DebugFS support enabled in the IOMMU driver, period? Vendor-specific
code can then choose to implement support or not, and a builder doesn't have
to worry about enabling/disabling multiple Kconfig options.

At least, that was my line of reasoning.

I'm not married to any approach, and I don't find clever use of Kconfig
options too terribly challenging. And I'm not defending, I'm just
explaining.


The issue is, no one sets Kconfig options except a very tiny subset of
kernel developers.  Distros always enable everything, as they have to
do that.

If you are creating something here that is so dangerous that you spam
the kernel log with big warning messages, you should not be making it
easy to enable, let alone be enabled by default :)


Okay, I get that. Totally understand.


Just make it an option, have it rely on the kernel debugging option, and
say "DO NOT ENABLE THIS UNLESS YOU REALLY REALLY REALLY KNOW WHAT YOU
ARE DOING!"


Nah, Randy voted for separate options per device, on top of the IOMMU 
option. So I'll go with that. With loud messages, of course.
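As a sketch of where that lands — an explicitly prompted, default-off option per driver, gated on the generic IOMMU debugfs switch, with the warning Greg asked for — the Kconfig entry might look something like this (the exact option name and help text here are assumptions, not the final merged patch):

```kconfig
config AMD_IOMMU_DEBUGFS
	bool "Enable AMD IOMMU internals in DebugFS"
	depends on AMD_IOMMU && IOMMU_DEBUGFS
	default n
	help
	  !!!WARNING!!!  !!!WARNING!!!  !!!WARNING!!!  !!!WARNING!!!

	  DO NOT ENABLE THIS OPTION UNLESS YOU REALLY, REALLY KNOW WHAT
	  YOU ARE DOING!!!

	  Exposes AMD IOMMU device internals in DebugFS. This option is
	  -NOT- intended for production environments, and should not
	  generally be enabled.
```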



[PATCH RESEND] iommu/msm: Don't call iommu_device_{, un}link from atomic context

2018-06-12 Thread Niklas Cassel
Fixes the following splat during boot:

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
in_atomic(): 1, irqs_disabled(): 128, pid: 77, name: kworker/2:1
4 locks held by kworker/2:1/77:
 #0: (ptrval) ((wq_completion)"events"){+.+.}, at: process_one_work+0x1fc/0x8fc
 #1: (ptrval) (deferred_probe_work){+.+.}, at: process_one_work+0x1fc/0x8fc
 #2: (ptrval) (>mutex){}, at: __device_attach+0x40/0x178
 #3: (ptrval) (msm_iommu_lock){}, at: msm_iommu_add_device+0x28/0xcc
irq event stamp: 348
hardirqs last  enabled at (347): [] kfree+0xe0/0x3c0
hardirqs last disabled at (348): [] _raw_spin_lock_irqsave+0x2c/0x68
softirqs last  enabled at (0): [] copy_process.part.5+0x280/0x1a68
softirqs last disabled at (0): [<>]   (null)
Preemption disabled at:
[<>]   (null)
CPU: 2 PID: 77 Comm: kworker/2:1 Not tainted 4.17.0-rc5-wt-ath-01075-gaca0516bb4cf #239
Hardware name: Generic DT based system
Workqueue: events deferred_probe_work_func
[] (unwind_backtrace) from [] (show_stack+0x20/0x24)
[] (show_stack) from [] (dump_stack+0xa0/0xcc)
[] (dump_stack) from [] (___might_sleep+0x1f8/0x2d4)
ath10k_sdio mmc2:0001:1: Direct firmware load for ath10k/QCA9377/hw1.0/board-2.bin failed with error -2
[] (___might_sleep) from [] (__might_sleep+0x70/0xa8)
[] (__might_sleep) from [] (__mutex_lock+0x50/0xb28)
[] (__mutex_lock) from [] (mutex_lock_nested+0x2c/0x34)
ath10k_sdio mmc2:0001:1: board_file api 1 bmi_id N/A crc32 544289f7
[] (mutex_lock_nested) from [] (kernfs_find_and_get_ns+0x30/0x5c)
[] (kernfs_find_and_get_ns) from [] (sysfs_add_link_to_group+0x28/0x58)
[] (sysfs_add_link_to_group) from [] (iommu_device_link+0x50/0xb4)
[] (iommu_device_link) from [] (msm_iommu_add_device+0xa0/0xcc)
[] (msm_iommu_add_device) from [] (add_iommu_group+0x3c/0x64)
[] (add_iommu_group) from [] (bus_for_each_dev+0x84/0xc4)
[] (bus_for_each_dev) from [] (bus_set_iommu+0xd0/0x10c)
[] (bus_set_iommu) from [] (msm_iommu_probe+0x5b8/0x66c)
[] (msm_iommu_probe) from [] (platform_drv_probe+0x60/0xbc)
[] (platform_drv_probe) from [] (driver_probe_device+0x30c/0x4cc)
[] (driver_probe_device) from [] (__device_attach_driver+0xac/0x14c)
[] (__device_attach_driver) from [] (bus_for_each_drv+0x68/0xc8)
[] (bus_for_each_drv) from [] (__device_attach+0xe4/0x178)
[] (__device_attach) from [] (device_initial_probe+0x1c/0x20)
[] (device_initial_probe) from [] (bus_probe_device+0x98/0xa0)
[] (bus_probe_device) from [] (deferred_probe_work_func+0x74/0x198)
[] (deferred_probe_work_func) from [] (process_one_work+0x2c4/0x8fc)
[] (process_one_work) from [] (worker_thread+0x2c4/0x5cc)
[] (worker_thread) from [] (kthread+0x180/0x188)
[] (kthread) from [] (ret_from_fork+0x14/0x20)

Fixes: 42df43b36163 ("iommu/msm: Make use of iommu_device_register interface")
Signed-off-by: Niklas Cassel 
---
 drivers/iommu/msm_iommu.c | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index 0d3350463a3f..9a95c9b9d0d8 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -395,20 +395,15 @@ static int msm_iommu_add_device(struct device *dev)
struct msm_iommu_dev *iommu;
struct iommu_group *group;
unsigned long flags;
-   int ret = 0;
 
 	spin_lock_irqsave(&msm_iommu_lock, flags);
-
iommu = find_iommu_for_dev(dev);
+	spin_unlock_irqrestore(&msm_iommu_lock, flags);
+
if (iommu)
 		iommu_device_link(&iommu->iommu, dev);
else
-   ret = -ENODEV;
-
-	spin_unlock_irqrestore(&msm_iommu_lock, flags);
-
-   if (ret)
-   return ret;
+   return -ENODEV;
 
group = iommu_group_get_for_dev(dev);
if (IS_ERR(group))
@@ -425,13 +420,12 @@ static void msm_iommu_remove_device(struct device *dev)
unsigned long flags;
 
 	spin_lock_irqsave(&msm_iommu_lock, flags);
-
iommu = find_iommu_for_dev(dev);
+	spin_unlock_irqrestore(&msm_iommu_lock, flags);
+
if (iommu)
 		iommu_device_unlink(&iommu->iommu, dev);
 
-	spin_unlock_irqrestore(&msm_iommu_lock, flags);
-
iommu_group_remove_device(dev);
 }
 
-- 
2.17.1



Re: [PATCH v2 0/5] add non-strict mode support for arm-smmu-v3

2018-06-12 Thread Leizhen (ThunderTown)



On 2018/6/11 19:05, Jean-Philippe Brucker wrote:
> Hi Zhen Lei,
> 
> On 10/06/18 12:07, Zhen Lei wrote:
>> v1 -> v2:
>> Use the lowest bit of the io_pgtable_ops.unmap's iova parameter to pass the 
>> strict mode:
>> 0, IOMMU_STRICT;
>> 1, IOMMU_NON_STRICT;
>> Treat 0 as IOMMU_STRICT, so that the unmap operation can compatible with
>> other IOMMUs which still use strict mode. In other words, this patch series
>> will not impact other IOMMU drivers. I tried add a new quirk 
>> IO_PGTABLE_QUIRK_NON_STRICT
>> in io_pgtable_cfg.quirks, but it can not pass the strict mode of the domain 
>> from SMMUv3
>> driver to io-pgtable module. 
>>
>> Add a new member domain_non_strict in struct iommu_dma_cookie, this member 
>> will only be
>> initialized when the related domain and IOMMU driver support non-strict mode.
> 
> It's not obvious from the commit messages or comments what the
> non-strict mode involves exactly. Could you add a description, and point
> out the trade-off associated with it?

Sorry, I described it in V1, but removed it in V2.
https://lkml.org/lkml/2018/5/31/131

> 
> In this mode you don't send an invalidate commands when removing a leaf
> entry, but instead send invalidate-all commands at regular interval.
> This improves performance but introduces a vulnerability window, which
> should be pointed out to users.
> 
> IOVA allocation isn't the only problem, I'm concerned about the page
> allocator. If unmap() returns early, the TLB entry is still valid after
> the kernel reallocate the page. The device can then perform a
> use-after-free (instead of getting a translation fault), so a buggy
> device will corrupt memory and an untrusted one will access arbitrary data.

I have constrained VFIO to still use strict mode, so all other devices will only
access memory under kernel control, and their drivers are unlikely to attack the
kernel. For devices shipped as part of a commercial product, the probability of
such a bug is very low, and at the least a bug would not be left in place
deliberately for the purpose of an attack. The real risk is that an attacker
physically swaps in a malicious device on the spot.

Taking a step back, running with the IOMMU disabled is also supported, and
non-strict mode is still better than disabled. So maybe we should add a boot
option that lets the admin choose which mode to use.


> 
> Or is there a way in mm to ensure that the page isn't reallocated until
> the invalidation succeeds? Could dma-iommu help with this? Having

That's too hard. In some cases the memory is allocated by APIs other than dma-iommu.

> support from the mm would also help consolidate ATS, mark a page stale
> when an ATC invalidation times out. But last time I checked it seemed
> quite difficult to implement, and ATS is inherently insecure so I didn't
> bother.
> 
> At the very least I think it might be worth warning users in dmesg about
> this pitfall (and add_taint?). Tell them that an IOMMU in this mode is
> good for scatter-gather performance but lacks full isolation. The
> "non-strict" name seems somewhat harmless, and people should know what
> they're getting into before enabling this.

Yes, a warning or comments in the source code would be better.

> 
> Thanks,
> Jean
> 
> .
> 

-- 
Thanks!
Best Regards
