On 08/02/18 14:37, Alexey G wrote:
> On Thu, 8 Feb 2018 12:40:41 +0000
> Andrew Cooper <andrew.coop...@citrix.com> wrote:
>> - Perf/Oprofile. This is currently mutually exclusive with Xen using
>> the watchdog, but needn't be and hopefully won't be in the future.
>>> Most of the time we deal with watchdog NMIs, while all others should
>>> be somewhat rare. The thing is, we actually need to read I/O port
>>> 61h on system NMIs only.
>>> If the main problem lies in a flow of SMIs due to reading port 61h on
>>> every NMI watchdog tick -- why not avoid reading it?
>>> There are at least 2 ways to check if the NMI was due to a watchdog
>>> - LAPIC (SDM states that "When a performance monitoring counters
>>> interrupt is generated, the mask bit for its associated LVT entry is
>>> set.")
>>> - perf MSR overflow bit
>>> So, if we detect it was an NMI due to a watchdog using these
>>> methods (early in the NMI handler), we can avoid touching port
>>> 61h and thus triggering an SMI I/O trap on it.
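
For illustration only, the check proposed above could be sketched roughly as follows. This is not Xen's code: `struct nmi_snapshot` and the function names are hypothetical, the register values are passed in rather than read from hardware, and the LVTPC variant assumes the watchdog unmasks LVTPC again after every tick. The bit positions (LVT mask bit 16, port 61h bits 7 and 6) are the architectural ones; the overflow mask depends on which counter the watchdog uses and is left as a parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define APIC_LVT_MASKED  (1u << 16)  /* mask bit of an LVT entry */
#define PORT61_SERR      (1u << 7)   /* port 61h: PCI SERR# / parity error */
#define PORT61_IOCHK     (1u << 6)   /* port 61h: I/O channel check */

/* Hypothetical snapshot of the state sampled early in the NMI handler. */
struct nmi_snapshot {
    uint32_t lvtpc;               /* LAPIC LVT performance counter entry */
    uint64_t perf_global_status;  /* e.g. IA32_PERF_GLOBAL_STATUS (assumption) */
    uint64_t watchdog_ovf_mask;   /* overflow bit(s) of the watchdog counter */
};

/* Method 1: the LAPIC sets the LVTPC mask bit when a PMI is generated. */
bool watchdog_by_lvtpc(const struct nmi_snapshot *s)
{
    assert(s != NULL);
    return (s->lvtpc & APIC_LVT_MASKED) != 0;
}

/* Method 2: the watchdog counter's overflow bit is set in the status MSR. */
bool watchdog_by_msr(const struct nmi_snapshot *s)
{
    assert(s != NULL);
    return (s->perf_global_status & s->watchdog_ovf_mask) != 0;
}

/* Proposed policy: touch port 61h only if no watchdog source was found. */
bool need_port61_read(const struct nmi_snapshot *s)
{
    return !(watchdog_by_lvtpc(s) || watchdog_by_msr(s));
}

/* What a port 61h read would tell us about system NMI sources. */
bool port61_reports_system_nmi(uint8_t val)
{
    return (val & (PORT61_SERR | PORT61_IOCHK)) != 0;
}
```

Note that `need_port61_read()` returns false whenever a watchdog source is found, even if a system NMI is asserted at the same time. That is the case the reply below objects to.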
>> The problem is having multiple NMIs arriving. Like all other edge
>> triggered interrupts, extra arrivals get dropped. By skipping the 0x61
>> read if we believe it was a watchdog NMI, we've opened a race condition
>> where we will completely miss the system NMI.
> There shouldn't be any problem, I think. NMIs don't need to be cleared
> with EOI and it's common practice to handle NMIs one-by-one (as an NMI
> handler is not reentrant in a typical scenario).
> Execution of an SMI doesn't cause a pending (blocked) NMI to get dropped;
> similar mechanisms might be employed for a single NMI which arrived in
> blocked-by-NMI state. Otherwise the whole thing will break -- merely
> handling an arbitrary NMI will be enough to miss any other NMIs. This is
> too obvious a flaw. So normally it should be just a matter of which of
> the two NMIs is serviced first.
> This assumption can be verified empirically by requesting the chipset
> to send an external NMI while serving a watchdog NMI and checking if it
> arrives later on.
NMI handling works just like other interrupts, except that its
equivalent of the ISR/IRR state is hidden.
One new NMI will become pending while an NMI is in progress (because
there is an IRR bit to be set), but any further NMIs will be dropped.
You can demonstrate this easily by having CPUs or the chipset send NMIs.
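
The behaviour described here can be modelled in a few lines of C. This is a toy model of the hidden state, not hardware or Xen code: `struct nmi_latch`, `nmi_raise()` and `nmi_iret()` are names made up for the sketch, with `in_service` standing in for the ISR-like state and `pending` for the single IRR-like bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: one in-service slot plus a single pending bit. */
struct nmi_latch {
    bool in_service;     /* a handler is running, NMIs blocked until IRET */
    bool pending;        /* one further NMI has been latched */
    unsigned delivered;  /* number of handler invocations */
    unsigned dropped;    /* NMIs that left no trace */
};

/* An NMI edge arrives from any source (watchdog, chipset, IPI). */
void nmi_raise(struct nmi_latch *l)
{
    assert(l != NULL);
    if (!l->in_service) {
        l->in_service = true;   /* delivered immediately */
        l->delivered++;
    } else if (!l->pending) {
        l->pending = true;      /* latched for after IRET */
    } else {
        l->dropped++;           /* pending bit already set: edge is lost */
    }
}

/* The handler returns; a latched NMI is delivered straight away. */
void nmi_iret(struct nmi_latch *l)
{
    assert(l != NULL);
    l->in_service = false;
    if (l->pending) {
        l->pending = false;
        l->in_service = true;
        l->delivered++;
    }
}
```

In this model, raising three NMIs back to back and then returning twice gives two handler invocations and one dropped NMI. The pending bit also carries no source information, so if a watchdog NMI and a system NMI both arrive while a handler is running, the single follow-up invocation has to check every source, port 61h included, or one of them goes unnoticed.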
Xen-devel mailing list