>>> On 19.02.18 at 16:20, <igor.druzhi...@citrix.com> wrote:
> On 19/02/18 15:18, Jan Beulich wrote:
>>>>> On 19.02.18 at 15:23, <igor.druzhi...@citrix.com> wrote:
>>> We're noticing a reproducible system boot hang on certain
>>> post-Skylake platforms where the BIOS is configured in
>>> legacy boot mode with x2APIC disabled. The system stalls
>>> immediately after writing the first SMP initialization
>>> sequence into APIC ICR.
>>> The cause of the problem is watchdog NMI handler execution -
>>> somewhere near the end of NMI handling (after it's already
>>> rescheduled the next NMI) it tries to access IO port 0x61
>>> to get the actual NMI reason on CPU0. Unfortunately, this
>>> port is emulated by BIOS using SMIs and this emulation for
>>> some reason takes more time than we expect during the INIT-SIPI-SIPI
>>> sequence. As a result, the system is constantly moving between
>>> the NMI and SMI handlers and not making any progress.
>>> To avoid this, initialize the watchdog after SMP bootstrap on
>>> CPU0 and, additionally, protect the NMI handler by moving
>>> IO port access before NMI re-scheduling. The latter should help
>>> in case of post boot CPU onlining. Although we're running
>>> watchdog at a much lower frequency, it's nevertheless possible
>>> we may trigger the issue anyway.
>> I'm afraid I can't connect "the latter" to anything earlier in the
> It's the previous sentence - there are 2 things that we do here - the
> latter is "protect the NMI handler by moving IO port access before NMI
Oh, I thought you meant to refer to the lower frequency. How
about "The latter should also help ..."?
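
For readers following the thread, the ordering change under discussion
(do the port 0x61 read before re-arming the next watchdog NMI) can be
sketched roughly as below. This is a stand-alone illustration, not the
actual Xen handler: read_nmi_reason(), rearm_watchdog(),
nmi_handler_sketch() and the event log are all hypothetical names, and
the port read is simulated rather than performed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical event log standing in for real hardware side effects. */
enum event { EV_READ_PORT_61, EV_REARM_WATCHDOG };

#define MAX_EVENTS 8
static enum event events[MAX_EVENTS];
static unsigned int n_events;

static void log_event(enum event ev)
{
    assert(n_events < MAX_EVENTS);
    events[n_events++] = ev;
}

/*
 * Stand-in for reading I/O port 0x61 (assumption: no real port access
 * here). On the affected platforms this read is emulated via SMI and
 * can be slow.
 */
static uint8_t read_nmi_reason(void)
{
    log_event(EV_READ_PORT_61);
    return 0;
}

/* Stand-in for scheduling the next watchdog NMI. */
static void rearm_watchdog(void)
{
    log_event(EV_REARM_WATCHDOG);
}

/*
 * Ordering proposed in the description: finish the potentially slow
 * port read first, and only then schedule the next NMI, so a new NMI
 * cannot arrive while the SMI-emulated read is still in progress.
 */
static uint8_t nmi_handler_sketch(void)
{
    uint8_t reason = read_nmi_reason();

    rearm_watchdog();
    return reason;
}
```

With the original ordering the two calls would be swapped, which is what
allows the next NMI to land while the port read is still being emulated.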
Xen-devel mailing list