Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-13 Thread Jan Beulich
>>> On 08.02.18 at 13:18, wrote: > We switch the NMI frequency to ~2Hz after the calibration, but that is > after having run the BSP at 100Hz for a long period of time, and the APs > at this rate for a short while. Irrespective of the exact fix here, it > is simply not
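
For a sense of scale, here is a minimal sketch of how a perf-counter-based watchdog turns `nmi_hz` into a counter period. It assumes the counter ticks once per CPU cycle at `cpu_khz` and is 48 bits wide. The real width comes from CPUID leaf 0xA. Both helper names are hypothetical and are not Xen's actual functions. At 100Hz the counter is reloaded 50 times as often as at 2Hz.

```c
#include <stdint.h>

/* Hypothetical helper: counter ticks between two watchdog NMIs,
 * assuming the counter advances once per CPU cycle at cpu_khz. */
static uint64_t watchdog_period_ticks(uint64_t cpu_khz, unsigned int nmi_hz)
{
    return (cpu_khz * 1000) / nmi_hz;
}

/* A counter overflows (and raises the NMI) when it wraps to zero, so it
 * is preloaded with the negated period, truncated to the counter width
 * (assumed to be 48 bits here). */
static uint64_t watchdog_reload_value(uint64_t cpu_khz, unsigned int nmi_hz)
{
    const uint64_t mask = (UINT64_C(1) << 48) - 1;

    return (0 - watchdog_period_ticks(cpu_khz, nmi_hz)) & mask;
}
```

On a 2GHz part (`cpu_khz` = 2000000), this gives a period of 20,000,000 cycles at 100Hz and 1,000,000,000 cycles at 2Hz.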

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-08 Thread Alexey G
On Thu, 8 Feb 2018 15:00:33 + Andrew Cooper wrote: >On 08/02/18 14:37, Alexey G wrote: >> On Thu, 8 Feb 2018 12:40:41 + >> Andrew Cooper wrote: >>> - Perf/Oprofile.  This is currently mutually exclusive with Xen >>> using the

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-08 Thread Andrew Cooper
On 08/02/18 14:37, Alexey G wrote: > On Thu, 8 Feb 2018 12:40:41 + > Andrew Cooper wrote: >> - Perf/Oprofile.  This is currently mutually exclusive with Xen using >> the watchdog, but needn't be and hopefully won't be in the future. >> >>> Most of the time we deal

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-08 Thread Alexey G
On Thu, 8 Feb 2018 12:40:41 + Andrew Cooper wrote: >- Perf/Oprofile.  This is currently mutually exclusive with Xen using >the watchdog, but needn't be and hopefully won't be in the future. > >> >> Most of the time we deal with watchdog NMIs, while all others should

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-08 Thread Andrew Cooper
On 08/02/18 12:32, Alexey G wrote: > On Thu, 8 Feb 2018 10:47:45 + > Igor Druzhinin wrote: >> I've done this measurement before. So what we are seeing exactly is >> that the time we are spending in SMI is spiking (sometimes up to >> 200ms) at the moment we go

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-08 Thread Alexey G
On Thu, 8 Feb 2018 10:47:45 + Igor Druzhinin wrote: >I've done this measurement before. So what we are seeing exactly is >that the time we are spending in SMI is spiking (sometimes up to >200ms) at the moment we go through INIT-SIPI-SIPI sequence. Looks like >this
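
To put the reported SMI spikes of up to 200ms in proportion, the sketch below counts how many full watchdog periods fit inside a stall of a given length. This is plain arithmetic under the assumption of a fixed NMI rate, and the function name is hypothetical. A 200ms stall covers 20 periods at 100Hz, 2 at 10Hz, and none at 2Hz.

```c
#include <stdint.h>

/* Hypothetical helper: number of complete watchdog periods that elapse
 * during a stall of stall_ms milliseconds at a rate of nmi_hz. */
static uint64_t watchdog_periods_in_stall(uint64_t stall_ms, unsigned int nmi_hz)
{
    return stall_ms * nmi_hz / 1000;
}
```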

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-08 Thread Andrew Cooper
On 08/02/18 09:12, Jan Beulich wrote: On 07.02.18 at 18:08, wrote: >> On 07/02/18 15:06, Jan Beulich wrote: >>> Also you completely ignore my argument against the seemingly >>> random division by 10, including the resulting question of what you >>> mean to do once

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-08 Thread Igor Druzhinin
On 08/02/18 06:37, Alexey G wrote: > On Wed, 7 Feb 2018 13:01:08 + > Igor Druzhinin wrote: >> So far the issue confirmed: >> Dell PowerEdge R740, Huawei systems based on Xeon Gold 6152 (the one >> that it was tested on), Intel S2600XX, etc. >> >> Also see: >>

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-08 Thread Jan Beulich
>>> On 07.02.18 at 18:08, wrote: > On 07/02/18 15:06, Jan Beulich wrote: >> Also you completely ignore my argument against the seemingly >> random division by 10, including the resulting question of what you >> mean to do once 10Hz also turns out too high a frequency. >

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-07 Thread Alexey G
On Wed, 7 Feb 2018 13:01:08 + Igor Druzhinin wrote: >So far the issue confirmed: >Dell PowerEdge R740, Huawei systems based on Xeon Gold 6152 (the one >that it was tested on), Intel S2600XX, etc. > >Also see: >https://bugs.xenserver.org/browse/XSO-774 > >Well,

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-07 Thread Andrew Cooper
On 07/02/18 15:06, Jan Beulich wrote: On 07.02.18 at 14:24, wrote: >> On 07/02/18 13:08, Jan Beulich wrote: >> On 07.02.18 at 14:01, wrote: So far the issue confirmed: Dell PowerEdge R740, Huawei systems based on Xeon Gold

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-07 Thread Jan Beulich
>>> On 07.02.18 at 14:24, wrote: > On 07/02/18 13:08, Jan Beulich wrote: > On 07.02.18 at 14:01, wrote: >>> So far the issue confirmed: >>> Dell PowerEdge R740, Huawei systems based on Xeon Gold 6152 (the one >>> that it was tested on),

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-07 Thread Igor Druzhinin
On 07/02/18 13:08, Jan Beulich wrote: On 07.02.18 at 14:01, wrote: >> So far the issue confirmed: >> Dell PowerEdge R740, Huawei systems based on Xeon Gold 6152 (the one >> that it was tested on), Intel S2600XX, etc. >> >> Also see: >>

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-07 Thread Andrew Cooper
On 07/02/18 13:08, Jan Beulich wrote: On 07.02.18 at 14:01, wrote: >> So far the issue confirmed: >> Dell PowerEdge R740, Huawei systems based on Xeon Gold 6152 (the one >> that it was tested on), Intel S2600XX, etc. >> >> Also see: >>

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-07 Thread Jan Beulich
>>> On 07.02.18 at 14:01, wrote: > So far the issue confirmed: > Dell PowerEdge R740, Huawei systems based on Xeon Gold 6152 (the one > that it was tested on), Intel S2600XX, etc. > > Also see: > https://bugs.xenserver.org/browse/XSO-774 > > Well, no-watchdog is what

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-07 Thread Igor Druzhinin
On 07/02/18 09:13, Jan Beulich wrote: On 06.02.18 at 22:51, wrote: >> The problem with a quirk/commandline parameter is that the issue is >> reported for a wide variety of systems and, as it looks like, depends on >> the default BIOS setup - which means it's hard to

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-07 Thread Jan Beulich
>>> On 06.02.18 at 22:51, wrote: > The problem with a quirk/commandline parameter is that the issue is > reported for a wide variety of systems and, as it looks like, depends on > the default BIOS setup - which means it's hard to identify particular > machines. We should

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-06 Thread Alexey G
If the actual SMI source is not related to some place in the NMI handler code but was eg. due to some SMI timer, lowering NMI watchdog frequency might not fix the issue completely, but lower its reproducibility (perhaps to some very rare occurrences). So it's better be
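
Alexey's point can be illustrated with a deliberately simple model. The model is an assumption for illustration and was not proposed in the thread. Suppose each watchdog NMI independently has probability `p` of colliding with a periodic SMI. Then the chance that at least one of `n` NMIs collides is 1 - (1 - p)^n. Lowering the NMI rate reduces `n` and therefore the probability, but the probability never drops to zero.

```c
/* Illustrative model only (hypothetical): each of n watchdog NMIs
 * independently collides with a periodic SMI with probability p.
 * Returns the probability that at least one collision occurs. */
static double collision_probability(double p, unsigned long n)
{
    double none = 1.0;

    for (unsigned long i = 0; i < n; i++)
        none *= 1.0 - p;    /* probability that NMI i does not collide */
    return 1.0 - none;
}
```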

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-06 Thread Igor Druzhinin
On 06/02/18 16:23, Jan Beulich wrote: On 06.02.18 at 17:14, wrote: >> On 06/02/18 16:07, Jan Beulich wrote: >> On 05.02.18 at 22:18, wrote: --- a/xen/arch/x86/nmi.c +++ b/xen/arch/x86/nmi.c @@ -34,7 +34,8 @@

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-06 Thread Alexey G
On Tue, 6 Feb 2018 17:21:19 + Igor Druzhinin wrote: >On 06/02/18 17:08, Alexey G wrote: >> The major concern here is the possibility of SMI being triggered _not_ >> by some specific I/O port access. Primarily, if it actually was a >> periodic SMI. >> >> If the

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-06 Thread Igor Druzhinin
On 06/02/18 17:08, Alexey G wrote: > On Tue, 6 Feb 2018 14:21:12 + > Andrew Cooper wrote: > >> On 06/02/18 03:10, Alexey G wrote: >>> I/O port 61h normally is not emulated by SMI legacy kbd handling code >>> in BIOS, only ports like 60h, 64h, etc. >>> Contrary to

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-06 Thread Alexey G
On Tue, 6 Feb 2018 14:21:12 + Andrew Cooper wrote: >On 06/02/18 03:10, Alexey G wrote: >> I/O port 61h normally is not emulated by SMI legacy kbd handling code >> in BIOS, only ports like 60h, 64h, etc. >> Contrary to USB legacy emulation, it has to intercept port
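
For context on why port 61h comes up at all: legacy NMI handlers read System Control Port B (I/O port 0x61) to find the source of an NMI that is not otherwise claimed. A rough sketch of that decoding follows. It takes the byte as a parameter instead of performing a real `inb(0x61)`, because port I/O needs ring 0 or `ioperm`. The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status bits in System Control Port B (I/O port 0x61). */
#define PORT61_SERR_STATUS  0x80 /* bit 7: PCI SERR# / memory parity error */
#define PORT61_IOCHK_STATUS 0x40 /* bit 6: I/O channel check (IOCHK#) */

struct nmi_reason {
    bool serr;
    bool iochk;
};

/* Decode a byte previously read from port 0x61 by the NMI handler. */
static struct nmi_reason decode_port61(uint8_t val)
{
    struct nmi_reason r = {
        .serr  = (val & PORT61_SERR_STATUS) != 0,
        .iochk = (val & PORT61_IOCHK_STATUS) != 0,
    };
    return r;
}
```

If firmware did trap this port into SMM, every watchdog NMI would pay the SMI cost. That is why the thread asks whether 61h is intercepted.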

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-06 Thread Jan Beulich
>>> On 06.02.18 at 17:14, wrote: > On 06/02/18 16:07, Jan Beulich wrote: > On 05.02.18 at 22:18, wrote: >>> --- a/xen/arch/x86/nmi.c >>> +++ b/xen/arch/x86/nmi.c >>> @@ -34,7 +34,8 @@ >>> #include >>> >>> unsigned int nmi_watchdog =

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-06 Thread Igor Druzhinin
On 06/02/18 16:07, Jan Beulich wrote: On 05.02.18 at 22:18, wrote: >> --- a/xen/arch/x86/nmi.c >> +++ b/xen/arch/x86/nmi.c >> @@ -34,7 +34,8 @@ >> #include >> >> unsigned int nmi_watchdog = NMI_NONE; >> -static unsigned int nmi_hz = HZ; >> +/* initial watchdog

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-06 Thread Jan Beulich
>>> On 05.02.18 at 22:18, wrote: > --- a/xen/arch/x86/nmi.c > +++ b/xen/arch/x86/nmi.c > @@ -34,7 +34,8 @@ > #include > > unsigned int nmi_watchdog = NMI_NONE; > -static unsigned int nmi_hz = HZ; > +/* initial watchdog frequency - shouldn't be too high to avoid
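
The idea behind the quoted hunk can be sketched as follows: start the watchdog at a reduced boot-time rate, then move to the slow runtime rate once calibration has finished. Everything here except `HZ` and `nmi_hz` is hypothetical. `HZ / 10` mirrors the divisor Jan questions. The runtime value of 2 stands in for the "~2Hz" Jan mentions and is not taken from the patch.

```c
#define HZ 100 /* matches the 100Hz boot-time rate quoted in the thread */

/* Hypothetical names for the two rates under discussion. */
#define NMI_BOOT_HZ    (HZ / 10) /* the contested "division by 10" */
#define NMI_RUNTIME_HZ 2         /* placeholder for the ~2Hz runtime rate */

static unsigned int nmi_hz = NMI_BOOT_HZ;

/* Hypothetical hook: called once watchdog calibration has completed. */
static void nmi_watchdog_calibrated(void)
{
    nmi_hz = NMI_RUNTIME_HZ;
}
```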

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-06 Thread Andrew Cooper
On 06/02/18 03:10, Alexey G wrote: > On Mon, 5 Feb 2018 21:18:42 + > Igor Druzhinin wrote: > >> We're noticing a reproducible system boot hang on certain >> post-Skylake platforms where the BIOS is configured in >> legacy boot mode with x2APIC disabled. The system

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-06 Thread Andrew Cooper
On 05/02/18 21:18, Igor Druzhinin wrote: > We're noticing a reproducible system boot hang on certain > post-Skylake platforms where the BIOS is configured in It's just a plain Skylake Server, from what I can see. > legacy boot mode with x2APIC disabled. The system stalls > immediately after

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-05 Thread Alexey G
On Mon, 5 Feb 2018 21:18:42 + Igor Druzhinin wrote: >We're noticing a reproducible system boot hang on certain >post-Skylake platforms where the BIOS is configured in >legacy boot mode with x2APIC disabled. The system stalls >immediately after writing the first SMP

[Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-05 Thread Igor Druzhinin
We're noticing a reproducible system boot hang on certain post-Skylake platforms where the BIOS is configured in legacy boot mode with x2APIC disabled. The system stalls immediately after writing the first SMP initialization sequence into APIC ICR. The cause of the problem is watchdog NMI handler
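
For reference, the "SMP initialization sequence" written into the ICR is the INIT-SIPI-SIPI sequence. The sketch below builds the xAPIC ICR values following the field layout in the Intel SDM. It returns the values instead of writing MMIO. In xAPIC mode the high dword (offset 0x310) holds the destination and is written first, because writing the low dword (offset 0x300) is what sends the IPI. The macro and function names are hypothetical.

```c
#include <stdint.h>

/* xAPIC ICR low-dword fields (Intel SDM Vol. 3, Interrupt Command Register). */
#define ICR_DM_INIT      (UINT32_C(5) << 8) /* delivery mode 101b: INIT */
#define ICR_DM_STARTUP   (UINT32_C(6) << 8) /* delivery mode 110b: Start-up (SIPI) */
#define ICR_LEVEL_ASSERT (UINT32_C(1) << 14)

/* ICR high dword: the destination APIC ID occupies bits 24-31. */
static uint32_t icr_high(uint8_t apic_id)
{
    return (uint32_t)apic_id << 24;
}

/* INIT IPI with level = assert. */
static uint32_t icr_init(void)
{
    return ICR_LEVEL_ASSERT | ICR_DM_INIT;
}

/* SIPI: the vector is the 4KiB page number of the real-mode trampoline,
 * which therefore has to sit below 1MiB. */
static uint32_t icr_sipi(uint32_t trampoline_phys)
{
    return ICR_DM_STARTUP | ((trampoline_phys >> 12) & 0xff);
}
```

A trampoline at physical address 0x9F000 yields a SIPI low dword of 0x69F.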