Re: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context

2018-02-15 Thread Peter Zijlstra
On Thu, Feb 15, 2018 at 10:44:44AM -0700, Jerry Hoemann wrote:
> Is your desire to remove of the firmware callback/spinlock in hpwdt_pretimeout
> related to David Woodhouse patch set:

That's the work that made us find this code, but no, even without that,
code like that is entirely dodgy. NMI code needs to be very careful and
firmware just isn't something I trust to get things right. Worse, its
not something we can fix.

And using spnilock from NMI context is just wrong, if anything it needs
be raw_spnilock but even then, yuck.


Re: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context

2018-02-15 Thread Peter Zijlstra
On Thu, Feb 15, 2018 at 10:44:44AM -0700, Jerry Hoemann wrote:
> Is your desire to remove of the firmware callback/spinlock in hpwdt_pretimeout
> related to David Woodhouse patch set:

That's the work that made us find this code, but no, even without that,
code like that is entirely dodgy. NMI code needs to be very careful and
firmware just isn't something I trust to get things right. Worse, its
not something we can fix.

And using spnilock from NMI context is just wrong, if anything it needs
be raw_spnilock but even then, yuck.


Re: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context

2018-02-15 Thread Ingo Molnar

* Jerry Hoemann  wrote:

> On Thu, Feb 15, 2018 at 12:17:04AM +0100, Ingo Molnar wrote:
> > 
> > * Jerry Hoemann  wrote:
> > 
> > > 
> > > Ingo,
> > > 
> > > I have a patch set under review that brings hpwdt into compliance
> > > with the watchdog core.
> > > 
> > > One of the changes removes the callback into firmware in hpwdt_pretimeout
> > > and its associated spinlock.
> > > 
> > > https://lkml.org/lkml/2018/2/12/30
> > 
> >  drivers/watchdog/hpwdt.c | 490 
> > +--
> >  1 file changed, 8 insertions(+), 482 deletions(-)
> > 
> > Very nice, and this should solve all the NMI handling complications!
> > 
> > > I will add you to the CC list of the next version of the set.
> > 
> > Thanks!
> > 
> > Ingo
> 
> 
> Ingo,
> 
> Is your desire to remove of the firmware callback/spinlock in hpwdt_pretimeout
> related to David Woodhouse patch set:
> 
>   https://lkml.org/lkml/2018/2/14/305  ?
> 
> Which I think are to mitigate performance issues resulting from the
> changes to address the specter/meltdown?

So the motivation was that we are trying to wrap BIOS/EFI calls into 
Spectre-disabling sections - and while doing that we realized that hpwdt calls 
the 
firmware from an NMI callback, which would have complicated the Spectre work..

But with that call removed from NMI context it's all perfect!

> Any there other changes to hpwdt needed related to this work?

No other changes needed I think!

Thanks,

Ingo


Re: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context

2018-02-15 Thread Ingo Molnar

* Jerry Hoemann  wrote:

> On Thu, Feb 15, 2018 at 12:17:04AM +0100, Ingo Molnar wrote:
> > 
> > * Jerry Hoemann  wrote:
> > 
> > > 
> > > Ingo,
> > > 
> > > I have a patch set under review that brings hpwdt into compliance
> > > with the watchdog core.
> > > 
> > > One of the changes removes the callback into firmware in hpwdt_pretimeout
> > > and its associated spinlock.
> > > 
> > > https://lkml.org/lkml/2018/2/12/30
> > 
> >  drivers/watchdog/hpwdt.c | 490 
> > +--
> >  1 file changed, 8 insertions(+), 482 deletions(-)
> > 
> > Very nice, and this should solve all the NMI handling complications!
> > 
> > > I will add you to the CC list of the next version of the set.
> > 
> > Thanks!
> > 
> > Ingo
> 
> 
> Ingo,
> 
> Is your desire to remove of the firmware callback/spinlock in hpwdt_pretimeout
> related to David Woodhouse patch set:
> 
>   https://lkml.org/lkml/2018/2/14/305  ?
> 
> Which I think are to mitigate performance issues resulting from the
> changes to address the specter/meltdown?

So the motivation was that we are trying to wrap BIOS/EFI calls into 
Spectre-disabling sections - and while doing that we realized that hpwdt calls 
the 
firmware from an NMI callback, which would have complicated the Spectre work..

But with that call removed from NMI context it's all perfect!

> Any there other changes to hpwdt needed related to this work?

No other changes needed I think!

Thanks,

Ingo


Re: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context

2018-02-15 Thread Jerry Hoemann
On Thu, Feb 15, 2018 at 12:17:04AM +0100, Ingo Molnar wrote:
> 
> * Jerry Hoemann  wrote:
> 
> > 
> > Ingo,
> > 
> > I have a patch set under review that brings hpwdt into compliance
> > with the watchdog core.
> > 
> > One of the changes removes the callback into firmware in hpwdt_pretimeout
> > and its associated spinlock.
> > 
> > https://lkml.org/lkml/2018/2/12/30
> 
>  drivers/watchdog/hpwdt.c | 490 
> +--
>  1 file changed, 8 insertions(+), 482 deletions(-)
> 
> Very nice, and this should solve all the NMI handling complications!
> 
> > I will add you to the CC list of the next version of the set.
> 
> Thanks!
> 
>   Ingo


Ingo,

Is your desire to remove of the firmware callback/spinlock in hpwdt_pretimeout
related to David Woodhouse patch set:

https://lkml.org/lkml/2018/2/14/305  ?

Which I think are to mitigate performance issues resulting from the
changes to address the specter/meltdown?

Any there other changes to hpwdt needed related to this work?

Thanks

Jerry

-- 

-
Jerry Hoemann  Software Engineer   Hewlett Packard Enterprise
-


Re: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context

2018-02-15 Thread Jerry Hoemann
On Thu, Feb 15, 2018 at 12:17:04AM +0100, Ingo Molnar wrote:
> 
> * Jerry Hoemann  wrote:
> 
> > 
> > Ingo,
> > 
> > I have a patch set under review that brings hpwdt into compliance
> > with the watchdog core.
> > 
> > One of the changes removes the callback into firmware in hpwdt_pretimeout
> > and its associated spinlock.
> > 
> > https://lkml.org/lkml/2018/2/12/30
> 
>  drivers/watchdog/hpwdt.c | 490 
> +--
>  1 file changed, 8 insertions(+), 482 deletions(-)
> 
> Very nice, and this should solve all the NMI handling complications!
> 
> > I will add you to the CC list of the next version of the set.
> 
> Thanks!
> 
>   Ingo


Ingo,

Is your desire to remove of the firmware callback/spinlock in hpwdt_pretimeout
related to David Woodhouse patch set:

https://lkml.org/lkml/2018/2/14/305  ?

Which I think are to mitigate performance issues resulting from the
changes to address the specter/meltdown?

Any there other changes to hpwdt needed related to this work?

Thanks

Jerry

-- 

-
Jerry Hoemann  Software Engineer   Hewlett Packard Enterprise
-


Re: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context

2018-02-14 Thread Ingo Molnar

* Jerry Hoemann  wrote:

> 
> Ingo,
> 
> I have a patch set under review that brings hpwdt into compliance
> with the watchdog core.
> 
> One of the changes removes the callback into firmware in hpwdt_pretimeout
> and its associated spinlock.
> 
> https://lkml.org/lkml/2018/2/12/30

 drivers/watchdog/hpwdt.c | 490 +--
 1 file changed, 8 insertions(+), 482 deletions(-)

Very nice, and this should solve all the NMI handling complications!

> I will add you to the CC list of the next version of the set.

Thanks!

Ingo


Re: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context

2018-02-14 Thread Ingo Molnar

* Jerry Hoemann  wrote:

> 
> Ingo,
> 
> I have a patch set under review that brings hpwdt into compliance
> with the watchdog core.
> 
> One of the changes removes the callback into firmware in hpwdt_pretimeout
> and its associated spinlock.
> 
> https://lkml.org/lkml/2018/2/12/30

 drivers/watchdog/hpwdt.c | 490 +--
 1 file changed, 8 insertions(+), 482 deletions(-)

Very nice, and this should solve all the NMI handling complications!

> I will add you to the CC list of the next version of the set.

Thanks!

Ingo


Re: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context

2018-02-14 Thread Jerry Hoemann

Ingo,

I have a patch set under review that brings hpwdt into compliance
with the watchdog core.

One of the changes removes the callback into firmware in hpwdt_pretimeout
and its associated spinlock.

https://lkml.org/lkml/2018/2/12/30

I will add you to the CC list of the next version of the set.

Jerry


On Wed, Feb 14, 2018 at 10:44:05AM +0100, Borislav Petkov wrote:
> + Jerry
> 
> On Wed, Feb 14, 2018 at 10:31:59AM +0100, Ingo Molnar wrote:
> > 
> > * Tim Chen  wrote:
> > 
> > > Dave Hansen and I had some discussions on how to handle the nested NMI 
> > > and 
> > > firmware calls.  We thought of using a per cpu counter to record the 
> > > nesting 
> > > call depth and toggle IBRS appropriately for the depth 0->1 and 1->0 
> > > transition.  
> > > Will this change be sufficient?
> > 
> > Yeah, so I think the first question should be: does the firmware call from 
> > NMI 
> > context make sense to begin with?
> > 
> > Because in this particular case it does not appear to be so: the reason for 
> > the 
> > BIOS/firmware call appears to be to determine how we nmi_panic() after 
> > receiving 
> > an NMI that no other NMI handler handled: with a passive-aggressive "I 
> > don't know" 
> > panic message or with a slightly more informative panic message.
> > 
> > That's not a real value-add to users - so we can avoid all these 
> > complications by 
> > applying the patch below:
> > 
> >  drivers/watchdog/hpwdt.c | 30 --
> >  1 file changed, 4 insertions(+), 26 deletions(-)
> > 
> > As a bonus the spinlock use can be removed as well.
> > 
> > Thanks,
> > 
> > Ingo
> > 
> > >
> > From b038428a739a3fcf0b9678305c131f60af7422ca Mon Sep 17 00:00:00 2001
> > From: Ingo Molnar 
> > Date: Wed, 14 Feb 2018 10:24:41 +0100
> > Subject: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls 
> > from
> >  NMI context
> > 
> > Taking a spinlock and calling into the firmware are problematic things to
> > do from NMI callbacks.
> > 
> > It also seems completely pointless in this particular case:
> > 
> >  - cmn_regs is updated by the firmware call in the hpwdt_pretimeout() NMI
> >callback, but there's almost no use for it: we use it only to determine
> >whether to panic with an 'unknown NMI' or a slightly more descriptive
> >message.
> > 
> >  - cmn_regs and rom_lock is not used anywhere else, other than early 
> > detection
> >of the firmware capability.
> > 
> > So remove rom_lock, make the cmn_regs call from the detection routine only,
> > and thus make the NMI callback a lot more robust.
> > 
> > Signed-off-by: Ingo Molnar 
> > ---
> >  drivers/watchdog/hpwdt.c | 30 --
> >  1 file changed, 4 insertions(+), 26 deletions(-)
> > 
> > diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
> > index f1f00dfc0e68..2544fe482fe3 100644
> > --- a/drivers/watchdog/hpwdt.c
> > +++ b/drivers/watchdog/hpwdt.c
> > @@ -150,9 +150,7 @@ static unsigned int hpwdt_nmi_decoding;
> >  static unsigned int allow_kdump = 1;
> >  static unsigned int is_icru;
> >  static unsigned int is_uefi;
> > -static DEFINE_SPINLOCK(rom_lock);
> >  static void *cru_rom_addr;
> > -static struct cmn_registers cmn_regs;
> >  
> >  extern asmlinkage void asminline_call(struct cmn_registers *pi86Regs,
> > unsigned long *pRomEntry);
> > @@ -225,6 +223,7 @@ static int cru_detect(unsigned long map_entry,
> > unsigned long physical_bios_base = 0;
> > unsigned long physical_bios_offset = 0;
> > int retval = -ENODEV;
> > +   struct cmn_registers cmn_regs = { };
> >  
> > bios32_map = ioremap(map_entry, (2 * PAGE_SIZE));
> >  
> > @@ -486,32 +485,18 @@ static int hpwdt_my_nmi(void)
> >   */
> >  static int hpwdt_pretimeout(unsigned int ulReason, struct pt_regs *regs)
> >  {
> > -   unsigned long rom_pl;
> > -   static int die_nmi_called;
> > -
> > if (!hpwdt_nmi_decoding)
> > return NMI_DONE;
> >  
> > if ((ulReason == NMI_UNKNOWN) && !hpwdt_my_nmi())
> > return NMI_DONE;
> >  
> > -   spin_lock_irqsave(_lock, rom_pl);
> > -   if (!die_nmi_called && !is_icru && !is_uefi)
> > -   asminline_call(_regs, cru_rom_addr);
> > -   die_nmi_called = 1;
> > -   spin_unlock_irqrestore(_lock, rom_pl);
> > -
> > if (allow_kdump)
> > hpwdt_stop();
> >  
> > -   if (!is_icru && !is_uefi) {
> > -   if (cmn_regs.u1.ral == 0) {
> > -   nmi_panic(regs, "An NMI occurred, but unable to 
> > determine source.\n");
> > -   return NMI_HANDLED;
> > -   }
> > -   }
> > -   nmi_panic(regs, "An NMI occurred. Depending on your system the reason "
> > -   "for the NMI is logged in any one of the following "
> > +   nmi_panic(regs,
> > +   "An NMI occurred. Depending on your system the reason "
> > +   "for the NMI might be logged in 

Re: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context

2018-02-14 Thread Jerry Hoemann

Ingo,

I have a patch set under review that brings hpwdt into compliance
with the watchdog core.

One of the changes removes the callback into firmware in hpwdt_pretimeout
and its associated spinlock.

https://lkml.org/lkml/2018/2/12/30

I will add you to the CC list of the next version of the set.

Jerry


On Wed, Feb 14, 2018 at 10:44:05AM +0100, Borislav Petkov wrote:
> + Jerry
> 
> On Wed, Feb 14, 2018 at 10:31:59AM +0100, Ingo Molnar wrote:
> > 
> > * Tim Chen  wrote:
> > 
> > > Dave Hansen and I had some discussions on how to handle the nested NMI 
> > > and 
> > > firmware calls.  We thought of using a per cpu counter to record the 
> > > nesting 
> > > call depth and toggle IBRS appropriately for the depth 0->1 and 1->0 
> > > transition.  
> > > Will this change be sufficient?
> > 
> > Yeah, so I think the first question should be: does the firmware call from 
> > NMI 
> > context make sense to begin with?
> > 
> > Because in this particular case it does not appear to be so: the reason for 
> > the 
> > BIOS/firmware call appears to be to determine how we nmi_panic() after 
> > receiving 
> > an NMI that no other NMI handler handled: with a passive-aggressive "I 
> > don't know" 
> > panic message or with a slightly more informative panic message.
> > 
> > That's not a real value-add to users - so we can avoid all these 
> > complications by 
> > applying the patch below:
> > 
> >  drivers/watchdog/hpwdt.c | 30 --
> >  1 file changed, 4 insertions(+), 26 deletions(-)
> > 
> > As a bonus the spinlock use can be removed as well.
> > 
> > Thanks,
> > 
> > Ingo
> > 
> > >
> > From b038428a739a3fcf0b9678305c131f60af7422ca Mon Sep 17 00:00:00 2001
> > From: Ingo Molnar 
> > Date: Wed, 14 Feb 2018 10:24:41 +0100
> > Subject: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls 
> > from
> >  NMI context
> > 
> > Taking a spinlock and calling into the firmware are problematic things to
> > do from NMI callbacks.
> > 
> > It also seems completely pointless in this particular case:
> > 
> >  - cmn_regs is updated by the firmware call in the hpwdt_pretimeout() NMI
> >callback, but there's almost no use for it: we use it only to determine
> >whether to panic with an 'unknown NMI' or a slightly more descriptive
> >message.
> > 
> >  - cmn_regs and rom_lock is not used anywhere else, other than early 
> > detection
> >of the firmware capability.
> > 
> > So remove rom_lock, make the cmn_regs call from the detection routine only,
> > and thus make the NMI callback a lot more robust.
> > 
> > Signed-off-by: Ingo Molnar 
> > ---
> >  drivers/watchdog/hpwdt.c | 30 --
> >  1 file changed, 4 insertions(+), 26 deletions(-)
> > 
> > diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
> > index f1f00dfc0e68..2544fe482fe3 100644
> > --- a/drivers/watchdog/hpwdt.c
> > +++ b/drivers/watchdog/hpwdt.c
> > @@ -150,9 +150,7 @@ static unsigned int hpwdt_nmi_decoding;
> >  static unsigned int allow_kdump = 1;
> >  static unsigned int is_icru;
> >  static unsigned int is_uefi;
> > -static DEFINE_SPINLOCK(rom_lock);
> >  static void *cru_rom_addr;
> > -static struct cmn_registers cmn_regs;
> >  
> >  extern asmlinkage void asminline_call(struct cmn_registers *pi86Regs,
> > unsigned long *pRomEntry);
> > @@ -225,6 +223,7 @@ static int cru_detect(unsigned long map_entry,
> > unsigned long physical_bios_base = 0;
> > unsigned long physical_bios_offset = 0;
> > int retval = -ENODEV;
> > +   struct cmn_registers cmn_regs = { };
> >  
> > bios32_map = ioremap(map_entry, (2 * PAGE_SIZE));
> >  
> > @@ -486,32 +485,18 @@ static int hpwdt_my_nmi(void)
> >   */
> >  static int hpwdt_pretimeout(unsigned int ulReason, struct pt_regs *regs)
> >  {
> > -   unsigned long rom_pl;
> > -   static int die_nmi_called;
> > -
> > if (!hpwdt_nmi_decoding)
> > return NMI_DONE;
> >  
> > if ((ulReason == NMI_UNKNOWN) && !hpwdt_my_nmi())
> > return NMI_DONE;
> >  
> > -   spin_lock_irqsave(_lock, rom_pl);
> > -   if (!die_nmi_called && !is_icru && !is_uefi)
> > -   asminline_call(_regs, cru_rom_addr);
> > -   die_nmi_called = 1;
> > -   spin_unlock_irqrestore(_lock, rom_pl);
> > -
> > if (allow_kdump)
> > hpwdt_stop();
> >  
> > -   if (!is_icru && !is_uefi) {
> > -   if (cmn_regs.u1.ral == 0) {
> > -   nmi_panic(regs, "An NMI occurred, but unable to 
> > determine source.\n");
> > -   return NMI_HANDLED;
> > -   }
> > -   }
> > -   nmi_panic(regs, "An NMI occurred. Depending on your system the reason "
> > -   "for the NMI is logged in any one of the following "
> > +   nmi_panic(regs,
> > +   "An NMI occurred. Depending on your system the reason "
> > +   "for the NMI might be logged in any one of the following "
> > "resources:\n"
> >

Re: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context

2018-02-14 Thread Ingo Molnar

* Peter Zijlstra  wrote:

> On Wed, Feb 14, 2018 at 10:31:59AM +0100, Ingo Molnar wrote:
> > Because in this particular case it does not appear to be so: the reason for 
> > the 
> > BIOS/firmware call appears to be to determine how we nmi_panic() after 
> > receiving 
> > an NMI that no other NMI handler handled: with a passive-aggressive "I 
> > don't know" 
> > panic message or with a slightly more informative panic message.
> 
> However much I like just ripping all that out, I think the ROM call
> actually does that logging, or that is how I read things.
> 
> If you look at the original Changelog for that driver:
> 
> Hp is providing a Hardware WatchDog Timer driver that will only work with 
> the
> specific HW Timer located in the HP ProLiant iLO 2 ASIC. The iLO 2 HW 
> Timer
> will generate a Non-maskable Interrupt (NMI) 9 seconds before physically
> resetting the server, by removing power, so that the event can be logged 
> to
> the HP Integrated Management Log (IML), a Non-Volatile Random Access 
> Memory
> (NVRAM). The logging of the event is performed using the HP ProLiant ROM 
> via
> an Industry Standard access known as a BIOS Service Directory Entry.

Ok, that appears to be the case, too bad.

But the good news: if this callback is executed only once per system lifetime 
then 
we don't actually have to perform *any* modification on this driver, right? The 
reason is that this callback will panic unconditionally after performing the 
BIOS 
call. The control flow to the panic is unconditional:

spin_lock_irqsave(_lock, rom_pl);
if (!die_nmi_called && !is_icru && !is_uefi)
asminline_call(_regs, cru_rom_addr);

...

if (!is_icru && !is_uefi) { 
if (cmn_regs.u1.ral == 0) {
nmi_panic(regs, "An NMI occurred, but unable to 
determine source.\n");
...

nmi_panic(regs, "An NMI occurred. Depending on your system the reason "
"for the NMI is logged in any one of the following "
"resources:\n"
"1. Integrated Management Log (IML)\n"
"2. OA Syslog\n"
"3. OA Forward Progress Log\n"
"4. iLO Event Log");


This callback does not get executed when we get perf NMIs, correct?

Ingo


Re: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context

2018-02-14 Thread Ingo Molnar

* Peter Zijlstra  wrote:

> On Wed, Feb 14, 2018 at 10:31:59AM +0100, Ingo Molnar wrote:
> > Because in this particular case it does not appear to be so: the reason for 
> > the 
> > BIOS/firmware call appears to be to determine how we nmi_panic() after 
> > receiving 
> > an NMI that no other NMI handler handled: with a passive-aggressive "I 
> > don't know" 
> > panic message or with a slightly more informative panic message.
> 
> However much I like just ripping all that out, I think the ROM call
> actually does that logging, or that is how I read things.
> 
> If you look at the original Changelog for that driver:
> 
> Hp is providing a Hardware WatchDog Timer driver that will only work with 
> the
> specific HW Timer located in the HP ProLiant iLO 2 ASIC. The iLO 2 HW 
> Timer
> will generate a Non-maskable Interrupt (NMI) 9 seconds before physically
> resetting the server, by removing power, so that the event can be logged 
> to
> the HP Integrated Management Log (IML), a Non-Volatile Random Access 
> Memory
> (NVRAM). The logging of the event is performed using the HP ProLiant ROM 
> via
> an Industry Standard access known as a BIOS Service Directory Entry.

Ok, that appears to be the case, too bad.

But the good news: if this callback is executed only once per system lifetime 
then 
we don't actually have to perform *any* modification on this driver, right? The 
reason is that this callback will panic unconditionally after performing the 
BIOS 
call. The control flow to the panic is unconditional:

spin_lock_irqsave(_lock, rom_pl);
if (!die_nmi_called && !is_icru && !is_uefi)
asminline_call(_regs, cru_rom_addr);

...

if (!is_icru && !is_uefi) { 
if (cmn_regs.u1.ral == 0) {
nmi_panic(regs, "An NMI occurred, but unable to 
determine source.\n");
...

nmi_panic(regs, "An NMI occurred. Depending on your system the reason "
"for the NMI is logged in any one of the following "
"resources:\n"
"1. Integrated Management Log (IML)\n"
"2. OA Syslog\n"
"3. OA Forward Progress Log\n"
"4. iLO Event Log");


This callback does not get executed when we get perf NMIs, correct?

Ingo


Re: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context

2018-02-14 Thread Borislav Petkov
+ Jerry

On Wed, Feb 14, 2018 at 10:31:59AM +0100, Ingo Molnar wrote:
> 
> * Tim Chen  wrote:
> 
> > Dave Hansen and I had some discussions on how to handle the nested NMI and 
> > firmware calls.  We thought of using a per cpu counter to record the 
> > nesting 
> > call depth and toggle IBRS appropriately for the depth 0->1 and 1->0 
> > transition.  
> > Will this change be sufficient?
> 
> Yeah, so I think the first question should be: does the firmware call from 
> NMI 
> context make sense to begin with?
> 
> Because in this particular case it does not appear to be so: the reason for 
> the 
> BIOS/firmware call appears to be to determine how we nmi_panic() after 
> receiving 
> an NMI that no other NMI handler handled: with a passive-aggressive "I don't 
> know" 
> panic message or with a slightly more informative panic message.
> 
> That's not a real value-add to users - so we can avoid all these 
> complications by 
> applying the patch below:
> 
>  drivers/watchdog/hpwdt.c | 30 --
>  1 file changed, 4 insertions(+), 26 deletions(-)
> 
> As a bonus the spinlock use can be removed as well.
> 
> Thanks,
> 
>   Ingo
> 
> >
> From b038428a739a3fcf0b9678305c131f60af7422ca Mon Sep 17 00:00:00 2001
> From: Ingo Molnar 
> Date: Wed, 14 Feb 2018 10:24:41 +0100
> Subject: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from
>  NMI context
> 
> Taking a spinlock and calling into the firmware are problematic things to
> do from NMI callbacks.
> 
> It also seems completely pointless in this particular case:
> 
>  - cmn_regs is updated by the firmware call in the hpwdt_pretimeout() NMI
>callback, but there's almost no use for it: we use it only to determine
>whether to panic with an 'unknown NMI' or a slightly more descriptive
>message.
> 
>  - cmn_regs and rom_lock is not used anywhere else, other than early detection
>of the firmware capability.
> 
> So remove rom_lock, make the cmn_regs call from the detection routine only,
> and thus make the NMI callback a lot more robust.
> 
> Signed-off-by: Ingo Molnar 
> ---
>  drivers/watchdog/hpwdt.c | 30 --
>  1 file changed, 4 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
> index f1f00dfc0e68..2544fe482fe3 100644
> --- a/drivers/watchdog/hpwdt.c
> +++ b/drivers/watchdog/hpwdt.c
> @@ -150,9 +150,7 @@ static unsigned int hpwdt_nmi_decoding;
>  static unsigned int allow_kdump = 1;
>  static unsigned int is_icru;
>  static unsigned int is_uefi;
> -static DEFINE_SPINLOCK(rom_lock);
>  static void *cru_rom_addr;
> -static struct cmn_registers cmn_regs;
>  
>  extern asmlinkage void asminline_call(struct cmn_registers *pi86Regs,
>   unsigned long *pRomEntry);
> @@ -225,6 +223,7 @@ static int cru_detect(unsigned long map_entry,
>   unsigned long physical_bios_base = 0;
>   unsigned long physical_bios_offset = 0;
>   int retval = -ENODEV;
> + struct cmn_registers cmn_regs = { };
>  
>   bios32_map = ioremap(map_entry, (2 * PAGE_SIZE));
>  
> @@ -486,32 +485,18 @@ static int hpwdt_my_nmi(void)
>   */
>  static int hpwdt_pretimeout(unsigned int ulReason, struct pt_regs *regs)
>  {
> - unsigned long rom_pl;
> - static int die_nmi_called;
> -
>   if (!hpwdt_nmi_decoding)
>   return NMI_DONE;
>  
>   if ((ulReason == NMI_UNKNOWN) && !hpwdt_my_nmi())
>   return NMI_DONE;
>  
> - spin_lock_irqsave(_lock, rom_pl);
> - if (!die_nmi_called && !is_icru && !is_uefi)
> - asminline_call(_regs, cru_rom_addr);
> - die_nmi_called = 1;
> - spin_unlock_irqrestore(_lock, rom_pl);
> -
>   if (allow_kdump)
>   hpwdt_stop();
>  
> - if (!is_icru && !is_uefi) {
> - if (cmn_regs.u1.ral == 0) {
> - nmi_panic(regs, "An NMI occurred, but unable to 
> determine source.\n");
> - return NMI_HANDLED;
> - }
> - }
> - nmi_panic(regs, "An NMI occurred. Depending on your system the reason "
> - "for the NMI is logged in any one of the following "
> + nmi_panic(regs,
> + "An NMI occurred. Depending on your system the reason "
> + "for the NMI might be logged in any one of the following "
>   "resources:\n"
>   "1. Integrated Management Log (IML)\n"
>   "2. OA Syslog\n"
> @@ -744,13 +729,6 @@ static int hpwdt_init_nmi_decoding(struct pci_dev *dev)
>   HPWDT_ARCH);
>   return retval;
>   }
> -
> - /*
> - * We know this is the only CRU call we need to make so lets 
> keep as
> - * few instructions as possible once the NMI comes in.
> - */
> - cmn_regs.u1.rah = 0x0D;
> -  

Re: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context

2018-02-14 Thread Borislav Petkov
+ Jerry

On Wed, Feb 14, 2018 at 10:31:59AM +0100, Ingo Molnar wrote:
> 
> * Tim Chen  wrote:
> 
> > Dave Hansen and I had some discussions on how to handle the nested NMI and 
> > firmware calls.  We thought of using a per cpu counter to record the 
> > nesting 
> > call depth and toggle IBRS appropriately for the depth 0->1 and 1->0 
> > transition.  
> > Will this change be sufficient?
> 
> Yeah, so I think the first question should be: does the firmware call from 
> NMI 
> context make sense to begin with?
> 
> Because in this particular case it does not appear to be so: the reason for 
> the 
> BIOS/firmware call appears to be to determine how we nmi_panic() after 
> receiving 
> an NMI that no other NMI handler handled: with a passive-aggressive "I don't 
> know" 
> panic message or with a slightly more informative panic message.
> 
> That's not a real value-add to users - so we can avoid all these 
> complications by 
> applying the patch below:
> 
>  drivers/watchdog/hpwdt.c | 30 --
>  1 file changed, 4 insertions(+), 26 deletions(-)
> 
> As a bonus the spinlock use can be removed as well.
> 
> Thanks,
> 
>   Ingo
> 
> >
> From b038428a739a3fcf0b9678305c131f60af7422ca Mon Sep 17 00:00:00 2001
> From: Ingo Molnar 
> Date: Wed, 14 Feb 2018 10:24:41 +0100
> Subject: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from
>  NMI context
> 
> Taking a spinlock and calling into the firmware are problematic things to
> do from NMI callbacks.
> 
> It also seems completely pointless in this particular case:
> 
>  - cmn_regs is updated by the firmware call in the hpwdt_pretimeout() NMI
>callback, but there's almost no use for it: we use it only to determine
>whether to panic with an 'unknown NMI' or a slightly more descriptive
>message.
> 
>  - cmn_regs and rom_lock is not used anywhere else, other than early detection
>of the firmware capability.
> 
> So remove rom_lock, make the cmn_regs call from the detection routine only,
> and thus make the NMI callback a lot more robust.
> 
> Signed-off-by: Ingo Molnar 
> ---
>  drivers/watchdog/hpwdt.c | 30 --
>  1 file changed, 4 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
> index f1f00dfc0e68..2544fe482fe3 100644
> --- a/drivers/watchdog/hpwdt.c
> +++ b/drivers/watchdog/hpwdt.c
> @@ -150,9 +150,7 @@ static unsigned int hpwdt_nmi_decoding;
>  static unsigned int allow_kdump = 1;
>  static unsigned int is_icru;
>  static unsigned int is_uefi;
> -static DEFINE_SPINLOCK(rom_lock);
>  static void *cru_rom_addr;
> -static struct cmn_registers cmn_regs;
>  
>  extern asmlinkage void asminline_call(struct cmn_registers *pi86Regs,
>   unsigned long *pRomEntry);
> @@ -225,6 +223,7 @@ static int cru_detect(unsigned long map_entry,
>   unsigned long physical_bios_base = 0;
>   unsigned long physical_bios_offset = 0;
>   int retval = -ENODEV;
> + struct cmn_registers cmn_regs = { };
>  
>   bios32_map = ioremap(map_entry, (2 * PAGE_SIZE));
>  
> @@ -486,32 +485,18 @@ static int hpwdt_my_nmi(void)
>   */
>  static int hpwdt_pretimeout(unsigned int ulReason, struct pt_regs *regs)
>  {
> - unsigned long rom_pl;
> - static int die_nmi_called;
> -
>   if (!hpwdt_nmi_decoding)
>   return NMI_DONE;
>  
>   if ((ulReason == NMI_UNKNOWN) && !hpwdt_my_nmi())
>   return NMI_DONE;
>  
> - spin_lock_irqsave(_lock, rom_pl);
> - if (!die_nmi_called && !is_icru && !is_uefi)
> - asminline_call(_regs, cru_rom_addr);
> - die_nmi_called = 1;
> - spin_unlock_irqrestore(_lock, rom_pl);
> -
>   if (allow_kdump)
>   hpwdt_stop();
>  
> - if (!is_icru && !is_uefi) {
> - if (cmn_regs.u1.ral == 0) {
> - nmi_panic(regs, "An NMI occurred, but unable to 
> determine source.\n");
> - return NMI_HANDLED;
> - }
> - }
> - nmi_panic(regs, "An NMI occurred. Depending on your system the reason "
> - "for the NMI is logged in any one of the following "
> + nmi_panic(regs,
> + "An NMI occurred. Depending on your system the reason "
> + "for the NMI might be logged in any one of the following "
>   "resources:\n"
>   "1. Integrated Management Log (IML)\n"
>   "2. OA Syslog\n"
> @@ -744,13 +729,6 @@ static int hpwdt_init_nmi_decoding(struct pci_dev *dev)
>   HPWDT_ARCH);
>   return retval;
>   }
> -
> - /*
> - * We know this is the only CRU call we need to make so lets 
> keep as
> - * few instructions as possible once the NMI comes in.
> - */
> - cmn_regs.u1.rah = 0x0D;
> - cmn_regs.u1.ral = 0x02;
>   }
>  
>   /*

-- 

Re: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context

2018-02-14 Thread Peter Zijlstra
On Wed, Feb 14, 2018 at 10:31:59AM +0100, Ingo Molnar wrote:
> Because in this particular case it does not appear to be so: the reason for 
> the 
> BIOS/firmware call appears to be to determine how we nmi_panic() after 
> receiving 
> an NMI that no other NMI handler handled: with a passive-aggressive "I don't 
> know" 
> panic message or with a slightly more informative panic message.

However much I like just ripping all that out, I think the ROM call
actually does that logging, or that is how I read things.

If you look at the original Changelog for that driver:

Hp is providing a Hardware WatchDog Timer driver that will only work with 
the
specific HW Timer located in the HP ProLiant iLO 2 ASIC. The iLO 2 HW Timer
will generate a Non-maskable Interrupt (NMI) 9 seconds before physically
resetting the server, by removing power, so that the event can be logged to
the HP Integrated Management Log (IML), a Non-Volatile Random Access Memory
(NVRAM). The logging of the event is performed using the HP ProLiant ROM via
an Industry Standard access known as a BIOS Service Directory Entry.




Re: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context

2018-02-14 Thread Peter Zijlstra
On Wed, Feb 14, 2018 at 10:31:59AM +0100, Ingo Molnar wrote:
> Because in this particular case it does not appear to be so: the reason for 
> the 
> BIOS/firmware call appears to be to determine how we nmi_panic() after 
> receiving 
> an NMI that no other NMI handler handled: with a passive-aggressive "I don't 
> know" 
> panic message or with a slightly more informative panic message.

However much I like just ripping all that out, I think the ROM call
actually does that logging, or that is how I read things.

If you look at the original Changelog for that driver:

Hp is providing a Hardware WatchDog Timer driver that will only work with 
the
specific HW Timer located in the HP ProLiant iLO 2 ASIC. The iLO 2 HW Timer
will generate a Non-maskable Interrupt (NMI) 9 seconds before physically
resetting the server, by removing power, so that the event can be logged to
the HP Integrated Management Log (IML), a Non-Volatile Random Access Memory
(NVRAM). The logging of the event is performed using the HP ProLiant ROM via
an Industry Standard access known as a BIOS Service Directory Entry.