Re: [PATCH] ACPI / APEI: Fix NMI notification handling

2016-11-29 Thread Borislav Petkov
On Tue, Nov 29, 2016 at 01:43:59PM -0500, Prarit Bhargava wrote:
> When removing and adding cpu 0 on a system with GHES NMI the following stack
> trace is seen when re-adding the cpu:
> 
> WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1349 setup_local_APIC+
> Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache coretemp intel_ra
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc5+ #59
> Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.01.00.0
> 81c03e78 81337905  
> 81c03eb8 8107d9c1 0545810aac4a 
> 00f0  81cb6440f1d0 0001
> Call Trace:
> [] dump_stack+0x63/0x8e
> [] __warn+0xd1/0xf0
> [] warn_slowpath_null+0x1d/0x20
> [] setup_local_APIC+0x275/0x370
> [] apic_ap_setup+0xe/0x20
> [] start_secondary+0x48/0x180
> [] ? set_init_arg+0x55/0x55
> [] ? early_idt_handler_array+0x120/0x120
> [] ? x86_64_start_reservations+0x2a/0x2c
> [] ? x86_64_start_kernel+0x13d/0x14c
> ---[ end trace 7b6555b6343ef9ee ]---

Please remove all hex numbers from the splat - they're useless in the
commit message.

> During the cpu bringup, wakeup_cpu_via_init_nmi() is called and issues an
> NMI on CPU 0.  The GHES NMI handler, ghes_notify_nmi() runs the
> ghes_proc_irq_work work queue which ends up setting IRQ_WORK_VECTOR
> (0xf6).  The "faulty" IR line set at arch/x86/kernel/apic/apic.c:1349 is also
> 0xf6 (specifically APIC IRR for irqs 255 to 224 is 0x400000) which confirms
> that something has set the IRQ_WORK_VECTOR line prior to the APIC being
> initialized.
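
As a sanity check on the IRR arithmetic quoted above, here is a small
userspace sketch (plain C, not kernel code; the helper names are made up
for illustration). It assumes the usual x86 local APIC layout of eight
32-bit IRR registers covering vectors 0 to 255, so a vector's register
index is vector / 32 and its bit position is vector % 32.

```c
#include <stdint.h>

/* Vector number taken from the commit message. */
#define IRQ_WORK_VECTOR 0xf6

/* Hypothetical helper: index (0..7) of the 32-bit IRR register holding a vector. */
static unsigned int irr_reg_index(unsigned int vector)
{
	return vector / 32;
}

/* Hypothetical helper: mask of the vector's bit within that register. */
static uint32_t irr_bit_mask(unsigned int vector)
{
	return UINT32_C(1) << (vector % 32);
}
```

For IRQ_WORK_VECTOR this gives register 7, which covers vectors 224 to
255, and a mask with only bit 22 set.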
>
> Commit 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler")
> incorrectly modified the behavior such that the handler returns
> NMI_HANDLED only if an error was processed, and incorrectly runs the ghes
> work queue for every NMI.
> 
> This patch modifies the ghes_proc_irq_work() to run as it did prior to
> 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler") by
> properly returning NMI_HANDLED and only calling the work queue if
> NMI_HANDLED has been set.
> 
> Fixes: 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler")
> Signed-off-by: Prarit Bhargava 
> Cc: Borislav Petkov 
> Cc: Rafael J. Wysocki 
> Cc: Len Brown 
> Cc: Paul Gortmaker 
> Cc: Tyler Baicar 
> Cc: Punit Agrawal 
> Cc: Don Zickus 
> Cc: linux-a...@vger.kernel.org
> ---
>  drivers/acpi/apei/ghes.c |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 0d099a24f776..39c45efbcb3d 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -858,17 +858,18 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
>  		if (sev >= GHES_SEV_PANIC)
>  			__ghes_panic(ghes);
>  
> +		ret = NMI_HANDLED;
> +

Make that more explicit:

		if (ghes_read_estatus(ghes, 1)) {
			ghes_clear_estatus(ghes);
			continue;
		} else {
			ret = NMI_HANDLED;
		}


>  		if (!(ghes->flags & GHES_TO_CLEAR))
>  			continue;
>  
>  		__process_error(ghes);
>  		ghes_clear_estatus(ghes);
> -
> -		ret = NMI_HANDLED;
>  	}
>  
>  #ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
> -	irq_work_queue(&ghes_proc_irq_work);
> +	if (ret == NMI_HANDLED)
> +		irq_work_queue(&ghes_proc_irq_work);
>  #endif
>  	atomic_dec(&ghes_in_nmi);
>  	return ret;
> -- 

Otherwise looks ok,
thanks.
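
To spell out the behaviour being asked for, here is a minimal userspace
model of the handler's control flow. It is only a sketch and not the
ghes.c code: struct fake_ghes, model_notify_nmi() and the irq_work_queued
counter are hypothetical stand-ins, and the NMI_DONE/NMI_HANDLED values
are illustrative. The point it shows is that the handler claims the NMI
as soon as any source had an error status to read, and defers work only
in that case.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel's NMI return codes; values are illustrative. */
enum { NMI_DONE = 0, NMI_HANDLED = 1 };

/* Hypothetical model of one error source. */
struct fake_ghes {
	bool has_error;	/* models a successful error status read */
};

/* Counts how often the deferred work would have been queued. */
static int irq_work_queued;

static int model_notify_nmi(struct fake_ghes *srcs, size_t n)
{
	int ret = NMI_DONE;

	for (size_t i = 0; i < n; i++) {
		if (!srcs[i].has_error)
			continue;		/* nothing to read: not our NMI */

		ret = NMI_HANDLED;		/* claim the NMI explicitly */
		srcs[i].has_error = false;	/* models processing and clearing */
	}

	/* Queue the deferred work only when an error was actually seen. */
	if (ret == NMI_HANDLED)
		irq_work_queued++;

	return ret;
}
```

With no pending errors the model returns NMI_DONE and leaves the counter
untouched, which corresponds to the CPU 0 wakeup NMI case from the commit
message.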

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)