RE: [V3 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec

2015-08-21 Thread 河合英宏 / KAWAI,HIDEHIRO
> From: Peter Zijlstra [mailto:pet...@infradead.org]
> User-Agent: StGit/0.16
> 
> FWIW, StGit is broken with respect to sending email: all your emails have
> the exact same timestamp, so they are ordered by received timestamp when
> threaded, which generates the mess below:

Sorry for the inconvenience.  I'll try to find some workaround.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



Re: [V3 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec

2015-08-20 Thread Peter Zijlstra

User-Agent: StGit/0.16

FWIW, StGit is broken with respect to sending email: all your emails have
the exact same timestamp, so they are ordered by received timestamp when
threaded, which generates the mess below:


 Aug 06 Hidehiro Kawai  (1.9K) [V3 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec
 Aug 06 Hidehiro Kawai  (2.4K) ├─>[V3 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly
 Aug 06 Hidehiro Kawai  (4.9K) ├─>[V3 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI
 Aug 06 Hidehiro Kawai  (5.3K) ├─>[V3 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context
 Aug 06 Hidehiro Kawai  (2.5K) ├─>[V3 PATCH 4/4] x86/apic: Introduce noextnmi boot option


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [V3 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec

2015-08-07 Thread Michal Hocko
JFYI, I am having these patches tested on a largish box. I will come back
to you as soon as I have some feedback.

I will also try to review these patches sometime next week.

Thanks!

On Thu 06-08-15 14:45:43, Hidehiro Kawai wrote:
> When HA cluster software or an administrator detects that a host is
> unresponsive, they issue an NMI to the host to stop all current work
> and take a crash dump.  If the kernel has already panicked or is
> capturing a crash dump at that time, a further NMI can cause the
> crash dump to fail.
> 
> Also, crash_kexec() called from oops context can race with panic().
> 
> To solve these issues, this patch set does the following:
> 
> - Don't panic on NMI if the kernel has already panicked
> - Extend the exclusion control currently done by panic_lock to
>   crash_kexec()
> - Introduce the "noextnmi" boot option, which masks external NMIs at
>   boot time (supported only on x86)
> 
> V3:
> - Introduce nmi_panic() macro to reduce code duplication
> - In the case of panic on NMI, don't return from NMI handlers
>   if another cpu has already panicked
> 
> V2:
> - Use atomic_cmpxchg() instead of the current spin_trylock() to exclude
>   concurrent calls to panic() and crash_kexec()
> - Don't introduce no-lock versions of panic() and crash_kexec()
> 
> ---
> 
> Hidehiro Kawai (4):
>   panic/x86: Fix re-entrance problem due to panic on NMI
>   panic/x86: Allow cpus to save registers even if they are looping in NMI context
>   kexec: Fix race between panic() and crash_kexec() called directly
>   x86/apic: Introduce noextnmi boot option
> 
> 
>  Documentation/kernel-parameters.txt |    4
>  arch/x86/kernel/apic/apic.c         |   17 -
>  arch/x86/kernel/nmi.c               |   15 +++
>  arch/x86/kernel/reboot.c            |   11 +++
>  include/linux/kernel.h              |   21 +
>  kernel/kexec.c                      |   20
>  kernel/panic.c                      |   23 ---
>  kernel/watchdog.c                   |    5 +++--
>  8 files changed, 106 insertions(+), 10 deletions(-)
> 
> 
> -- 
> Hidehiro Kawai
> Hitachi, Ltd. Research & Development Group
> 
> 

-- 
Michal Hocko
SUSE Labs


[V3 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec

2015-08-05 Thread Hidehiro Kawai
When HA cluster software or an administrator detects that a host is
unresponsive, they issue an NMI to the host to stop all current work
and take a crash dump.  If the kernel has already panicked or is
capturing a crash dump at that time, a further NMI can cause the
crash dump to fail.

Also, crash_kexec() called from oops context can race with panic().

To solve these issues, this patch set does the following:

- Don't panic on NMI if the kernel has already panicked
- Extend the exclusion control currently done by panic_lock to
  crash_kexec()
- Introduce the "noextnmi" boot option, which masks external NMIs at
  boot time (supported only on x86)
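
To illustrate the first item, here is a minimal userspace sketch of the
decision an NMI handler could make.  It is not code from the patches:
panicking_cpu, CPU_NONE, enum nmi_action and nmi_panic_decide() are
hypothetical names, C11 atomics stand in for the kernel's atomic helpers,
and simply returning when the NMI hits the cpu that is already panicking
is an assumption made for the example.

```c
#include <stdatomic.h>

#define CPU_NONE (-1)   /* hypothetical sentinel: no cpu has panicked yet */

/* What an NMI handler that wants to panic should do (illustrative only). */
enum nmi_action {
    NMI_DO_PANIC,   /* nobody panicked before: go ahead and panic */
    NMI_RETURN,     /* this cpu is already panicking: do not panic again */
    NMI_STOP_SELF,  /* another cpu is panicking: do not return, park here */
};

/* Hypothetical record of which cpu entered the panic path first. */
static atomic_int panicking_cpu = CPU_NONE;

static enum nmi_action nmi_panic_decide(int this_cpu)
{
    int expected = CPU_NONE;

    /* Only one caller can swap CPU_NONE for its own cpu number. */
    if (atomic_compare_exchange_strong(&panicking_cpu, &expected, this_cpu))
        return NMI_DO_PANIC;

    /* On failure, 'expected' now holds the cpu that got there first. */
    return expected == this_cpu ? NMI_RETURN : NMI_STOP_SELF;
}
```

With this sketch, the first caller is told to panic, a nested NMI on that
same cpu is told to return, and an NMI on any other cpu is told to stop
itself instead of returning.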

V3:
- Introduce nmi_panic() macro to reduce code duplication
- In the case of panic on NMI, don't return from NMI handlers
  if another cpu has already panicked

V2:
- Use atomic_cmpxchg() instead of the current spin_trylock() to exclude
  concurrent calls to panic() and crash_kexec()
- Don't introduce no-lock versions of panic() and crash_kexec()
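
The compare-and-exchange exclusion mentioned in the V2 notes can be
pictured with the userspace sketch below.  It is only an illustration
under stated assumptions, not the code in kernel/panic.c or kernel/kexec.c:
crash_owner, OWNER_NONE, claim_crash_path(), sketch_panic_enter() and
sketch_crash_kexec_enter() are hypothetical names, and C11
atomic_compare_exchange_strong() stands in for atomic_cmpxchg().  One
plausible benefit over a trylock, assumed here rather than stated above,
is that the stored value records which cpu owns the crash path.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define OWNER_NONE (-1)   /* hypothetical sentinel: crash path is unclaimed */

/* Hypothetical owner shared by the panic and crash_kexec entry points. */
static atomic_int crash_owner = OWNER_NONE;

/*
 * Try to claim the crash path for 'cpu'.  Returns the previous owner,
 * in the style of a kernel cmpxchg: OWNER_NONE means the claim succeeded.
 */
static int claim_crash_path(int cpu)
{
    int old = OWNER_NONE;

    /* On failure, 'old' is overwritten with the current owner. */
    atomic_compare_exchange_strong(&crash_owner, &old, cpu);
    return old;
}

/* Stand-in for the check at the top of a panic-like entry point. */
static bool sketch_panic_enter(int cpu)
{
    return claim_crash_path(cpu) == OWNER_NONE;
}

/* Stand-in for the check at the top of a crash_kexec-like entry point. */
static bool sketch_crash_kexec_enter(int cpu)
{
    return claim_crash_path(cpu) == OWNER_NONE;
}
```

Because both entry points go through the same owner variable, whichever
of them runs first wins and the other one backs off, regardless of which
function each cpu called.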

---

Hidehiro Kawai (4):
  panic/x86: Fix re-entrance problem due to panic on NMI
  panic/x86: Allow cpus to save registers even if they are looping in NMI context
  kexec: Fix race between panic() and crash_kexec() called directly
  x86/apic: Introduce noextnmi boot option


 Documentation/kernel-parameters.txt |    4
 arch/x86/kernel/apic/apic.c         |   17 -
 arch/x86/kernel/nmi.c               |   15 +++
 arch/x86/kernel/reboot.c            |   11 +++
 include/linux/kernel.h              |   21 +
 kernel/kexec.c                      |   20
 kernel/panic.c                      |   23 ---
 kernel/watchdog.c                   |    5 +++--
 8 files changed, 106 insertions(+), 10 deletions(-)


-- 
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group

