RE: [V3 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec
> From: Peter Zijlstra [mailto:pet...@infradead.org]
> User-Agent: StGit/0.16
>
> Fwiw, stgit is broken wrt sending email, all your emails have the exact
> same timestamp, which means that the emails will be ordered on received
> timestamp when threaded and generate the below mess:

Sorry for the inconvenience. I'll try to find some workaround.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group
Re: [V3 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec
User-Agent: StGit/0.16

Fwiw, stgit is broken wrt sending email, all your emails have the exact
same timestamp, which means that the emails will be ordered on received
timestamp when threaded and generate the below mess:

Aug 06 Hidehiro Kawai (1.9K) [V3 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec
Aug 06 Hidehiro Kawai (2.4K) ├─>[V3 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly
Aug 06 Hidehiro Kawai (4.9K) ├─>[V3 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI
Aug 06 Hidehiro Kawai (5.3K) ├─>[V3 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context
Aug 06 Hidehiro Kawai (2.5K) ├─>[V3 PATCH 4/4] x86/apic: Introduce noextnmi boot option

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [V3 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec
JFYI I have those patches tested on the largish box. Will come back to
you as soon as I have some feedback. I will also try to review these
patches sometime next week.

Thanks!

On Thu 06-08-15 14:45:43, Hidehiro Kawai wrote:
> When HA cluster software or an administrator detects non-response
> of a host, they issue an NMI to the host to completely stop current
> work and take a crash dump. If the kernel has already panicked
> or is capturing a crash dump at that time, a further NMI can cause
> a crash dump failure.
>
> Also, crash_kexec() called from oops context and panic() can
> cause race conditions.
>
> To solve these issues, this patch set does the following things:
>
> - Don't panic on NMI if the kernel has already panicked
> - Extend exclusion control currently done by panic_lock to crash_kexec
> - Introduce "noextnmi" boot option which masks external NMI at the
>   boot time (supported only for x86)
>
> V3:
> - Introduce nmi_panic() macro to reduce code duplication
> - In the case of panic on NMI, don't return from NMI handlers
>   if another cpu already panicked
>
> V2:
> - Use atomic_cmpxchg() instead of current spin_trylock() to exclude
>   concurrent accesses to panic() and crash_kexec()
> - Don't introduce no-lock version of panic() and crash_kexec()
>
> ---
>
> Hidehiro Kawai (4):
>       panic/x86: Fix re-entrance problem due to panic on NMI
>       panic/x86: Allow cpus to save registers even if they are looping in NMI context
>       kexec: Fix race between panic() and crash_kexec() called directly
>       x86/apic: Introduce noextnmi boot option
>
>
>  Documentation/kernel-parameters.txt |    4
>  arch/x86/kernel/apic/apic.c         |   17 -
>  arch/x86/kernel/nmi.c               |   15 +++
>  arch/x86/kernel/reboot.c            |   11 +++
>  include/linux/kernel.h              |   21 +
>  kernel/kexec.c                      |   20
>  kernel/panic.c                      |   23 ---
>  kernel/watchdog.c                   |    5 +++--
>  8 files changed, 106 insertions(+), 10 deletions(-)
>
>
> --
> Hidehiro Kawai
> Hitachi, Ltd. Research & Development Group

--
Michal Hocko
SUSE Labs
[V3 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec
When HA cluster software or an administrator detects non-response
of a host, they issue an NMI to the host to completely stop current
work and take a crash dump. If the kernel has already panicked
or is capturing a crash dump at that time, a further NMI can cause
a crash dump failure.

Also, crash_kexec() called from oops context and panic() can
cause race conditions.

To solve these issues, this patch set does the following things:

- Don't panic on NMI if the kernel has already panicked
- Extend exclusion control currently done by panic_lock to crash_kexec
- Introduce "noextnmi" boot option which masks external NMI at the
  boot time (supported only for x86)

V3:
- Introduce nmi_panic() macro to reduce code duplication
- In the case of panic on NMI, don't return from NMI handlers
  if another cpu already panicked

V2:
- Use atomic_cmpxchg() instead of current spin_trylock() to exclude
  concurrent accesses to panic() and crash_kexec()
- Don't introduce no-lock version of panic() and crash_kexec()

---

Hidehiro Kawai (4):
      panic/x86: Fix re-entrance problem due to panic on NMI
      panic/x86: Allow cpus to save registers even if they are looping in NMI context
      kexec: Fix race between panic() and crash_kexec() called directly
      x86/apic: Introduce noextnmi boot option


 Documentation/kernel-parameters.txt |    4
 arch/x86/kernel/apic/apic.c         |   17 -
 arch/x86/kernel/nmi.c               |   15 +++
 arch/x86/kernel/reboot.c            |   11 +++
 include/linux/kernel.h              |   21 +
 kernel/kexec.c                      |   20
 kernel/panic.c                      |   23 ---
 kernel/watchdog.c                   |    5 +++--
 8 files changed, 106 insertions(+), 10 deletions(-)


--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group
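
For readers following the thread, the V2 note about using atomic_cmpxchg() instead of spin_trylock() can be illustrated with a small userspace sketch. This is not the code from the patches: it uses C11 <stdatomic.h> in place of the kernel's atomic_t API, and the names panic_owner, NO_OWNER, try_enter_panic() and current_panic_owner() are hypothetical, chosen only for illustration. It assumes the exclusion state is a single integer holding the id of the first CPU to enter the panic or crash dump path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sentinel: no CPU has entered the panic/crash path yet. */
#define NO_OWNER (-1)

/* Hypothetical stand-in for the kernel-side exclusion state: the id of
 * the CPU that entered first, or NO_OWNER. Not the patch's variable. */
static atomic_int panic_owner = NO_OWNER;

/*
 * Try to claim the panic/crash path for `cpu`.
 * Returns true only for the first caller; later callers get false and
 * the recorded owner is left unchanged.
 */
static bool try_enter_panic(int cpu)
{
    int expected = NO_OWNER;

    assert(cpu >= 0); /* negative ids would collide with NO_OWNER */

    /* Atomically: if panic_owner == NO_OWNER, set it to cpu. */
    return atomic_compare_exchange_strong(&panic_owner, &expected, cpu);
}

/* Report which CPU currently owns the path, or NO_OWNER. */
static int current_panic_owner(void)
{
    return atomic_load(&panic_owner);
}
```

In this sketch, a caller that gets false from try_enter_panic() can compare current_panic_owner() with its own id to tell re-entry on the same CPU apart from a panic already in progress on another CPU, which a plain trylock does not record.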