RE: [V5 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-11-24 Thread 河合英宏 / KAWAI,HIDEHIRO
> On Fri, Nov 20, 2015 at 06:36:48PM +0900, Hidehiro Kawai wrote:
> > Currently, panic() and crash_kexec() can be called at the same time.
> > For example (x86 case):
> >
> > CPU 0:
> >   oops_end()
> >     crash_kexec()
> >       mutex_trylock() // acquired
> >       nmi_shootdown_cpus() // stop other cpus
> >
> > CPU 1:
> >   panic()
> >     crash_kexec()
> >       mutex_trylock() // failed to acquire
> >     smp_send_stop() // stop other cpus
> >     infinite loop
> >
> > If CPU 1 calls smp_send_stop() before nmi_shootdown_cpus(), kdump
> > fails.
> 
> So the smp_send_stop() stops CPU 0 from calling nmi_shootdown_cpus(), right?

Yes, but the important thing is that CPU 1 stops CPU 0, which is the
only CPU processing the crash_kexec routines.

> >
> > In another case:
> >
> > CPU 0:
> >   oops_end()
> >     crash_kexec()
> >       mutex_trylock() // acquired
> >       <NMI>
> >       io_check_error()
> >         panic()
> >           crash_kexec()
> >             mutex_trylock() // failed to acquire
> >           infinite loop
> >
> > Clearly, this is an undesirable result.
> 
> I'm trying to see how this patch fixes this case.
> 
> >
> > To fix this problem, this patch changes crash_kexec() to exclude
> > others by using atomic_t panic_cpu.
> >
> > V5:
> > - Add missing dummy __crash_kexec() for !CONFIG_KEXEC_CORE case
> > - Replace atomic_xchg() with atomic_set() in crash_kexec() because
> >   it is used as a release operation and there is no need of memory
> >   barrier effect.  This change also removes an unused value warning
> >
> > V4:
> > - Use new __crash_kexec(), no exclusion check version of crash_kexec(),
> >   instead of checking if panic_cpu is the current cpu or not
> >
> > V2:
> > - Use atomic_cmpxchg() instead of spin_trylock() on panic_lock
> >   to exclude concurrent accesses
> > - Don't introduce no-lock version of crash_kexec()
> >
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Eric Biederman 
> > Cc: Vivek Goyal 
> > Cc: Andrew Morton 
> > Cc: Michal Hocko 
> > ---
> >  include/linux/kexec.h |2 ++
> >  kernel/kexec_core.c   |   26 +-
> >  kernel/panic.c|4 ++--
> >  3 files changed, 29 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> > index d140b1e..7b68d27 100644
> > --- a/include/linux/kexec.h
> > +++ b/include/linux/kexec.h
> > @@ -237,6 +237,7 @@ extern int kexec_purgatory_get_set_symbol(struct kimage *image,
> >   unsigned int size, bool get_value);
> >  extern void *kexec_purgatory_get_symbol_addr(struct kimage *image,
> >  const char *name);
> > +extern void __crash_kexec(struct pt_regs *);
> >  extern void crash_kexec(struct pt_regs *);
> >  int kexec_should_crash(struct task_struct *);
> >  void crash_save_cpu(struct pt_regs *regs, int cpu);
> > @@ -332,6 +333,7 @@ int __weak arch_kexec_apply_relocations(const Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
> >  #else /* !CONFIG_KEXEC_CORE */
> >  struct pt_regs;
> >  struct task_struct;
> > +static inline void __crash_kexec(struct pt_regs *regs) { }
> >  static inline void crash_kexec(struct pt_regs *regs) { }
> >  static inline int kexec_should_crash(struct task_struct *p) { return 0; }
> >  #define kexec_in_progress false
> > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> > index 11b64a6..9d097f5 100644
> > --- a/kernel/kexec_core.c
> > +++ b/kernel/kexec_core.c
> > @@ -853,7 +853,8 @@ struct kimage *kexec_image;
> >  struct kimage *kexec_crash_image;
> >  int kexec_load_disabled;
> >
> > -void crash_kexec(struct pt_regs *regs)
> > +/* No panic_cpu check version of crash_kexec */
> > +void __crash_kexec(struct pt_regs *regs)
> >  {
> > /* Take the kexec_mutex here to prevent sys_kexec_load
> >  * running on one cpu from replacing the crash kernel
> > @@ -876,6 +877,29 @@ void crash_kexec(struct pt_regs *regs)
> > }
> >  }
> >
> > +void crash_kexec(struct pt_regs *regs)
> > +{
> > +   int old_cpu, this_cpu;
> > +
> > +   /*
> > +* Only one CPU is allowed to execute the crash_kexec() code as with
> > +* panic().  Otherwise parallel calls of panic() and crash_kexec()
> > +* may stop each other.  To exclude them, we use panic_cpu here too.
> > +*/
> > +   this_cpu = raw_smp_processor_id();
> > +   old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
> > +   if (old_cpu == -1) {
> > +   /* This is the 1st CPU which comes here, so go ahead. */
> > +   __crash_kexec(regs);
> > +
> > +   /*
> > +* Reset panic_cpu to allow another panic()/crash_kexec()
> > +* call.
> > +*/
> > +   atomic_set(&panic_cpu, -1);
> > +   }
> > +}
> > +
> >  size_t crash_get_memory_size(void)
> >  {
> > size_t size = 0;
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index 4fce2be..5d0b807 100644
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -138,7 +138,7 @@ void panic(cons

RE: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-11-24 Thread 河合英宏 / KAWAI,HIDEHIRO
> On Tue, Nov 24, 2015 at 11:48:53AM +0100, Borislav Petkov wrote:
> >
> > > +  */
> > > + while (!raw_spin_trylock(&nmi_reason_lock))
> > > + poll_crash_ipi_and_callback(regs);
> >
> > Waaait a minute: so if we're getting NMIs broadcasted on every core but
> > we're *not* crash dumping, we will run into here too. This can't be
> > right. :-\
> 
> This only does something if crash_ipi_done is set, which means you are killing
> the box. But perhaps a comment that states that here would be useful, or maybe
> just put in the check here. 

OK, I'll add more comments around this.

> There's no need to make it depend on SMP, as
> raw_spin_trylock() will turn to just ({1}) for UP, and that code wont even be
> hit.

I'll integrate these SMP and UP versions with a comment about
that.

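For reference, here is roughly how the unified, commented loop might
read (the comment wording below is mine, not the final patch text;
poll_crash_ipi_and_callback() and crash_ipi_done are the names from
this patch):

	/*
	 * CPUs spinning on nmi_reason_lock may be asked to save their
	 * registers for a crash dump.  poll_crash_ipi_and_callback() is
	 * a no-op unless crash_ipi_done has been set by the CPU running
	 * crash_kexec(), so the normal NMI path pays nothing.  On UP,
	 * raw_spin_trylock() always succeeds and the body is never run.
	 */
	while (!raw_spin_trylock(&nmi_reason_lock))
		poll_crash_ipi_and_callback(regs);
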
Regards,
--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group






RE: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-11-24 Thread 河合英宏 / KAWAI,HIDEHIRO
> On Fri, Nov 20, 2015 at 06:36:46PM +0900, Hidehiro Kawai wrote:
> > nmi_shootdown_cpus(), a subroutine of crash_kexec(), sends NMI IPI
> > to non-panic cpus to stop them while saving their register
> 
> ...to stop them and save their register...

Thanks for the correction.

> > information and doing some cleanups for crash dumping.  So if a
> > non-panic cpus is infinitely looping in NMI context, we fail to
> 
> That should be CPU. Please use "CPU" instead of "cpu" in all your text
> in your next submission.

OK, I'll fix that.

> > save its register information and lose the information from the
> > crash dump.
> >
> > `Infinite loop in NMI context' can happen:
> >
> >   a. when a cpu panics on NMI while another cpu is processing panic
> >   b. when a cpu received an external or unknown NMI while another
> >  cpu is processing panic on NMI
> >
> > In the case of a, it loops in panic_smp_self_stop().  In the case
> > of b, it loops in raw_spin_lock() of nmi_reason_lock.
> 
> Please describe those two cases more verbosely - it takes slow people
> like me a while to figure out what exactly can happen.

  a. when a cpu panics on NMI while another cpu is processing panic
     Ex.
     CPU 0                          CPU 1
     =================              =================
     panic()
       panic_cpu <-- 0
       check panic_cpu
       crash_kexec()
                                    receive an unknown NMI
                                    unknown_nmi_error()
                                      nmi_panic()
                                        panic()
                                          check panic_cpu
                                          panic_smp_self_stop()
                                            infinite loop in NMI context

  b. when a cpu received an external or unknown NMI while another
     cpu is processing panic on NMI
     Ex.
     CPU 0                          CPU 1
     =================              =================
     receive an unknown NMI
     unknown_nmi_error()
       nmi_panic()                  receive an unknown NMI
         panic_cpu <-- 0            unknown_nmi_error()
         panic()                      nmi_panic()
           check panic_cpu              panic()
           crash_kexec()                  check panic_cpu
                                          panic_smp_self_stop()
                                            infinite loop in NMI context
 
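In both cases, "check panic_cpu" refers to the cmpxchg-based exclusion
this series introduces.  As a simplified sketch (not the exact patch
text), the entry of panic() behaves like:

	atomic_t panic_cpu = ATOMIC_INIT(-1);	/* -1: no CPU is in panic */

	void panic(const char *fmt, ...)
	{
		int old_cpu, this_cpu = raw_smp_processor_id();

		/* Only the first CPU to get here proceeds with panic. */
		old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
		if (old_cpu != -1 && old_cpu != this_cpu)
			panic_smp_self_stop();	/* later CPUs park here */

		/* ... the rest of the normal panic processing ... */
	}
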
> > This can
> > happen on some servers which broadcasts NMIs to all CPUs when a dump
> > button is pushed.
> >
> > To save registers in these case too, this patch does following things:
> >
> > 1. Move the timing of `infinite loop in NMI context' (actually
> >done by panic_smp_self_stop()) outside of panic() to enable us to
> >refer pt_regs
> 
> I can't parse that sentence. And I really tried :-\
> panic_smp_self_stop() is still in panic().

panic_smp_self_stop() is still used when a CPU in normal context
should go into an infinite loop.  Only when a CPU is in NMI context
is nmi_panic_self_stop() called, and then the CPU loops infinitely
without entering panic().

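In other words, the split looks roughly like this (illustrative only;
the exact loop body may differ from the patch):

	/*
	 * Used instead of panic_smp_self_stop() when the CPU is in NMI
	 * context, so that its pt_regs stays available for the crash
	 * callback.
	 */
	void nmi_panic_self_stop(struct pt_regs *regs)
	{
		while (1) {
			poll_crash_ipi_and_callback(regs);
			cpu_relax();
		}
	}
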
I'll try to revise this sentence.

> > 2. call a callback of nmi_shootdown_cpus() directly to save
> >registers and do some cleanups after setting waiting_for_crash_ipi
> >which is used for counting down the number of cpus which handled
> >the callback
> >
> > V5:
> > - Use WRITE_ONCE() when setting crash_ipi_done to 1 so that the
> >   compiler doesn't change the instruction order
> > - Support the case of b in the above description
> > - Add poll_crash_ipi_and_callback()
> >
> > V4:
> > - Rewrite the patch description
> >
> > V3:
> > - Newly introduced
> >
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Andrew Morton 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: "H. Peter Anvin" 
> > Cc: Peter Zijlstra 
> > Cc: Eric Biederman 
> > Cc: Vivek Goyal 
> > Cc: Michal Hocko 
> > ---
> >  arch/x86/include/asm/reboot.h |1 +
> >  arch/x86/kernel/nmi.c |   17 +
> >  arch/x86/kernel/reboot.c  |   28 
> >  include/linux/kernel.h|   12 ++--
> >  kernel/panic.c|   10 ++
> >  kernel/watchdog.c |2 +-
> >  6 files changed, 63 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
> > index a82c4f1..964e82f 100644
> > --- a/arch/x86/include/asm/reboot.h
> > +++ b/arch/x86/include/asm/reboot.h
> > @@ -25,5 +25,6 @@ void __noreturn machine_real_restart(unsigned int type);
> >
> >  typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
> >  void nmi_shootdown_cpus(nmi_shootdown_cb callback);
> > +void poll_crash_ipi_and_callback(struct pt_regs *regs);
> >
> >  #endif /* _ASM_X86_REBOOT_H */
> > diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> > index 5131714..74a1434 100644
> > --- a/arch/x86/kernel/nmi.c
> > +++ b/arch/x86/kernel/nmi.c
> > @@ -29,6 +29,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #defin

[PATCH 1/1] Improve device tree directory sorting

2015-11-24 Thread Curt Brune
Previously when sorting the device tree directory entries, if both
device tree entries contained the '@' character then comparison was
made based on the length of the strings.

This did not work in all cases and could result in odd orderings.

This patch modifies the comparison function for the case when both
strings contain the '@' character.

First a lexical comparison is made between the prefix portions of the
strings *before* the '@' character.

Next, if the prefixes are equal, the lengths of the suffixes *after*
the '@' character are compared.  This preserves the intent of the
original code.

Next, if the suffix lengths are equal, a lexical comparison of the
suffixes is made.

This is still not strictly correct, as ideally the portion after the
'@' should be compared numerically.  However, determining what base to
use for all cases is difficult.

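To illustrate the difficulty: a purely numeric comparison would have to
assume a radix.  Unit addresses are usually hexadecimal, so a
hypothetical helper like the one below (not part of this patch) would
sort memory@f00 before memory@1000 correctly, but would misorder any
node whose unit address is not plain hex:

	#include <stdlib.h>

	/* Hypothetical: compare '@' suffixes as hexadecimal numbers. */
	static int compare_unit_addrs(const char *s1, const char *s2)
	{
		unsigned long long a1 = strtoull(s1, NULL, 16);
		unsigned long long a2 = strtoull(s2, NULL, 16);

		return (a1 > a2) - (a1 < a2);
	}
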
Signed-off-by: Curt Brune 
---
 kexec/arch/ppc/fs2dt.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/kexec/arch/ppc/fs2dt.c b/kexec/arch/ppc/fs2dt.c
index 4121c7d..6e77379 100644
--- a/kexec/arch/ppc/fs2dt.c
+++ b/kexec/arch/ppc/fs2dt.c
@@ -30,6 +30,7 @@
 #include 
 #include "../../kexec.h"
 #include "kexec-ppc.h"
+#include "types.h"
 
 #define MAXPATH    1024    /* max path name length */
 #define NAMESPACE  16384   /* max bytes for property names */
@@ -296,6 +297,8 @@ static int comparefunc(const void *dentry1, const void *dentry2)
 {
char *str1 = (*(struct dirent **)dentry1)->d_name;
char *str2 = (*(struct dirent **)dentry2)->d_name;
+   char *p1, *p2;
+   int res = 0, max_len;
 
/*
 * strcmp scans from left to right and fails to idetify for some
@@ -303,11 +306,21 @@ static int comparefunc(const void *dentry1, const void *dentry2)
 * Therefore, we get the wrong sorted order like memory@1000 and
 * memory@f00.
 */
-   if (strchr(str1, '@') && strchr(str2, '@') &&
-   (strlen(str1) > strlen(str2)))
-   return 1;
+   if ((p1 = strchr(str1, '@')) && (p2 = strchr(str2, '@'))) {
+   max_len = max(p1 - str1, p2 - str2);
+   if ((res = strncmp(str1, str2, max_len)) == 0) {
+   /* prefix is equal - compare part after '@' by length */
+   p1++; p2++;
+   res = strlen(p1) - strlen(p2);
+   if (res == 0)
+   /* equal length, compare by strcmp() */
+   res = strcmp(p1,p2);
+   }
+} else {
+   res = strcmp(str1, str2);
+}
 
-   return strcmp(str1, str2);
+   return res;
 }
 
 /*
-- 
1.9.1




[PATCH 0/1] Improve device tree directory sorting

2015-11-24 Thread Curt Brune
This patch improves the device tree directory sorting.

Previously the sorting algorithm would result in the following ordering for a
Freescale P2020 SoC device (Rooted under: /proc/device-tree/soc@ffe0)

  #address-cells
  #size-cells
  .
  ..
  bus-frequency
  compatible
  device_type
  i2c@3000
  i2c@3100
  mdio@24520
  msi@41600
  mdio@26520
  serial@4600
  ethernet@26000
  global-utilities@e
  memory-controller@2000
  l2-cache-controller@2
  name
  pic@4
  ranges
  sdhci@2e000
  serial@4500

Ideally 'serial@4500' would come before 'serial@4600'.  The old
ordering caused the new kexec-ed kernel to detect the serial consoles
in the wrong order, i.e. ttyS0 and ttyS1 were swapped.

Using the attached patch the same directory is ordered as:

  #address-cells
  #size-cells
  .
  ..
  bus-frequency
  compatible
  device_type
  ethernet@26000
  global-utilities@e
  i2c@3000
  i2c@3100
  l2-cache-controller@2
  mdio@24520
  mdio@26520
  memory-controller@2000
  msi@41600
  name
  pic@4
  ranges
  sdhci@2e000
  serial@4500
  serial@4600

Curt Brune (1):
  Improve device tree directory sorting

 kexec/arch/ppc/fs2dt.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

-- 
1.9.1




[PATCH v12 15/16] arm64: kdump: enable kdump in the arm64 defconfig

2015-11-24 Thread Geoff Levand
From: AKASHI Takahiro 

Signed-off-by: AKASHI Takahiro 
---
 arch/arm64/configs/defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 0470fdf..6dc3d00 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -59,6 +59,7 @@ CONFIG_KSM=y
 CONFIG_TRANSPARENT_HUGEPAGE=y
 CONFIG_CMA=y
 CONFIG_KEXEC=y
+CONFIG_CRASH_DUMP=y
 CONFIG_CMDLINE="console=ttyAMA0"
 # CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
 CONFIG_COMPAT=y
-- 
2.5.0





[PATCH v12 14/16] arm64: kdump: update a kernel doc

2015-11-24 Thread Geoff Levand
From: AKASHI Takahiro 

This patch adds arch specific descriptions about kdump usage on arm64
to kdump.txt.

Signed-off-by: AKASHI Takahiro 
---
 Documentation/kdump/kdump.txt | 32 +++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index bc4bd5a..7d3fad0 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -18,7 +18,7 @@ memory image to a dump file on the local disk, or across the network to
 a remote system.
 
 Kdump and kexec are currently supported on the x86, x86_64, ppc64, ia64,
-s390x and arm architectures.
+s390x, arm and arm64 architectures.
 
 When the system kernel boots, it reserves a small section of memory for
 the dump-capture kernel. This ensures that ongoing Direct Memory Access
@@ -249,6 +249,29 @@ Dump-capture kernel config options (Arch Dependent, arm)
 
 AUTO_ZRELADDR=y
 
+Dump-capture kernel config options (Arch Dependent, arm64)
+-----------------------------------------------------------
+
+1) Disable symmetric multi-processing support under "Processor type and
+   features":
+
+   CONFIG_SMP=n
+
+   (If CONFIG_SMP=y, then specify maxcpus=1 on the kernel command line
+   when loading the dump-capture kernel, see section "Load the Dump-capture
+   Kernel".)
+
+2) The maximum memory size on the dump-capture kernel must be limited by
+   specifying:
+
+   mem=X[MG]
+
+   where X should be less than or equal to the size in "crashkernel="
+   boot parameter. Kexec-tools will automatically add this.
+
+3) Currently, kvm will not be enabled on the dump-capture kernel even
+   if it is configured.
+
 Extended crashkernel syntax
 ===========================
 
@@ -312,6 +335,8 @@ Boot into System Kernel
   any space below the alignment point may be overwritten by the dump-capture
   kernel, which means it is possible that the vmcore is not that precise as
   expected.
 
+   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
+   the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
 
 Load the Dump-capture Kernel
 
@@ -334,6 +359,8 @@ For s390x:
- Use image or bzImage
 For arm:
- Use zImage
+For arm64:
+   - Use vmlinux
 
 If you are using a uncompressed vmlinux image then use following command
 to load dump-capture kernel.
@@ -377,6 +404,9 @@ For s390x:
 For arm:
"1 maxcpus=1 reset_devices"
 
+For arm64:
+   "1 mem=X[MG] maxcpus=1 reset_devices"
+
 Notes on loading the dump-capture kernel:
 
 * By default, the ELF headers are stored in ELF64 format to support
-- 
2.5.0





[PATCH v12 13/16] arm64: kdump: add kdump support

2015-11-24 Thread Geoff Levand
From: AKASHI Takahiro 

On crash dump kernel, all the information about primary kernel's core
image is available in elf core header specified by "elfcorehdr=" boot
parameter. reserve_elfcorehdr() will set aside the region to avoid any
corruption by crash dump kernel.

Crash dump kernel will access the system memory of primary kernel via
copy_oldmem_page(), which reads one page by ioremap'ing it since it does
not reside in linear mapping on crash dump kernel.
Please note that we should add "mem=X[MG]" boot parameter to limit the
memory size and avoid the following assertion at ioremap():
if (WARN_ON(pfn_valid(__phys_to_pfn(phys_addr))))
	return NULL;
when accessing any pages beyond the usable memories of crash dump kernel.

We also need our own elfcorehdr_read() here since the weak definition of
elfcorehdr_read() utilizes copy_oldmem_page() and will hit the assertion
above on arm64.

Signed-off-by: AKASHI Takahiro 
---
 arch/arm64/Kconfig | 12 +++
 arch/arm64/kernel/Makefile |  1 +
 arch/arm64/kernel/crash_dump.c | 71 ++
 arch/arm64/mm/init.c   | 29 +
 4 files changed, 113 insertions(+)
 create mode 100644 arch/arm64/kernel/crash_dump.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c23fd77..4bac7dc 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -545,6 +545,18 @@ config KEXEC
  but it is independent of the system firmware.   And like a reboot
  you can start any kernel with it, not just Linux.
 
+config CRASH_DUMP
+   bool "Build kdump crash kernel"
+   help
+ Generate crash dump after being started by kexec. This should
+ be normally only set in special crash dump kernels which are
+ loaded in the main kernel with kexec-tools into a specially
+ reserved region and then later executed after a crash by
+ kdump/kexec. The crash dump kernel must be compiled to a
+ memory address not used by the main kernel.
+
+ For more details see Documentation/kdump/kdump.txt
+
 config XEN_DOM0
def_bool y
depends on XEN
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index f68420d..a08b054 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -43,6 +43,7 @@ arm64-obj-$(CONFIG_ARMV8_DEPRECATED)  += armv8_deprecated.o
 arm64-obj-$(CONFIG_ACPI)   += acpi.o
 arm64-obj-$(CONFIG_KEXEC)  += machine_kexec.o relocate_kernel.o \
   cpu-reset.o
+arm64-obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
 
 obj-y  += $(arm64-obj-y) vdso/
 obj-m  += $(arm64-obj-m)
diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
new file mode 100644
index 000..3d86c0a
--- /dev/null
+++ b/arch/arm64/kernel/crash_dump.c
@@ -0,0 +1,71 @@
+/*
+ * Routines for doing kexec-based kdump
+ *
+ * Copyright (C) 2014 Linaro Limited
+ * Author: AKASHI Takahiro 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * copy_oldmem_page() - copy one page from old kernel memory
+ * @pfn: page frame number to be copied
+ * @buf: buffer where the copied page is placed
+ * @csize: number of bytes to copy
+ * @offset: offset in bytes into the page
+ * @userbuf: if set, @buf is in a user address space
+ *
+ * This function copies one page from old kernel memory into buffer pointed by
+ * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
+ * copied or negative error in case of failure.
+ */
+ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
+size_t csize, unsigned long offset,
+int userbuf)
+{
+   void *vaddr;
+
+   if (!csize)
+   return 0;
+
+   vaddr = ioremap_cache(pfn << PAGE_SHIFT, PAGE_SIZE);
+   if (!vaddr)
+   return -ENOMEM;
+
+   if (userbuf) {
+   if (copy_to_user(buf, vaddr + offset, csize)) {
+   iounmap(vaddr);
+   return -EFAULT;
+   }
+   } else {
+   memcpy(buf, vaddr + offset, csize);
+   }
+
+   iounmap(vaddr);
+
+   return csize;
+}
+
+/**
+ * elfcorehdr_read - read from ELF core header
+ * @buf: buffer where the data is placed
+ * @csize: number of bytes to read
+ * @ppos: address in the memory
+ *
+ * This function reads @count bytes from elf core header which exists
+ * on crash dump kernel's memory.
+ */
+ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos)
+{
+   memcpy(buf, phys_to_virt((phys_addr_t)*ppos), count);
+   return count;
+}
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/

[PATCH v12 06/16] Revert "arm64: mm: remove unused cpu_set_idmap_tcr_t0sz function"

2015-11-24 Thread Geoff Levand
This reverts commit c51e97d89e526368eb697f87cd4d391b9e19f369.

Add back the cpu_set_idmap_tcr_t0sz function needed by setup_mmu_for_reboot.
---
 arch/arm64/include/asm/mmu_context.h | 35 +++
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 2416578..7567030 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -70,23 +70,34 @@ static inline bool __cpu_uses_extended_idmap(void)
unlikely(idmap_t0sz != TCR_T0SZ(VA_BITS)));
 }
 
+static inline void __cpu_set_tcr_t0sz(u64 t0sz)
+{
+   unsigned long tcr;
+
+   if (__cpu_uses_extended_idmap())
+   asm volatile (
+   "   mrs %0, tcr_el1 ;"
+   "   bfi %0, %1, %2, %3  ;"
+   "   msr tcr_el1, %0 ;"
+   "   isb"
+   : "=&r" (tcr)
+   : "r"(t0sz), "I"(TCR_T0SZ_OFFSET), "I"(TCR_TxSZ_WIDTH));
+}
+
+/*
+ * Set TCR.T0SZ to the value appropriate for activating the identity map.
+ */
+static inline void cpu_set_idmap_tcr_t0sz(void)
+{
+   __cpu_set_tcr_t0sz(idmap_t0sz);
+}
+
 /*
  * Set TCR.T0SZ to its default value (based on VA_BITS)
  */
 static inline void cpu_set_default_tcr_t0sz(void)
 {
-   unsigned long tcr;
-
-   if (!__cpu_uses_extended_idmap())
-   return;
-
-   asm volatile (
-   "   mrs %0, tcr_el1 ;"
-   "   bfi %0, %1, %2, %3  ;"
-   "   msr tcr_el1, %0 ;"
-   "   isb"
-   : "=&r" (tcr)
-   : "r"(TCR_T0SZ(VA_BITS)), "I"(TCR_T0SZ_OFFSET), "I"(TCR_TxSZ_WIDTH));
+   __cpu_set_tcr_t0sz(TCR_T0SZ(VA_BITS));
 }
 
 /*
-- 
2.5.0





[PATCH v12 05/16] arm64: Add back cpu_reset routines

2015-11-24 Thread Geoff Levand
Commit 68234df4ea7939f98431aa81113fbdce10c4a84b (arm64: kill flush_cache_all())
removed the global arm64 routines cpu_reset() and cpu_soft_restart() needed by
the arm64 kexec and kdump support.  Add simplified versions of those two
routines back with some changes needed for kexec in the new files cpu-reset.S
and cpu-reset.h.

When a CPU is reset it needs to be put into the exception level it had
when it entered the kernel. Update cpu_reset() to accept an argument
which signals if the reset address needs to be entered at EL1 or EL2.

Signed-off-by: Geoff Levand 
---
 arch/arm64/kernel/cpu-reset.S | 84 +++
 arch/arm64/kernel/cpu-reset.h | 22 
 2 files changed, 106 insertions(+)
 create mode 100644 arch/arm64/kernel/cpu-reset.S
 create mode 100644 arch/arm64/kernel/cpu-reset.h

diff --git a/arch/arm64/kernel/cpu-reset.S b/arch/arm64/kernel/cpu-reset.S
new file mode 100644
index 000..0423f27
--- /dev/null
+++ b/arch/arm64/kernel/cpu-reset.S
@@ -0,0 +1,84 @@
+/*
+ * cpu reset routines
+ *
+ * Copyright (C) 2001 Deep Blue Solutions Ltd.
+ * Copyright (C) 2012 ARM Ltd.
+ * Copyright (C) 2015 Huawei Futurewei Technologies.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+
+#define SCTLR_ELx_I     (1 << 12)
+#define SCTLR_ELx_SA    (1 << 3)
+#define SCTLR_ELx_C     (1 << 2)
+#define SCTLR_ELx_A     (1 << 1)
+#define SCTLR_ELx_M     1
+#define SCTLR_ELx_FLAGS (SCTLR_ELx_M | SCTLR_ELx_A | SCTLR_ELx_C | \
+                        SCTLR_ELx_SA | SCTLR_ELx_I)
+
+.text
+.pushsection    .idmap.text, "ax"
+
+.align 5
+
+/*
+ * cpu_soft_restart(cpu_reset, el2_switch, entry, arg0, arg1) - Perform a cpu
+ * soft reset.
+ *
+ * @cpu_reset: Physical address of the cpu_reset routine.
+ * @el2_switch: Flag to indicate a switch to EL2 is needed, passed to cpu_reset.
+ * @entry: Location to jump to for soft reset, passed to cpu_reset.
+ * arg0: First argument passed to @entry.
+ * arg1: Second argument passed to @entry.
+ */
+
+ENTRY(cpu_soft_restart)
+   mov x18, x0 // cpu_reset
+   mov x0, x1  // el2_switch
+   mov x1, x2  // entry
+   mov x2, x3  // arg0
+   mov x3, x4  // arg1
+   ret x18
+ENDPROC(cpu_soft_restart)
+
+/*
+ * cpu_reset(el2_switch, entry, arg0, arg1) - Helper for cpu_soft_restart.
+ *
+ * @el2_switch: Flag to indicate a switch to EL2 is needed.
+ * @entry: Location to jump to for soft reset.
+ * arg0: First argument passed to @entry.
+ * arg1: Second argument passed to @entry.
+ *
+ * Put the CPU into the same state as it would be if it had been reset, and
+ * branch to what would be the reset vector. It must be executed with the
+ * flat identity mapping.
+ */
+
+ENTRY(cpu_reset)
+   /* Clear sctlr_el1 flags. */
+   mrs x12, sctlr_el1
+   ldr x13, =SCTLR_ELx_FLAGS
+   bic x12, x12, x13
+   msr sctlr_el1, x12
+   isb
+
+   cbz x0, 1f  // el2_switch?
+
+   mov x0, x1  // entry
+   mov x1, x2  // arg0
+   mov x2, x3  // arg1
+   hvc #HVC_CALL_FUNC  // no return
+
+1: mov x18, x1 // entry
+   mov x0, x2  // arg0
+   mov x1, x3  // arg1
+   ret x18
+ENDPROC(cpu_reset)
+
+.popsection
diff --git a/arch/arm64/kernel/cpu-reset.h b/arch/arm64/kernel/cpu-reset.h
new file mode 100644
index 000..79816b6
--- /dev/null
+++ b/arch/arm64/kernel/cpu-reset.h
@@ -0,0 +1,22 @@
+/*
+ * cpu reset routines
+ *
+ * Copyright (C) 2015 Huawei Futurewei Technologies.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#if !defined(_ARM64_CPU_RESET_H)
+#define _ARM64_CPU_RESET_H
+
+#include 
+
+void __attribute__((noreturn)) cpu_reset(unsigned long el2_switch,
+   unsigned long entry, unsigned long arg0, unsigned long arg1);
+void __attribute__((noreturn)) cpu_soft_restart(phys_addr_t cpu_reset,
+   unsigned long el2_switch, unsigned long entry, unsigned long arg0,
+   unsigned long arg1);
+
+#endif
-- 
2.5.0





[PATCH v12 16/16] arm64: kdump: relax BUG_ON() if more than one cpus are still active

2015-11-24 Thread Geoff Levand
From: AKASHI Takahiro 

We should try our best in the case of kdump.
So even if not all secondary CPUs have shut down, we do kdump anyway.
---
 arch/arm64/kernel/machine_kexec.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index d2d7e90..482aae7 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -148,7 +148,13 @@ void machine_kexec(struct kimage *kimage)
phys_addr_t reboot_code_buffer_phys;
void *reboot_code_buffer;
 
-   BUG_ON(num_online_cpus() > 1);
+   if (num_online_cpus() > 1) {
+   if (in_crash_kexec)
+   pr_warn("kdump might fail because %d cpus are still 
online\n",
+   num_online_cpus());
+   else
+   BUG();
+   }
 
reboot_code_buffer_phys = page_to_phys(kimage->control_code_page);
reboot_code_buffer = phys_to_virt(reboot_code_buffer_phys);
-- 
2.5.0




[PATCH v12 12/16] arm64: kdump: implement machine_crash_shutdown()

2015-11-24 Thread Geoff Levand
From: AKASHI Takahiro 

kdump calls machine_crash_shutdown() to shut down non-boot cpus and
save registers' status in per-cpu ELF notes before starting the crash
dump kernel. See kernel_kexec().

ipi_cpu_stop() is modified slightly and reused to support this behavior.

Signed-off-by: AKASHI Takahiro 
---
 arch/arm64/include/asm/kexec.h| 34 +-
 arch/arm64/kernel/machine_kexec.c | 31 +--
 arch/arm64/kernel/smp.c   | 16 ++--
 3 files changed, 76 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 46d63cd..555a955 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -30,6 +30,8 @@
 
 #if !defined(__ASSEMBLY__)
 
+extern bool in_crash_kexec;
+
 /**
  * crash_setup_regs() - save registers for the panic kernel
  *
@@ -40,7 +42,37 @@
 static inline void crash_setup_regs(struct pt_regs *newregs,
struct pt_regs *oldregs)
 {
-   /* Empty routine needed to avoid build errors. */
+   if (oldregs) {
+   memcpy(newregs, oldregs, sizeof(*newregs));
+   } else {
+   __asm__ __volatile__ (
+   "stp x0,   x1, [%3, #16 *  0]\n"
+   "stp x2,   x3, [%3, #16 *  1]\n"
+   "stp x4,   x5, [%3, #16 *  2]\n"
+   "stp x6,   x7, [%3, #16 *  3]\n"
+   "stp x8,   x9, [%3, #16 *  4]\n"
+   "stpx10,  x11, [%3, #16 *  5]\n"
+   "stpx12,  x13, [%3, #16 *  6]\n"
+   "stpx14,  x15, [%3, #16 *  7]\n"
+   "stpx16,  x17, [%3, #16 *  8]\n"
+   "stpx18,  x19, [%3, #16 *  9]\n"
+   "stpx20,  x21, [%3, #16 * 10]\n"
+   "stpx22,  x23, [%3, #16 * 11]\n"
+   "stpx24,  x25, [%3, #16 * 12]\n"
+   "stpx26,  x27, [%3, #16 * 13]\n"
+   "stpx28,  x29, [%3, #16 * 14]\n"
+   "strx30,   [%3, #16 * 15]\n"
+   "mov%0, sp\n"
+   "adr%1, 1f\n"
+   "mrs%2, spsr_el1\n"
+   "1:"
+   : "=r" (newregs->sp),
+ "=r" (newregs->pc),
+ "=r" (newregs->pstate)
+   : "r"  (&newregs->regs)
+   : "memory"
+   );
+   }
 }
 
 #endif /* !defined(__ASSEMBLY__) */
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index da28a26..d2d7e90 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -9,6 +9,7 @@
  * published by the Free Software Foundation.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -23,6 +24,7 @@
 extern const unsigned char arm64_relocate_new_kernel[];
 extern const unsigned long arm64_relocate_new_kernel_size;
 
+bool in_crash_kexec;
 static unsigned long kimage_start;
 
 /**
@@ -203,13 +205,38 @@ void machine_kexec(struct kimage *kimage)
 */
 
cpu_soft_restart(virt_to_phys(cpu_reset),
-   is_hyp_mode_available(),
+   in_crash_kexec ? 0 : is_hyp_mode_available(),
reboot_code_buffer_phys, kimage->head, kimage_start);
 
BUG(); /* Should never get here. */
 }
 
+/**
+ * machine_crash_shutdown - shutdown non-boot cpus and save registers
+ */
 void machine_crash_shutdown(struct pt_regs *regs)
 {
-   /* Empty routine needed to avoid build errors. */
+   struct pt_regs dummy_regs;
+   int cpu;
+
+   local_irq_disable();
+
+   in_crash_kexec = true;
+
+   /*
+* clear and initialize the per-cpu info. This is necessary
+* because, otherwise, slots for offline cpus would never be
+* filled up. See smp_send_stop().
+*/
+   memset(&dummy_regs, 0, sizeof(dummy_regs));
+   for_each_possible_cpu(cpu)
+   crash_save_cpu(&dummy_regs, cpu);
+
+   /* shutdown non-boot cpus */
+   smp_send_stop();
+
+   /* for boot cpu */
+   crash_save_cpu(regs, smp_processor_id());
+
+   pr_info("Starting crashdump kernel...\n");
 }
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index b1adc51..15aabef 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -54,6 +55,8 @@
 #include 
 #include 
 
+#include "cpu-reset.h"
+
 #define CREATE_TRACE_POINTS
 #include 
 
@@ -683,8 +686,12 @@ static DEFINE_RAW_SPINLOCK(stop_lock);
 /*
  * ipi_cpu_stop - handle IPI from smp_send_stop()
  */
-static void ipi_cpu_stop(unsigned int cpu)
+static void ipi_cpu_stop(unsigned int cpu, struct pt_regs *regs)
 {
+#ifdef CONFIG_KEXEC
+   /* printing messages 

[PATCH v12 11/16] arm64: kdump: reserve memory for crash dump kernel

2015-11-24 Thread Geoff Levand
From: AKASHI Takahiro 

On primary kernel, the memory region used by crash dump kernel must be
specified by "crashkernel=" boot parameter. reserve_crashkernel()
will allocate and reserve the region for later use.

User space tools will be able to find the region marked as "Crash kernel"
in /proc/iomem.

Signed-off-by: AKASHI Takahiro 
---
 arch/arm64/kernel/setup.c |  7 +-
 arch/arm64/mm/init.c  | 54 +++
 2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 8119479..f9fffc9 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -31,7 +31,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -221,6 +220,12 @@ static void __init request_standard_resources(void)
kernel_data.end <= res->end)
request_resource(res, &kernel_data);
}
+
+#ifdef CONFIG_KEXEC
+   /* User space tools will find "Crash kernel" region in /proc/iomem. */
+   if (crashk_res.end)
+   insert_resource(&iomem_resource, &crashk_res);
+#endif
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 17bf39a..24f0a1c 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -66,6 +67,55 @@ static int __init early_initrd(char *p)
 early_param("initrd", early_initrd);
 #endif
 
+#ifdef CONFIG_KEXEC
+/*
+ * reserve_crashkernel() - reserves memory for crash kernel
+ *
+ * This function reserves memory area given in "crashkernel=" kernel command
+ * line parameter. The memory reserved is used by dump capture kernel when
+ * primary kernel is crashing.
+ */
+static void __init reserve_crashkernel(void)
+{
+   unsigned long long crash_size = 0, crash_base = 0;
+   int ret;
+
+   ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
+   &crash_size, &crash_base);
+   if (ret)
+   return;
+
+   if (crash_base == 0) {
+   crash_base = memblock_alloc(crash_size, 1 << 21);
+   if (crash_base == 0) {
+   pr_warn("Unable to allocate crashkernel (size:%llx)\n",
+   crash_size);
+   return;
+   }
+   } else {
+   /* User specifies base address explicitly. */
+   if (!memblock_is_region_memory(crash_base, crash_size) ||
+   memblock_is_region_reserved(crash_base, crash_size)) {
+   pr_warn("crashkernel has wrong address or size\n");
+   return;
+   }
+
+   if (crash_base & ((1 << 21) - 1)) {
+   pr_warn("crashkernel base address is not 2MB 
aligned\n");
+   return;
+   }
+
+   memblock_reserve(crash_base, crash_size);
+   }
+
+   pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
+   crash_size >> 20, crash_base >> 20);
+
+   crashk_res.start = crash_base;
+   crashk_res.end = crash_base + crash_size - 1;
+}
+#endif /* CONFIG_KEXEC */
+
 /*
  * Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
  * currently assumes that for memory starting above 4G, 32-bit devices will
@@ -171,6 +221,10 @@ void __init arm64_memblock_init(void)
	memblock_reserve(__virt_to_phys(initrd_start), initrd_end - initrd_start);
 #endif
 
+#ifdef CONFIG_KEXEC
+   reserve_crashkernel();
+#endif
+
early_init_fdt_scan_reserved_mem();
 
/* 4GB maximum for 32-bit only capable devices */
-- 
2.5.0





[PATCH v12 10/16] arm64/kexec: Enable kexec in the arm64 defconfig

2015-11-24 Thread Geoff Levand
Signed-off-by: Geoff Levand 
---
 arch/arm64/configs/defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index bdd7aa3..0470fdf 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -58,6 +58,7 @@ CONFIG_PREEMPT=y
 CONFIG_KSM=y
 CONFIG_TRANSPARENT_HUGEPAGE=y
 CONFIG_CMA=y
+CONFIG_KEXEC=y
 CONFIG_CMDLINE="console=ttyAMA0"
 # CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
 CONFIG_COMPAT=y
-- 
2.5.0





[PATCH v12 08/16] arm64/kexec: Add core kexec support

2015-11-24 Thread Geoff Levand
Add three new files, kexec.h, machine_kexec.c and relocate_kernel.S to the
arm64 architecture that add support for the kexec re-boot mechanism
(CONFIG_KEXEC) on arm64 platforms.

Signed-off-by: Geoff Levand 
---
 arch/arm64/Kconfig  |  10 +++
 arch/arm64/include/asm/kexec.h  |  48 
 arch/arm64/kernel/Makefile  |   2 +
 arch/arm64/kernel/machine_kexec.c   | 152 
 arch/arm64/kernel/relocate_kernel.S | 131 +++
 include/uapi/linux/kexec.h  |   1 +
 6 files changed, 344 insertions(+)
 create mode 100644 arch/arm64/include/asm/kexec.h
 create mode 100644 arch/arm64/kernel/machine_kexec.c
 create mode 100644 arch/arm64/kernel/relocate_kernel.S

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 9ac16a4..c23fd77 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -535,6 +535,16 @@ config SECCOMP
  and the task is only allowed to execute a few safe syscalls
  defined by each seccomp mode.
 
+config KEXEC
+   depends on PM_SLEEP_SMP
+   select KEXEC_CORE
+   bool "kexec system call"
+   ---help---
+ kexec is a system call that implements the ability to shutdown your
+ current kernel, and to start another kernel.  It is like a reboot
+ but it is independent of the system firmware.   And like a reboot
+ you can start any kernel with it, not just Linux.
+
 config XEN_DOM0
def_bool y
depends on XEN
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
new file mode 100644
index 000..46d63cd
--- /dev/null
+++ b/arch/arm64/include/asm/kexec.h
@@ -0,0 +1,48 @@
+/*
+ * kexec for arm64
+ *
+ * Copyright (C) Linaro.
+ * Copyright (C) Huawei Futurewei Technologies.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#if !defined(_ARM64_KEXEC_H)
+#define _ARM64_KEXEC_H
+
+/* Maximum physical address we can use pages from */
+
+#define KEXEC_SOURCE_MEMORY_LIMIT (-1UL)
+
+/* Maximum address we can reach in physical address mode */
+
+#define KEXEC_DESTINATION_MEMORY_LIMIT (-1UL)
+
+/* Maximum address we can use for the control code buffer */
+
+#define KEXEC_CONTROL_MEMORY_LIMIT (-1UL)
+
+#define KEXEC_CONTROL_PAGE_SIZE 4096
+
+#define KEXEC_ARCH KEXEC_ARCH_ARM64
+
+#if !defined(__ASSEMBLY__)
+
+/**
+ * crash_setup_regs() - save registers for the panic kernel
+ *
+ * @newregs: registers are saved here
+ * @oldregs: registers to be saved (may be %NULL)
+ */
+
+static inline void crash_setup_regs(struct pt_regs *newregs,
+   struct pt_regs *oldregs)
+{
+   /* Empty routine needed to avoid build errors. */
+}
+
+#endif /* !defined(__ASSEMBLY__) */
+
+#endif
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 474691f..f68420d 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -41,6 +41,8 @@ arm64-obj-$(CONFIG_EFI)   += efi.o efi-entry.stub.o
 arm64-obj-$(CONFIG_PCI)+= pci.o
 arm64-obj-$(CONFIG_ARMV8_DEPRECATED)   += armv8_deprecated.o
 arm64-obj-$(CONFIG_ACPI)   += acpi.o
+arm64-obj-$(CONFIG_KEXEC)  += machine_kexec.o relocate_kernel.o \
+  cpu-reset.o
 
 obj-y  += $(arm64-obj-y) vdso/
 obj-m  += $(arm64-obj-m)
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
new file mode 100644
index 000..8b990b8
--- /dev/null
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -0,0 +1,152 @@
+/*
+ * kexec for arm64
+ *
+ * Copyright (C) Linaro.
+ * Copyright (C) Huawei Futurewei Technologies.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include "cpu-reset.h"
+
+/* Global variables for the arm64_relocate_new_kernel routine. */
+extern const unsigned char arm64_relocate_new_kernel[];
+extern const unsigned long arm64_relocate_new_kernel_size;
+
+static unsigned long kimage_start;
+
+void machine_kexec_cleanup(struct kimage *kimage)
+{
+   /* Empty routine needed to avoid build errors. */
+}
+
+/**
+ * machine_kexec_prepare - Prepare for a kexec reboot.
+ *
+ * Called from the core kexec code when a kernel image is loaded.
+ */
+int machine_kexec_prepare(struct kimage *kimage)
+{
+   kimage_start = kimage->start;
+   return 0;
+}
+
+/**
+ * kexec_list_flush - Helper to flush the kimage list to PoC.
+ */
+static void kexec_list_flush(unsigned long kimage_head)
+{
+   unsigned long *entry;
+
+   for (entry = &kimage_head; ; entry++) {
+   

[PATCH v12 04/16] arm64: kvm: allows kvm cpu hotplug

2015-11-24 Thread Geoff Levand
From: AKASHI Takahiro 

The current kvm implementation on arm64 does cpu-specific initialization
at system boot, and has no way to gracefully shut down a core in terms of
kvm.  In particular, this prevents kexec from rebooting the system on a
boot core in EL2.

This patch adds a cpu tear-down function and also puts existing cpu-init
code into a separate function, kvm_arch_hardware_disable() and
kvm_arch_hardware_enable() respectively.
We don't need an arm64-specific cpu hotplug hook any more.

Since this patch modifies common part of code between arm and arm64, one
stub definition, __cpu_reset_hyp_mode(), is added on arm side to avoid
compiling errors.

Signed-off-by: AKASHI Takahiro 
---
 arch/arm/include/asm/kvm_host.h   | 10 -
 arch/arm/include/asm/kvm_mmu.h|  1 +
 arch/arm/kvm/arm.c| 79 ++-
 arch/arm/kvm/mmu.c|  5 +++
 arch/arm64/include/asm/kvm_host.h | 16 +++-
 arch/arm64/include/asm/kvm_mmu.h  |  1 +
 arch/arm64/include/asm/virt.h |  9 +
 arch/arm64/kvm/hyp-init.S | 33 
 arch/arm64/kvm/hyp.S  | 32 ++--
 9 files changed, 138 insertions(+), 48 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 6692982..9242765 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -214,6 +214,15 @@ static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr);
 }
 
+static inline void __cpu_reset_hyp_mode(phys_addr_t boot_pgd_ptr,
+   phys_addr_t phys_idmap_start)
+{
+   /*
+* TODO
+* kvm_call_reset(boot_pgd_ptr, phys_idmap_start);
+*/
+}
+
 static inline int kvm_arch_dev_ioctl_check_extension(long ext)
 {
return 0;
@@ -226,7 +235,6 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
 
-static inline void kvm_arch_hardware_disable(void) {}
 static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 405aa18..dc6fadf 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -66,6 +66,7 @@ void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
 phys_addr_t kvm_mmu_get_httbr(void);
 phys_addr_t kvm_mmu_get_boot_httbr(void);
 phys_addr_t kvm_get_idmap_vector(void);
+phys_addr_t kvm_get_idmap_start(void);
 int kvm_mmu_init(void);
 void kvm_clear_hyp_idmap(void);
 
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index eab83b2..a5d9d74 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -16,7 +16,6 @@
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
 
-#include 
 #include 
 #include 
 #include 
@@ -61,6 +60,8 @@ static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
 static u8 kvm_next_vmid;
 static DEFINE_SPINLOCK(kvm_vmid_lock);
 
+static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
+
 static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu)
 {
BUG_ON(preemptible());
@@ -85,11 +86,6 @@ struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void)
return &kvm_arm_running_vcpu;
 }
 
-int kvm_arch_hardware_enable(void)
-{
-   return 0;
-}
-
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
 {
return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
@@ -959,7 +955,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
}
 }
 
-static void cpu_init_hyp_mode(void *dummy)
+int kvm_arch_hardware_enable(void)
 {
phys_addr_t boot_pgd_ptr;
phys_addr_t pgd_ptr;
@@ -967,6 +963,9 @@ static void cpu_init_hyp_mode(void *dummy)
unsigned long stack_page;
unsigned long vector_ptr;
 
+   if (__hyp_get_vectors() != hyp_default_vectors)
+   return 0;
+
/* Switch from the HYP stub to our own HYP init vector */
__hyp_set_vectors(kvm_get_idmap_vector());
 
@@ -979,38 +978,50 @@ static void cpu_init_hyp_mode(void *dummy)
__cpu_init_hyp_mode(boot_pgd_ptr, pgd_ptr, hyp_stack_ptr, vector_ptr);
 
kvm_arm_init_debug();
+
+   return 0;
 }
 
-static int hyp_init_cpu_notify(struct notifier_block *self,
-  unsigned long action, void *cpu)
+void kvm_arch_hardware_disable(void)
 {
-   switch (action) {
-   case CPU_STARTING:
-   case CPU_STARTING_FROZEN:
-   if (__hyp_get_vectors() == hyp_default_vectors)
-   cpu_init_hyp_mode(NULL);
-   break;
-   }
+   phys_addr_t boot_pgd_ptr;
+   phys_addr_t phys_idmap_start;
 
-   return NOTIFY_OK;
-}
+   if (__hyp_get_vectors() == hyp_default_vectors)
+   return;
 
-static struct noti

[PATCH v12 00/16] arm64 kexec kernel patches v12

2015-11-24 Thread Geoff Levand
Hi All,

This series adds the core support for kexec re-boot and kdump on ARM64.  This
version of the series combines Takahiro's kdump patches with my kexec patches.

Changes since v11:

o No changes, rebase to v4.4-rc2.

Changes since v10:

o Move the flush of the new image from arm64_relocate_new_kernel to 
machine_kexec.
o Pass values to arm64_relocate_new_kernel in registers, not in global 
variables.
o Fixups to setting the sctlr_el1 and sctlr_el2 flags.

To load a second stage kernel and execute a kexec re-boot, or to work with
kdump on ARM64 systems, a series of patches to kexec-tools [2], which has not
yet been merged upstream, is needed.

I have tested kexec with the ARM Foundation model, and Takahiro has reported
that kdump is working on the 96boards HiKey developer board.  Kexec on EFI
systems works correctly.  More ACPI + kexec testing is needed.

Patch 1 here moves the macros from proc-macros.S to asm/assembler.h so that the
dcache_line_size macro it defines can be used by kexec's relocate kernel
routine.

Patches 2 & 3 rework the ARM64 hcall mechanism to give the CPU reset routines
the ability to switch exception levels from EL1 to EL2 for kernels that were
entered in EL2.

Patch 4 allows KVM to handle a CPU reset.

Patches 5 - 7 add back the ARM64 CPU reset support that was recently removed
from the kernel.

Patches 8 - 10 add kexec support.

Patches 11-16 add kdump support.

Please consider all patches for inclusion.

[1]  https://git.kernel.org/cgit/linux/kernel/git/geoff/linux-kexec.git
[2]  https://git.kernel.org/cgit/linux/kernel/git/geoff/kexec-tools.git

-Geoff

The following changes since commit 1ec218373b8ebda821aec00bb156a9c94fad9cd4:

  Linux 4.4-rc2 (2015-11-22 16:45:59 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/geoff/linux-kexec.git kexec-v12

for you to fetch changes up to 83db66df70fc1f7cd4da8b580d264b9320991cf0:

  arm64: kdump: relax BUG_ON() if more than one cpus are still active 
(2015-11-24 13:59:27 -0800)


AKASHI Takahiro (7):
  arm64: kvm: allows kvm cpu hotplug
  arm64: kdump: reserve memory for crash dump kernel
  arm64: kdump: implement machine_crash_shutdown()
  arm64: kdump: add kdump support
  arm64: kdump: update a kernel doc
  arm64: kdump: enable kdump in the arm64 defconfig
  arm64: kdump: relax BUG_ON() if more than one cpus are still active

Geoff Levand (9):
  arm64: Fold proc-macros.S into assembler.h
  arm64: Convert hcalls to use HVC immediate value
  arm64: Add new hcall HVC_CALL_FUNC
  arm64: Add back cpu_reset routines
  Revert "arm64: mm: remove unused cpu_set_idmap_tcr_t0sz function"
  Revert "arm64: remove dead code"
  arm64/kexec: Add core kexec support
  arm64/kexec: Add pr_devel output
  arm64/kexec: Enable kexec in the arm64 defconfig

 Documentation/kdump/kdump.txt|  32 -
 arch/arm/include/asm/kvm_host.h  |  10 +-
 arch/arm/include/asm/kvm_mmu.h   |   1 +
 arch/arm/kvm/arm.c   |  79 ++-
 arch/arm/kvm/mmu.c   |   5 +
 arch/arm64/Kconfig   |  22 
 arch/arm64/configs/defconfig |   2 +
 arch/arm64/include/asm/assembler.h   |  48 ++-
 arch/arm64/include/asm/kexec.h   |  80 +++
 arch/arm64/include/asm/kvm_host.h|  16 ++-
 arch/arm64/include/asm/kvm_mmu.h |   1 +
 arch/arm64/include/asm/mmu.h |   1 +
 arch/arm64/include/asm/mmu_context.h |  35 +++--
 arch/arm64/include/asm/virt.h|  49 +++
 arch/arm64/kernel/Makefile   |   3 +
 arch/arm64/kernel/cpu-reset.S|  84 
 arch/arm64/kernel/cpu-reset.h|  22 
 arch/arm64/kernel/crash_dump.c   |  71 ++
 arch/arm64/kernel/hyp-stub.S |  43 --
 arch/arm64/kernel/machine_kexec.c| 248 +++
 arch/arm64/kernel/relocate_kernel.S  | 131 ++
 arch/arm64/kernel/setup.c|   7 +-
 arch/arm64/kernel/smp.c  |  16 ++-
 arch/arm64/kvm/hyp-init.S|  34 -
 arch/arm64/kvm/hyp.S |  44 +--
 arch/arm64/mm/cache.S|   2 -
 arch/arm64/mm/init.c |  83 
 arch/arm64/mm/mmu.c  |  11 ++
 arch/arm64/mm/proc-macros.S  |  64 -
 arch/arm64/mm/proc.S |   3 -
 include/uapi/linux/kexec.h   |   1 +
 31 files changed, 1097 insertions(+), 151 deletions(-)
 create mode 100644 arch/arm64/include/asm/kexec.h
 create mode 100644 arch/arm64/kernel/cpu-reset.S
 create mode 100644 arch/arm64/kernel/cpu-reset.h
 create mode 100644 arch/arm64/kernel/crash_dump.c
 create mode 100644 arch/arm64/kernel/machine_kexec.c
 create mode 100644 arch/arm64/kernel/relocate_kernel.S
 delete mode 100644 arch/arm64/mm/proc-macros.S

-- 
2.5.0



[PATCH v12 03/16] arm64: Add new hcall HVC_CALL_FUNC

2015-11-24 Thread Geoff Levand
Add the new hcall HVC_CALL_FUNC that allows execution of a function at EL2.
During CPU reset the CPU must be brought to the exception level it had on
entry to the kernel.  The HVC_CALL_FUNC hcall will provide the mechanism
needed for this exception level switch.

To allow the HVC_CALL_FUNC exception vector to work without a stack, which is
needed to support an hcall at CPU reset, this implementation uses register x18
to store the link register across the caller-provided function.  This dictates
that the caller-provided function must preserve the contents of register x18.

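For illustration, the calling convention could be wrapped from C like
this (a hypothetical helper, not part of this patch; the series itself
issues the hcall from assembly in cpu-reset.S):

	/* Hypothetical wrapper: run func(arg0, arg1) at EL2.
	 * "hvc #3" is HVC_CALL_FUNC; x0 carries the physical address
	 * of the function.  x18 may be clobbered by the vector. */
	static void el2_call_func(phys_addr_t func, unsigned long arg0,
				  unsigned long arg1)
	{
		register unsigned long x0 asm("x0") = func;
		register unsigned long x1 asm("x1") = arg0;
		register unsigned long x2 asm("x2") = arg1;

		asm volatile("hvc #3"	/* HVC_CALL_FUNC */
			     : "+r" (x0), "+r" (x1), "+r" (x2)
			     :
			     : "x3", "x17", "x18", "memory");
	}
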
Signed-off-by: Geoff Levand 
---
 arch/arm64/include/asm/virt.h | 13 +
 arch/arm64/kernel/hyp-stub.S  | 13 -
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index eb10368..3070096 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -45,6 +45,19 @@
 
 #define HVC_SET_VECTORS 2
 
+/*
+ * HVC_CALL_FUNC - Execute a function at EL2.
+ *
+ * @x0: Physical address of the function to be executed.
+ * @x1: Passed as the first argument to the function.
+ * @x2: Passed as the second argument to the function.
+ * @x3: Passed as the third argument to the function.
+ *
+ * The called function must preserve the contents of register x18.
+ */
+
+#define HVC_CALL_FUNC 3
+
 #define BOOT_CPU_MODE_EL1  (0xe11)
 #define BOOT_CPU_MODE_EL2  (0xe12)
 
diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
index 017ab519..e8febe9 100644
--- a/arch/arm64/kernel/hyp-stub.S
+++ b/arch/arm64/kernel/hyp-stub.S
@@ -67,8 +67,19 @@ el1_sync:
b   2f
 
 1: cmp x18, #HVC_SET_VECTORS
-   b.ne2f
+   b.ne1f
msr vbar_el2, x0
+   b   2f
+
+1: cmp x18, #HVC_CALL_FUNC
+   b.ne2f
+   mov x18, lr
+   mov lr, x0
+   mov x0, x1
+   mov x1, x2
+   mov x2, x3
+   blr lr
+   mov lr, x18
 
 2: eret
 ENDPROC(el1_sync)
-- 
2.5.0





[PATCH v12 07/16] Revert "arm64: remove dead code"

2015-11-24 Thread Geoff Levand
This reverts commit b08d4640a3dca68670fc5af2fe9205b395a02388.

Add back the setup_mm_for_reboot() needed for kexec.

Signed-off-by: Geoff Levand 
---
 arch/arm64/include/asm/mmu.h |  1 +
 arch/arm64/mm/mmu.c  | 11 +++
 2 files changed, 12 insertions(+)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 990124a..6326d11 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -29,6 +29,7 @@ typedef struct {
 #define ASID(mm)   ((mm)->context.id.counter & 0x)
 
 extern void paging_init(void);
+extern void setup_mm_for_reboot(void);
 extern void __iomem *early_io_map(phys_addr_t phys, unsigned long virt);
 extern void init_mem_pgprot(void);
 extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index abb66f8..c5bce3c 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -515,6 +515,17 @@ void __init paging_init(void)
 }
 
 /*
+ * Enable the identity mapping to allow the MMU disabling.
+ */
+void setup_mm_for_reboot(void)
+{
+   cpu_set_reserved_ttbr0();
+   flush_tlb_all();
+   cpu_set_idmap_tcr_t0sz();
+   cpu_switch_mm(idmap_pg_dir, &init_mm);
+}
+
+/*
  * Check whether a kernel address is valid (derived from arch/x86/).
  */
 int kern_addr_valid(unsigned long addr)
-- 
2.5.0





[PATCH v12 02/16] arm64: Convert hcalls to use HVC immediate value

2015-11-24 Thread Geoff Levand
The existing arm64 hcall implementations are limited in that they only allow
for two distinct hcalls, with the x0 register either zero or not zero.  Also,
the APIs of the hyp-stub exception vector routines and the KVM exception vector
routines differ; hyp-stub uses a non-zero value in x0 to implement
__hyp_set_vectors, whereas KVM uses it to implement kvm_call_hyp.

To allow for additional hcalls to be defined and to make the arm64 hcall API
more consistent across exception vector routines, change the hcall
implementations to use the 16 bit immediate value of the HVC instruction to
specify the hcall type.

Define three new preprocessor macros HVC_CALL_HYP, HVC_GET_VECTORS, and
HVC_SET_VECTORS to be used as hcall type specifiers and convert the
existing __hyp_get_vectors(), __hyp_set_vectors() and kvm_call_hyp() routines
to use these new macros when executing an HVC call.  Also, change the
corresponding hyp-stub and KVM el1_sync exception vector routines to use these
new macros.

Signed-off-by: Geoff Levand 
---
 arch/arm64/include/asm/virt.h | 27 +++
 arch/arm64/kernel/hyp-stub.S  | 32 +---
 arch/arm64/kvm/hyp.S  | 16 +---
 3 files changed, 57 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index 7a5df52..eb10368 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -18,6 +18,33 @@
 #ifndef __ASM__VIRT_H
 #define __ASM__VIRT_H
 
+/*
+ * The arm64 hcall implementation uses the ISS field of the ESR_EL2 register to
+ * specify the hcall type.  The exception handlers are allowed to use registers
+ * x17 and x18 in their implementation.  Any routine issuing an hcall must not
+ * expect these registers to be preserved.
+ */
+
+/*
+ * HVC_CALL_HYP - Execute a hyp routine.
+ */
+
+#define HVC_CALL_HYP 0
+
+/*
+ * HVC_GET_VECTORS - Return the value of the vbar_el2 register.
+ */
+
+#define HVC_GET_VECTORS 1
+
+/*
+ * HVC_SET_VECTORS - Set the value of the vbar_el2 register.
+ *
+ * @x0: Physical address of the new vector table.
+ */
+
+#define HVC_SET_VECTORS 2
+
 #define BOOT_CPU_MODE_EL1  (0xe11)
 #define BOOT_CPU_MODE_EL2  (0xe12)
 
diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
index a272f33..017ab519 100644
--- a/arch/arm64/kernel/hyp-stub.S
+++ b/arch/arm64/kernel/hyp-stub.S
@@ -22,6 +22,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 
@@ -53,14 +54,22 @@ ENDPROC(__hyp_stub_vectors)
.align 11
 
 el1_sync:
-   mrs x1, esr_el2
-   lsr x1, x1, #26
-   cmp x1, #0x16
+   mrs x18, esr_el2
+   lsr x17, x18, #ESR_ELx_EC_SHIFT
+   and x18, x18, #ESR_ELx_ISS_MASK
+
+   cmp x17, #ESR_ELx_EC_HVC64
    b.ne    2f              // Not an HVC trap
-   cbz x0, 1f
-   msr vbar_el2, x0        // Set vbar_el2
+
+   cmp x18, #HVC_GET_VECTORS
+   b.ne    1f
+   mrs x0, vbar_el2
    b   2f
-1: mrs x0, vbar_el2        // Return vbar_el2
+
+1: cmp x18, #HVC_SET_VECTORS
+   b.ne    2f
+   msr vbar_el2, x0
+
 2: eret
 ENDPROC(el1_sync)
 
@@ -100,11 +109,12 @@ ENDPROC(\label)
  * initialisation entry point.
  */
 
-ENTRY(__hyp_get_vectors)
-   mov x0, xzr
-   // fall through
 ENTRY(__hyp_set_vectors)
-   hvc #0
+   hvc #HVC_SET_VECTORS
ret
-ENDPROC(__hyp_get_vectors)
 ENDPROC(__hyp_set_vectors)
+
+ENTRY(__hyp_get_vectors)
+   hvc #HVC_GET_VECTORS
+   ret
+ENDPROC(__hyp_get_vectors)
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index 1599701..1bef8db 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define CPU_GP_REG_OFFSET(x)   (CPU_GP_REGS + x)
 #define CPU_XREG_OFFSET(x) CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
@@ -932,12 +933,9 @@ __hyp_panic_str:
  * in Hyp mode (see init_hyp_mode in arch/arm/kvm/arm.c).  Return values are
  * passed in r0 and r1.
  *
- * A function pointer with a value of 0 has a special meaning, and is
- * used to implement __hyp_get_vectors in the same way as in
- * arch/arm64/kernel/hyp_stub.S.
  */
 ENTRY(kvm_call_hyp)
-   hvc #0
+   hvc #HVC_CALL_HYP
ret
 ENDPROC(kvm_call_hyp)
 
@@ -968,6 +966,7 @@ el1_sync:              // Guest trapped into EL2
 
mrs x1, esr_el2
lsr x2, x1, #ESR_ELx_EC_SHIFT
+   and x0, x1, #ESR_ELx_ISS_MASK
 
cmp x2, #ESR_ELx_EC_HVC64
b.neel1_trap
@@ -976,15 +975,18 @@ el1_sync:            // Guest trapped into EL2
    cbnz    x3, el1_trap        // called HVC
 
/* Here, we're pretty sure the host called HVC. */
+   mov x18, x0
pop x2, x3
pop x0, x1
 
-   /* Check for __hyp_get_vectors */

[PATCH v12 09/16] arm64/kexec: Add pr_devel output

2015-11-24 Thread Geoff Levand
To aid in debugging kexec problems and in adding new kexec functionality, add a
new routine kexec_image_info() and several inline pr_devel statements.
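
Note that pr_devel() compiles to a no-op unless DEBUG is defined for the
translation unit, so a minimal way to actually see this output is
(illustrative, not part of the patch):

    #define DEBUG               /* must precede the printk include */
    #include <linux/printk.h>

    static void pr_devel_demo(void)
    {
        pr_devel("kexec: debug output enabled in this file\n");
    }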

Signed-off-by: Geoff Levand 
---
 arch/arm64/kernel/machine_kexec.c | 63 +++
 1 file changed, 63 insertions(+)

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 8b990b8..da28a26 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -25,6 +25,48 @@ extern const unsigned long arm64_relocate_new_kernel_size;
 
 static unsigned long kimage_start;
 
+/**
+ * kexec_is_dtb - Helper routine to check the device tree header signature.
+ */
+static bool kexec_is_dtb(const void *dtb)
+{
+   __be32 magic;
+
+   return get_user(magic, (__be32 *)dtb) ? false :
+   (be32_to_cpu(magic) == OF_DT_HEADER);
+}
+
+/**
+ * kexec_image_info - For debugging output.
+ */
+#define kexec_image_info(_i) _kexec_image_info(__func__, __LINE__, _i)
+static void _kexec_image_info(const char *func, int line,
+   const struct kimage *kimage)
+{
+   unsigned long i;
+
+#if !defined(DEBUG)
+   return;
+#endif
+   pr_devel("%s:%d:\n", func, line);
+   pr_devel("  kexec kimage info:\n");
+   pr_devel("type:%d\n", kimage->type);
+   pr_devel("start:   %lx\n", kimage->start);
+   pr_devel("head:%lx\n", kimage->head);
+   pr_devel("nr_segments: %lu\n", kimage->nr_segments);
+
+   for (i = 0; i < kimage->nr_segments; i++) {
+   pr_devel("  segment[%lu]: %016lx - %016lx, %lx bytes, %lu 
pages%s\n",
+   i,
+   kimage->segment[i].mem,
+   kimage->segment[i].mem + kimage->segment[i].memsz,
+   kimage->segment[i].memsz,
+   kimage->segment[i].memsz /  PAGE_SIZE,
+   (kexec_is_dtb(kimage->segment[i].buf) ?
+   ", dtb segment" : ""));
+   }
+}
+
 void machine_kexec_cleanup(struct kimage *kimage)
 {
/* Empty routine needed to avoid build errors. */
@@ -38,6 +80,8 @@ void machine_kexec_cleanup(struct kimage *kimage)
 int machine_kexec_prepare(struct kimage *kimage)
 {
kimage_start = kimage->start;
+   kexec_image_info(kimage);
+
return 0;
 }
 
@@ -107,6 +151,25 @@ void machine_kexec(struct kimage *kimage)
reboot_code_buffer_phys = page_to_phys(kimage->control_code_page);
reboot_code_buffer = phys_to_virt(reboot_code_buffer_phys);
 
+   kexec_image_info(kimage);
+
+   pr_devel("%s:%d: control_code_page:%p\n", __func__, __LINE__,
+   kimage->control_code_page);
+   pr_devel("%s:%d: reboot_code_buffer_phys:  %pa\n", __func__, __LINE__,
+   &reboot_code_buffer_phys);
+   pr_devel("%s:%d: reboot_code_buffer:   %p\n", __func__, __LINE__,
+   reboot_code_buffer);
+   pr_devel("%s:%d: relocate_new_kernel:  %p\n", __func__, __LINE__,
+   arm64_relocate_new_kernel);
+   pr_devel("%s:%d: relocate_new_kernel_size: 0x%lx(%lu) bytes\n",
+   __func__, __LINE__, arm64_relocate_new_kernel_size,
+   arm64_relocate_new_kernel_size);
+
+   pr_devel("%s:%d: kimage_head:  %lx\n", __func__, __LINE__,
+   kimage->head);
+   pr_devel("%s:%d: kimage_start: %lx\n", __func__, __LINE__,
+   kimage_start);
+
/*
 * Copy arm64_relocate_new_kernel to the reboot_code_buffer for use
 * after the kernel is shut down.
-- 
2.5.0





[PATCH v12 01/16] arm64: Fold proc-macros.S into assembler.h

2015-11-24 Thread Geoff Levand
To allow the assembler macros defined in arch/arm64/mm/proc-macros.S to be used
outside the mm code, move the contents of proc-macros.S to asm/assembler.h.
Also, delete proc-macros.S, and fix up all references to proc-macros.S.
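
For reference, the dcache_line_size macro being moved computes the line
size from CTR_EL0; the same computation in C looks like this (a sketch,
not part of this patch):

    /* CTR_EL0.DminLine (bits 19:16) holds log2(words per line); a word
     * is 4 bytes, so the line size in bytes is 4 << DminLine. */
    static inline unsigned int dcache_line_size_sketch(void)
    {
        unsigned long ctr;

        asm volatile("mrs %0, ctr_el0" : "=r" (ctr));
        return 4U << ((ctr >> 16) & 0xf);
    }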

Signed-off-by: Geoff Levand 
---
 arch/arm64/include/asm/assembler.h | 48 +++-
 arch/arm64/kvm/hyp-init.S  |  1 -
 arch/arm64/mm/cache.S  |  2 --
 arch/arm64/mm/proc-macros.S| 64 --
 arch/arm64/mm/proc.S   |  3 --
 5 files changed, 47 insertions(+), 71 deletions(-)
 delete mode 100644 arch/arm64/mm/proc-macros.S

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 12eff92..21979a4 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -1,5 +1,5 @@
 /*
- * Based on arch/arm/include/asm/assembler.h
+ * Based on arch/arm/include/asm/assembler.h, arch/arm/mm/proc-macros.S
  *
  * Copyright (C) 1996-2000 Russell King
  * Copyright (C) 2012 ARM Ltd.
@@ -23,6 +23,8 @@
 #ifndef __ASM_ASSEMBLER_H
 #define __ASM_ASSEMBLER_H
 
+#include 
+#include 
 #include 
 #include 
 
@@ -194,6 +196,50 @@ lr .req x30 // link register
.endm
 
 /*
+ * vma_vm_mm - get mm pointer from vma pointer (vma->vm_mm)
+ */
+   .macro  vma_vm_mm, rd, rn
+   ldr \rd, [\rn, #VMA_VM_MM]
+   .endm
+
+/*
+ * mmid - get context id from mm pointer (mm->context.id)
+ */
+   .macro  mmid, rd, rn
+   ldr \rd, [\rn, #MM_CONTEXT_ID]
+   .endm
+
+/*
+ * dcache_line_size - get the minimum D-cache line size from the CTR register.
+ */
+   .macro  dcache_line_size, reg, tmp
+   mrs     \tmp, ctr_el0           // read CTR
+   ubfm    \tmp, \tmp, #16, #19    // cache line size encoding
+   mov     \reg, #4                // bytes per word
+   lsl     \reg, \reg, \tmp        // actual cache line size
+   .endm
+
+/*
+ * icache_line_size - get the minimum I-cache line size from the CTR register.
+ */
+   .macro  icache_line_size, reg, tmp
+   mrs     \tmp, ctr_el0           // read CTR
+   and     \tmp, \tmp, #0xf        // cache line size encoding
+   mov     \reg, #4                // bytes per word
+   lsl     \reg, \reg, \tmp        // actual cache line size
+   .endm
+
+/*
+ * tcr_set_idmap_t0sz - update TCR.T0SZ so that we can load the ID map
+ */
+   .macro  tcr_set_idmap_t0sz, valreg, tmpreg
+#ifndef CONFIG_ARM64_VA_BITS_48
+   ldr_l   \tmpreg, idmap_t0sz
+   bfi \valreg, \tmpreg, #TCR_T0SZ_OFFSET, #TCR_TxSZ_WIDTH
+#endif
+   .endm
+
+/*
  * Annotate a function as position independent, i.e., safe to be called before
  * the kernel virtual mapping is activated.
  */
diff --git a/arch/arm64/kvm/hyp-init.S b/arch/arm64/kvm/hyp-init.S
index 178ba22..2e67a48 100644
--- a/arch/arm64/kvm/hyp-init.S
+++ b/arch/arm64/kvm/hyp-init.S
@@ -20,7 +20,6 @@
 #include 
 #include 
 #include 
-#include 
 
.text
.pushsection.hyp.idmap.text, "ax"
diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
index cfa44a6..f49041d 100644
--- a/arch/arm64/mm/cache.S
+++ b/arch/arm64/mm/cache.S
@@ -24,8 +24,6 @@
 #include 
 #include 
 
-#include "proc-macros.S"
-
 /*
  * flush_icache_range(start,end)
  *
diff --git a/arch/arm64/mm/proc-macros.S b/arch/arm64/mm/proc-macros.S
deleted file mode 100644
index 4c4d93c..000
--- a/arch/arm64/mm/proc-macros.S
+++ /dev/null
@@ -1,64 +0,0 @@
-/*
- * Based on arch/arm/mm/proc-macros.S
- *
- * Copyright (C) 2012 ARM Ltd.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program.  If not, see .
- */
-
-#include 
-#include 
-
-/*
- * vma_vm_mm - get mm pointer from vma pointer (vma->vm_mm)
- */
-   .macro  vma_vm_mm, rd, rn
-   ldr \rd, [\rn, #VMA_VM_MM]
-   .endm
-
-/*
- * mmid - get context id from mm pointer (mm->context.id)
- */
-   .macro  mmid, rd, rn
-   ldr \rd, [\rn, #MM_CONTEXT_ID]
-   .endm
-
-/*
- * dcache_line_size - get the minimum D-cache line size from the CTR register.
- */
-   .macro  dcache_line_size, reg, tmp
-   mrs     \tmp, ctr_el0           // read CTR
-   ubfm    \tmp, \tmp, #16, #19    // cache line size encoding
-   mov     \reg, #4                // bytes per word
-   lsl     \reg, \reg, \tmp        // actual cache line size

Re: [V5 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-11-24 Thread Steven Rostedt
On Fri, Nov 20, 2015 at 06:36:48PM +0900, Hidehiro Kawai wrote:
> Currently, panic() and crash_kexec() can be called at the same time.
> For example (x86 case):
> 
> CPU 0:
>   oops_end()
> crash_kexec()
>   mutex_trylock() // acquired
> nmi_shootdown_cpus() // stop other cpus
> 
> CPU 1:
>   panic()
> crash_kexec()
>   mutex_trylock() // failed to acquire
> smp_send_stop() // stop other cpus
> infinite loop
> 
> If CPU 1 calls smp_send_stop() before nmi_shootdown_cpus(), kdump
> fails.

So the smp_send_stop() stops CPU 0 from calling nmi_shootdown_cpus(), right?

> 
> In another case:
> 
> CPU 0:
>   oops_end()
> crash_kexec()
>   mutex_trylock() // acquired
> 
> io_check_error()
>   panic()
> crash_kexec()
>   mutex_trylock() // failed to acquire
> infinite loop
> 
> Clearly, this is an undesirable result.

I'm trying to see how this patch fixes this case.

> 
> To fix this problem, this patch changes crash_kexec() to exclude
> others by using atomic_t panic_cpu.
> 
> V5:
> - Add missing dummy __crash_kexec() for !CONFIG_KEXEC_CORE case
> - Replace atomic_xchg() with atomic_set() in crash_kexec() because
>   it is used as a release operation and there is no need of memory
>   barrier effect.  This change also removes an unused value warning
> 
> V4:
> - Use new __crash_kexec(), no exclusion check version of crash_kexec(),
>   instead of checking if panic_cpu is the current cpu or not
> 
> V2:
> - Use atomic_cmpxchg() instead of spin_trylock() on panic_lock
>   to exclude concurrent accesses
> - Don't introduce no-lock version of crash_kexec()
> 
> Signed-off-by: Hidehiro Kawai 
> Cc: Eric Biederman 
> Cc: Vivek Goyal 
> Cc: Andrew Morton 
> Cc: Michal Hocko 
> ---
>  include/linux/kexec.h |2 ++
>  kernel/kexec_core.c   |   26 +-
>  kernel/panic.c|4 ++--
>  3 files changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index d140b1e..7b68d27 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -237,6 +237,7 @@ extern int kexec_purgatory_get_set_symbol(struct kimage *image,
> unsigned int size, bool get_value);
>  extern void *kexec_purgatory_get_symbol_addr(struct kimage *image,
>const char *name);
> +extern void __crash_kexec(struct pt_regs *);
>  extern void crash_kexec(struct pt_regs *);
>  int kexec_should_crash(struct task_struct *);
>  void crash_save_cpu(struct pt_regs *regs, int cpu);
> @@ -332,6 +333,7 @@ int __weak arch_kexec_apply_relocations(const Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
>  #else /* !CONFIG_KEXEC_CORE */
>  struct pt_regs;
>  struct task_struct;
> +static inline void __crash_kexec(struct pt_regs *regs) { }
>  static inline void crash_kexec(struct pt_regs *regs) { }
>  static inline int kexec_should_crash(struct task_struct *p) { return 0; }
>  #define kexec_in_progress false
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 11b64a6..9d097f5 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -853,7 +853,8 @@ struct kimage *kexec_image;
>  struct kimage *kexec_crash_image;
>  int kexec_load_disabled;
>  
> -void crash_kexec(struct pt_regs *regs)
> +/* No panic_cpu check version of crash_kexec */
> +void __crash_kexec(struct pt_regs *regs)
>  {
>   /* Take the kexec_mutex here to prevent sys_kexec_load
>* running on one cpu from replacing the crash kernel
> @@ -876,6 +877,29 @@ void crash_kexec(struct pt_regs *regs)
>   }
>  }
>  
> +void crash_kexec(struct pt_regs *regs)
> +{
> + int old_cpu, this_cpu;
> +
> + /*
> +  * Only one CPU is allowed to execute the crash_kexec() code as with
> +  * panic().  Otherwise parallel calls of panic() and crash_kexec()
> +  * may stop each other.  To exclude them, we use panic_cpu here too.
> +  */
> + this_cpu = raw_smp_processor_id();
> + old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
> + if (old_cpu == -1) {
> + /* This is the 1st CPU which comes here, so go ahead. */
> + __crash_kexec(regs);
> +
> + /*
> +  * Reset panic_cpu to allow another panic()/crash_kexec()
> +  * call.
> +  */
> + atomic_set(&panic_cpu, -1);
> + }
> +}
> +
>  size_t crash_get_memory_size(void)
>  {
>   size_t size = 0;
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 4fce2be..5d0b807 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -138,7 +138,7 @@ void panic(const char *fmt, ...)
>* the "crash_kexec_post_notifiers" option to the kernel.
>*/
>   if (!crash_kexec_post_notifiers)
> - crash_kexec(NULL);
> + __crash_kexec(NULL);

Why call the __crash_kexec() version and not just crash_kexec() here?
This needs to be documented.
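
A sketch of the likely reasoning, using the names from this patch
(illustration only, not proposed code):

    #include <linux/atomic.h>
    #include <linux/smp.h>

    struct pt_regs;
    extern atomic_t panic_cpu;
    extern void __crash_kexec(struct pt_regs *);

    static void panic_path_sketch(struct pt_regs *regs)
    {
        int this_cpu = raw_smp_processor_id();

        /* panic() has already claimed panic_cpu for this CPU ... */
        atomic_cmpxchg(&panic_cpu, -1, this_cpu);

        /* ... so the checking wrapper crash_kexec() would see its own
         * cmpxchg fail and skip kdump; __crash_kexec() bypasses the
         * panic_cpu check and actually runs the crash kernel. */
        __crash_kexec(regs);
    }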

>  
>   /*
>

Re: [V5 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

2015-11-24 Thread Steven Rostedt
On Tue, 24 Nov 2015 21:27:13 +0100
Michal Hocko  wrote:


> > What happens if:
> > 
> > 	CPU 0:				CPU 1:
> > 	--				--
> > 	nmi_panic();
> > 
> > 					nmi_panic();
> > 
> > 					nmi_panic();
> 
> I thought that nmi_panic is called only from the nmi context. If so how
> can we get a nested NMI like that?

Never mind. I was thinking the external NMI could nest, but I'm guessing
it can't. Anyway, the later patches modify this code to check for
something other than != this_cpu, which makes this issue moot even if
it could nest.

-- Steve



Re: [V5 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

2015-11-24 Thread Michal Hocko
On Tue 24-11-15 10:05:10, Steven Rostedt wrote:
> On Fri, Nov 20, 2015 at 06:36:44PM +0900, Hidehiro Kawai wrote:
> > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > index 350dfb0..480a4fd 100644
> > --- a/include/linux/kernel.h
> > +++ b/include/linux/kernel.h
> > @@ -445,6 +445,19 @@ extern int sysctl_panic_on_stackoverflow;
> >  
> >  extern bool crash_kexec_post_notifiers;
> >  
> > +extern atomic_t panic_cpu;
> > +
> > +/*
> > + * A variant of panic() called from NMI context.
> > + * If we've already panicked on this cpu, return from here.
> > + */
> > +#define nmi_panic(fmt, ...)					\
> > +   do {\
> > +   int this_cpu = raw_smp_processor_id();  \
> > +   if (atomic_cmpxchg(&panic_cpu, -1, this_cpu) != this_cpu) \
> > +   panic(fmt, ##__VA_ARGS__);  \
> 
> Hmm,
> 
> What happens if:
> 
> 	CPU 0:				CPU 1:
> 	--				--
> 	nmi_panic();
> 
> 					nmi_panic();
> 
> 					nmi_panic();

I thought that nmi_panic is called only from the nmi context. If so how
can we get a nested NMI like that?
-- 
Michal Hocko
SUSE Labs



Re: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-11-24 Thread Borislav Petkov
On Tue, Nov 24, 2015 at 02:37:00PM -0500, Steven Rostedt wrote:
> On Tue, Nov 24, 2015 at 11:48:53AM +0100, Borislav Petkov wrote:
> > 
> > > +  */
> > > + while (!raw_spin_trylock(&nmi_reason_lock))
> > > + poll_crash_ipi_and_callback(regs);
> > 
> > Waaait a minute: so if we're getting NMIs broadcasted on every core but
> > we're *not* crash dumping, we will run into here too. This can't be
> > right. :-\
> 
> This only does something if crash_ipi_done is set, which means you are killing
> the box.

Yeah, Michal and I discussed that on IRC today. And yeah, it is really
tricky stuff. So I appreciate it a lot you looking at it too. Thanks!

> But perhaps a comment that states that here would be useful, or maybe
> just put in the check here. There's no need to make it depend on SMP, as
> raw_spin_trylock() will turn to just ({1}) for UP, and that code won't even be
> hit.

Right, this code needs much more thorough documentation to counter the
trickiness.

Thanks.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.



Re: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-11-24 Thread Steven Rostedt
On Tue, Nov 24, 2015 at 11:48:53AM +0100, Borislav Petkov wrote:
> 
> > +*/
> > +   while (!raw_spin_trylock(&nmi_reason_lock))
> > +   poll_crash_ipi_and_callback(regs);
> 
> Waaait a minute: so if we're getting NMIs broadcasted on every core but
> we're *not* crash dumping, we will run into here too. This can't be
> right. :-\

This only does something if crash_ipi_done is set, which means you are killing
the box. But perhaps a comment that states that here would be useful, or maybe
just put in the check here. There's no need to make it depend on SMP, as
raw_spin_trylock() will turn to just ({1}) for UP, and that code won't even be
hit.
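
For reference, the gate under discussion looks roughly like this (a
sketch reconstructed from the patch; the callback plumbing shown here is
an assumption):

    #include <linux/compiler.h>
    #include <linux/smp.h>
    #include <asm/ptrace.h>

    typedef void (*nmi_shootdown_cb)(int, struct pt_regs *);
    static nmi_shootdown_cb shootdown_callback; /* set by nmi_shootdown_cpus() */
    static int crash_ipi_done;                  /* set via WRITE_ONCE() when crashing */

    void poll_crash_ipi_and_callback(struct pt_regs *regs)
    {
        /* A no-op unless a crash dump is already in progress. */
        if (READ_ONCE(crash_ipi_done))
            shootdown_callback(raw_smp_processor_id(), regs);
    }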

-- Steve




Re: [V5 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

2015-11-24 Thread Steven Rostedt
On Fri, Nov 20, 2015 at 06:36:44PM +0900, Hidehiro Kawai wrote:
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 350dfb0..480a4fd 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -445,6 +445,19 @@ extern int sysctl_panic_on_stackoverflow;
>  
>  extern bool crash_kexec_post_notifiers;
>  
> +extern atomic_t panic_cpu;
> +
> +/*
> + * A variant of panic() called from NMI context.
> + * If we've already panicked on this cpu, return from here.
> + */
> +#define nmi_panic(fmt, ...)  \
> + do {\
> + int this_cpu = raw_smp_processor_id();  \
> + if (atomic_cmpxchg(&panic_cpu, -1, this_cpu) != this_cpu) \
> + panic(fmt, ##__VA_ARGS__);  \

Hmm,

What happens if:

	CPU 0:				CPU 1:
	--				--
	nmi_panic();

					nmi_panic();

					nmi_panic();

?

cmpxchg(&panic_cpu, -1, 0) != 0

returns -1 for cpu 0, thus 0 != 0, and sets panic_cpu to 0


cmpxchg(&panic_cpu, -1, 1) != 1

returns 0, and then it too panics, but does not set panic_cpu to 1

Now you have your external NMI triggering on CPU 1

cmpxchg(&panic_cpu, -1, 1) != 1

returns 0 again, and you call panic again within the panic of CPU 1.

Is this OK?

Perhaps you want a per cpu bitmask, and do a test_and_set() on the CPU. That
would prevent any CPU from rerunning a panic() twice on any CPU.
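
Something along these lines, perhaps (a hypothetical sketch of that
suggestion, not code from this series):

    #include <linux/cpumask.h>
    #include <linux/kernel.h>
    #include <linux/smp.h>

    static struct cpumask cpus_panicked;

    /* Each CPU gets at most one shot at panic(), no matter which CPU
     * entered panic first. */
    #define nmi_panic_once(fmt, ...)                                \
        do {                                                        \
            int __cpu = raw_smp_processor_id();                     \
            if (!cpumask_test_and_set_cpu(__cpu, &cpus_panicked))   \
                panic(fmt, ##__VA_ARGS__);                          \
        } while (0)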

-- Steve

> + } while (0)
> +
>  /*
>   * Only to be used by arch init code. If the user over-wrote the default
>   * CONFIG_PANIC_TIMEOUT, honor it.
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 4579dbb..24ee2ea 100644



Re: [V5 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

2015-11-24 Thread Steven Rostedt
On Tue, 24 Nov 2015 10:05:10 -0500
Steven Rostedt  wrote:

> cmpxchg(&panic_cpu, -1, 0) != 0
> 
> returns -1 for cpu 0, thus 0 != 0, and sets panic_cpu to 0

That was supposed to be "thus -1 != 0".

-- Steve



Re: [V5 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-11-24 Thread Michal Hocko
On Fri 20-11-15 18:36:48, Hidehiro Kawai wrote:
> Currently, panic() and crash_kexec() can be called at the same time.
> For example (x86 case):
> 
> CPU 0:
>   oops_end()
> crash_kexec()
>   mutex_trylock() // acquired
> nmi_shootdown_cpus() // stop other cpus
> 
> CPU 1:
>   panic()
> crash_kexec()
>   mutex_trylock() // failed to acquire
> smp_send_stop() // stop other cpus
> infinite loop
> 
> If CPU 1 calls smp_send_stop() before nmi_shootdown_cpus(), kdump
> fails.
> 
> In another case:
> 
> CPU 0:
>   oops_end()
> crash_kexec()
>   mutex_trylock() // acquired
> 
> io_check_error()
>   panic()
> crash_kexec()
>   mutex_trylock() // failed to acquire
> infinite loop
> 
> Clearly, this is an undesirable result.
> 
> To fix this problem, this patch changes crash_kexec() to exclude
> others by using atomic_t panic_cpu.
> 
> V5:
> - Add missing dummy __crash_kexec() for !CONFIG_KEXEC_CORE case
> - Replace atomic_xchg() with atomic_set() in crash_kexec() because
>   it is used as a release operation and there is no need of memory
>   barrier effect.  This change also removes an unused value warning
> 
> V4:
> - Use new __crash_kexec(), no exclusion check version of crash_kexec(),
>   instead of checking if panic_cpu is the current cpu or not
> 
> V2:
> - Use atomic_cmpxchg() instead of spin_trylock() on panic_lock
>   to exclude concurrent accesses
> - Don't introduce no-lock version of crash_kexec()
> 
> Signed-off-by: Hidehiro Kawai 
> Cc: Eric Biederman 
> Cc: Vivek Goyal 
> Cc: Andrew Morton 
> Cc: Michal Hocko 

Looks good to me as well
Acked-by: Michal Hocko 

[...]
> +void crash_kexec(struct pt_regs *regs)
> +{
> + int old_cpu, this_cpu;
> +
> + /*
> +  * Only one CPU is allowed to execute the crash_kexec() code as with
> +  * panic().  Otherwise parallel calls of panic() and crash_kexec()
> +  * may stop each other.  To exclude them, we use panic_cpu here too.
> +  */
> + this_cpu = raw_smp_processor_id();
> + old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
> + if (old_cpu == -1) {
> + /* This is the 1st CPU which comes here, so go ahead. */
> + __crash_kexec(regs);
> +
> + /*
> +  * Reset panic_cpu to allow another panic()/crash_kexec()
> +  * call.
> +  */
> + atomic_set(&panic_cpu, -1);

This was slightly more obvious in the previous version, where the reset
happened after the trylock on the mutex failed; maybe the comment could
be more specific:
+   /*
+* Reset panic_cpu to allow another panic()/crash_kexec()
+* call if __crash_kexec couldn't handle the situation.
+*/
-- 
Michal Hocko
SUSE Labs



Re: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-11-24 Thread Michal Hocko
On Fri 20-11-15 18:36:46, Hidehiro Kawai wrote:
> nmi_shootdown_cpus(), a subroutine of crash_kexec(), sends NMI IPI
> to non-panic cpus to stop them while saving their register
> information and doing some cleanups for crash dumping.  So if a
> non-panic cpus is infinitely looping in NMI context, we fail to
> save its register information and lose the information from the
> crash dump.
> 
> `Infinite loop in NMI context' can happen:
> 
>   a. when a cpu panics on NMI while another cpu is processing panic
>   b. when a cpu received an external or unknown NMI while another
>  cpu is processing panic on NMI
> 
> In the case of a, it loops in panic_smp_self_stop().  In the case
> of b, it loops in raw_spin_lock() of nmi_reason_lock.  This can
> happen on some servers which broadcasts NMIs to all CPUs when a dump
> button is pushed.
> 
> To save registers in these case too, this patch does following things:
> 
> 1. Move the timing of `infinite loop in NMI context' (actually
>done by panic_smp_self_stop()) outside of panic() to enable us to
>refer pt_regs
> 2. call a callback of nmi_shootdown_cpus() directly to save
>registers and do some cleanups after setting waiting_for_crash_ipi
>which is used for counting down the number of cpus which handled
>the callback
> 
> V5:
> - Use WRITE_ONCE() when setting crash_ipi_done to 1 so that the
>   compiler doesn't change the instruction order
> - Support the case of b in the above description
> - Add poll_crash_ipi_and_callback()
> 
> V4:
> - Rewrite the patch description
> 
> V3:
> - Newly introduced
> 
> Signed-off-by: Hidehiro Kawai 
> Cc: Andrew Morton 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: Peter Zijlstra 
> Cc: Eric Biederman 
> Cc: Vivek Goyal 
> Cc: Michal Hocko 

Yes this seems correct
Acked-by: Michal Hocko 

> ---
>  arch/x86/include/asm/reboot.h |1 +
>  arch/x86/kernel/nmi.c |   17 +
>  arch/x86/kernel/reboot.c  |   28 
>  include/linux/kernel.h|   12 ++--
>  kernel/panic.c|   10 ++
>  kernel/watchdog.c |2 +-
>  6 files changed, 63 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
> index a82c4f1..964e82f 100644
> --- a/arch/x86/include/asm/reboot.h
> +++ b/arch/x86/include/asm/reboot.h
> @@ -25,5 +25,6 @@ void __noreturn machine_real_restart(unsigned int type);
>  
>  typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
>  void nmi_shootdown_cpus(nmi_shootdown_cb callback);
> +void poll_crash_ipi_and_callback(struct pt_regs *regs);
>  
>  #endif /* _ASM_X86_REBOOT_H */
> diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> index 5131714..74a1434 100644
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -29,6 +29,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define CREATE_TRACE_POINTS
>  #include 
> @@ -231,7 +232,7 @@ pci_serr_error(unsigned char reason, struct pt_regs *regs)
>  #endif
>  
>   if (panic_on_unrecovered_nmi)
> - nmi_panic("NMI: Not continuing");
> + nmi_panic(regs, "NMI: Not continuing");
>  
>   pr_emerg("Dazed and confused, but trying to continue\n");
>  
> @@ -256,7 +257,7 @@ io_check_error(unsigned char reason, struct pt_regs *regs)
>   show_regs(regs);
>  
>   if (panic_on_io_nmi) {
> - nmi_panic("NMI IOCK error: Not continuing");
> + nmi_panic(regs, "NMI IOCK error: Not continuing");
>  
>   /*
>* If we return from nmi_panic(), it means we have received
> @@ -305,7 +306,7 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
>  
>   pr_emerg("Do you have a strange power saving mode enabled?\n");
>   if (unknown_nmi_panic || panic_on_unrecovered_nmi)
> - nmi_panic("NMI: Not continuing");
> + nmi_panic(regs, "NMI: Not continuing");
>  
>   pr_emerg("Dazed and confused, but trying to continue\n");
>  }
> @@ -357,7 +358,15 @@ static void default_do_nmi(struct pt_regs *regs)
>   }
>  
>   /* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
> - raw_spin_lock(&nmi_reason_lock);
> +
> + /*
> +  * Another CPU may be processing panic routines with holding
> +  * nmi_reason_lock.  Check IPI issuance from the panicking CPU
> +  * and call the callback directly.
> +  */
> + while (!raw_spin_trylock(&nmi_reason_lock))
> + poll_crash_ipi_and_callback(regs);
> +
>   reason = x86_platform.get_nmi_reason();
>  
>   if (reason & NMI_REASON_MASK) {
> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> index 02693dd..44c5f5b 100644
> --- a/arch/x86/kernel/reboot.c
> +++ b/arch/x86/kernel/reboot.c
> @@ -718,6 +718,7 @@ static int crashing_cpu;
>  static nmi_shootdown_cb shootdown_callback;
>  
>  static atomic_t waiting_for_crash_ipi;
> +static int crash_ipi_done;

Re: [V5 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

2015-11-24 Thread Michal Hocko
On Fri 20-11-15 18:36:44, Hidehiro Kawai wrote:
> If panic on NMI happens just after panic() on the same CPU, panic()
> is recursively called.  As the result, it stalls after failing to
> acquire panic_lock.
> 
> To avoid this problem, don't call panic() in NMI context if
> we've already entered panic().
> 
> V4:
> - Improve comments in io_check_error() and panic()
> 
> V3:
> - Introduce nmi_panic() macro to reduce code duplication
> - In the case of panic on NMI, don't return from NMI handlers
>   if another cpu already panicked
> 
> V2:
> - Use atomic_cmpxchg() instead of current spin_trylock() to
>   exclude concurrent accesses to the panic routines
> - Don't introduce no-lock version of panic()
> 
> Signed-off-by: Hidehiro Kawai 
> Cc: Andrew Morton 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: Peter Zijlstra 
> Cc: Michal Hocko 

I've finally seen testing results for these patches and managed to look
at them again.

Acked-by: Michal Hocko 

> ---
>  arch/x86/kernel/nmi.c  |   16 
>  include/linux/kernel.h |   13 +
>  kernel/panic.c |   15 ---
>  kernel/watchdog.c  |2 +-
>  4 files changed, 38 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> index 697f90d..5131714 100644
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -231,7 +231,7 @@ pci_serr_error(unsigned char reason, struct pt_regs *regs)
>  #endif
>  
>   if (panic_on_unrecovered_nmi)
> - panic("NMI: Not continuing");
> + nmi_panic("NMI: Not continuing");
>  
>   pr_emerg("Dazed and confused, but trying to continue\n");
>  
> @@ -255,8 +255,16 @@ io_check_error(unsigned char reason, struct pt_regs *regs)
>reason, smp_processor_id());
>   show_regs(regs);
>  
> - if (panic_on_io_nmi)
> - panic("NMI IOCK error: Not continuing");
> + if (panic_on_io_nmi) {
> + nmi_panic("NMI IOCK error: Not continuing");
> +
> + /*
> +  * If we return from nmi_panic(), it means we have received
> +  * NMI while processing panic().  So, simply return without
> +  * a delay and re-enabling NMI.
> +  */
> + return;
> + }
>  
>   /* Re-enable the IOCK line, wait for a few seconds */
>   reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_IOCHK;
> @@ -297,7 +305,7 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
>  
>   pr_emerg("Do you have a strange power saving mode enabled?\n");
>   if (unknown_nmi_panic || panic_on_unrecovered_nmi)
> - panic("NMI: Not continuing");
> + nmi_panic("NMI: Not continuing");
>  
>   pr_emerg("Dazed and confused, but trying to continue\n");
>  }
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 350dfb0..480a4fd 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -445,6 +445,19 @@ extern int sysctl_panic_on_stackoverflow;
>  
>  extern bool crash_kexec_post_notifiers;
>  
> +extern atomic_t panic_cpu;
> +
> +/*
> + * A variant of panic() called from NMI context.
> + * If we've already panicked on this cpu, return from here.
> + */
> +#define nmi_panic(fmt, ...)  \
> + do {\
> + int this_cpu = raw_smp_processor_id();  \
> + if (atomic_cmpxchg(&panic_cpu, -1, this_cpu) != this_cpu) \
> + panic(fmt, ##__VA_ARGS__);  \
> + } while (0)
> +
>  /*
>   * Only to be used by arch init code. If the user over-wrote the default
>   * CONFIG_PANIC_TIMEOUT, honor it.
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 4579dbb..24ee2ea 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -61,6 +61,8 @@ void __weak panic_smp_self_stop(void)
>   cpu_relax();
>  }
>  
> +atomic_t panic_cpu = ATOMIC_INIT(-1);
> +
>  /**
>   *   panic - halt the system
>   *   @fmt: The text string to print
> @@ -71,17 +73,17 @@ void __weak panic_smp_self_stop(void)
>   */
>  void panic(const char *fmt, ...)
>  {
> - static DEFINE_SPINLOCK(panic_lock);
>   static char buf[1024];
>   va_list args;
>   long i, i_next = 0;
>   int state = 0;
> + int old_cpu, this_cpu;
>  
>   /*
>* Disable local interrupts. This will prevent panic_smp_self_stop
>* from deadlocking the first cpu that invokes the panic, since
>* there is nothing to prevent an interrupt handler (that runs
> -  * after the panic_lock is acquired) from invoking panic again.
> +  * after setting panic_cpu) from invoking panic again.
>*/
>   local_irq_disable();
>  
> @@ -94,8 +96,15 @@ void panic(const char *fmt, ...)
>* multiple parallel invocations of panic, all other CPUs either
>   * stop themself or will wait until they are stopped by the 1st CPU

Re: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-11-24 Thread Borislav Petkov
On Fri, Nov 20, 2015 at 06:36:46PM +0900, Hidehiro Kawai wrote:
> nmi_shootdown_cpus(), a subroutine of crash_kexec(), sends NMI IPI
> to non-panic cpus to stop them while saving their register

 ...to stop them and save their register...

> information and doing some cleanups for crash dumping.  So if a
> non-panic cpus is infinitely looping in NMI context, we fail to

That should be CPU. Please use "CPU" instead of "cpu" in all your text
in your next submission.

> save its register information and lose the information from the
> crash dump.
>
> `Infinite loop in NMI context' can happen:
> 
>   a. when a cpu panics on NMI while another cpu is processing panic
>   b. when a cpu received an external or unknown NMI while another
>  cpu is processing panic on NMI
> 
> In the case of a, it loops in panic_smp_self_stop().  In the case
> of b, it loops in raw_spin_lock() of nmi_reason_lock.

Please describe those two cases more verbosely - it takes slow people
like me a while to figure out what exactly can happen.

> This can
> happen on some servers which broadcasts NMIs to all CPUs when a dump
> button is pushed.
> 
> To save registers in these case too, this patch does following things:
> 
> 1. Move the timing of `infinite loop in NMI context' (actually
>done by panic_smp_self_stop()) outside of panic() to enable us to
>refer pt_regs

I can't parse that sentence. And I really tried :-\
panic_smp_self_stop() is still in panic().

> 2. call a callback of nmi_shootdown_cpus() directly to save
>registers and do some cleanups after setting waiting_for_crash_ipi
>which is used for counting down the number of cpus which handled
>the callback
> 
> V5:
> - Use WRITE_ONCE() when setting crash_ipi_done to 1 so that the
>   compiler doesn't change the instruction order
> - Support the case of b in the above description
> - Add poll_crash_ipi_and_callback()
> 
> V4:
> - Rewrite the patch description
> 
> V3:
> - Newly introduced
> 
> Signed-off-by: Hidehiro Kawai 
> Cc: Andrew Morton 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: Peter Zijlstra 
> Cc: Eric Biederman 
> Cc: Vivek Goyal 
> Cc: Michal Hocko 
> ---
>  arch/x86/include/asm/reboot.h |1 +
>  arch/x86/kernel/nmi.c |   17 +
>  arch/x86/kernel/reboot.c  |   28 
>  include/linux/kernel.h|   12 ++--
>  kernel/panic.c|   10 ++
>  kernel/watchdog.c |2 +-
>  6 files changed, 63 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
> index a82c4f1..964e82f 100644
> --- a/arch/x86/include/asm/reboot.h
> +++ b/arch/x86/include/asm/reboot.h
> @@ -25,5 +25,6 @@ void __noreturn machine_real_restart(unsigned int type);
>  
>  typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
>  void nmi_shootdown_cpus(nmi_shootdown_cb callback);
> +void poll_crash_ipi_and_callback(struct pt_regs *regs);
>  
>  #endif /* _ASM_X86_REBOOT_H */
> diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> index 5131714..74a1434 100644
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -29,6 +29,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define CREATE_TRACE_POINTS
>  #include 
> @@ -231,7 +232,7 @@ pci_serr_error(unsigned char reason, struct pt_regs *regs)
>  #endif
>  
>   if (panic_on_unrecovered_nmi)
> - nmi_panic("NMI: Not continuing");
> + nmi_panic(regs, "NMI: Not continuing");
>  
>   pr_emerg("Dazed and confused, but trying to continue\n");
>  
> @@ -256,7 +257,7 @@ io_check_error(unsigned char reason, struct pt_regs *regs)
>   show_regs(regs);
>  
>   if (panic_on_io_nmi) {
> - nmi_panic("NMI IOCK error: Not continuing");
> + nmi_panic(regs, "NMI IOCK error: Not continuing");
>  
>   /*
>* If we return from nmi_panic(), it means we have received
> @@ -305,7 +306,7 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
>  
>   pr_emerg("Do you have a strange power saving mode enabled?\n");
>   if (unknown_nmi_panic || panic_on_unrecovered_nmi)
> - nmi_panic("NMI: Not continuing");
> + nmi_panic(regs, "NMI: Not continuing");
>  
>   pr_emerg("Dazed and confused, but trying to continue\n");
>  }
> @@ -357,7 +358,15 @@ static void default_do_nmi(struct pt_regs *regs)
>   }
>  
>   /* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
> - raw_spin_lock(&nmi_reason_lock);
> +
> + /*
> +  * Another CPU may be processing panic routines with holding

while

> +  * nmi_reason_lock.  Check IPI issuance from the panicking CPU
> +  * and call the callback directly.

This is one strange sentence. Does it mean something like:

"Check if the panicking CPU issued the IPI and, if so, call the cra