Re: [PATCH] powerpc/kdump: fix kdump kernel hangup issue with hot add CPUs

2021-04-16 Thread Sourabh Jain



On 16/04/21 3:03 pm, Hari Bathini wrote:



On 16/04/21 12:17 pm, Sourabh Jain wrote:

With the kexec_file_load system call, when the system crashes on a
hot-added CPU the capture kernel hangs and fails to collect the vmcore.

  Kernel panic - not syncing: sysrq triggered crash
  CPU: 24 PID: 6065 Comm: echo Kdump: loaded Not tainted 5.12.0-rc5upstream #54
  Call Trace:
  [c000e590fac0] [c07b2400] dump_stack+0xc4/0x114 (unreliable)
  [c000e590fb00] [c0145290] panic+0x16c/0x41c
  [c000e590fba0] [c08892e0] sysrq_handle_crash+0x30/0x40
  [c000e590fc00] [c0889cdc] __handle_sysrq+0xcc/0x1f0
  [c000e590fca0] [c088a538] write_sysrq_trigger+0xd8/0x178
  [c000e590fce0] [c05e9b7c] proc_reg_write+0x10c/0x1b0
  [c000e590fd10] [c04f26d0] vfs_write+0xf0/0x330
  [c000e590fd60] [c04f2aec] ksys_write+0x7c/0x140
  [c000e590fdb0] [c0031ee0] system_call_exception+0x150/0x290
  [c000e590fe10] [c000ca5c] system_call_common+0xec/0x278
  --- interrupt: c00 at 0x7fff905b9664
  NIP:  7fff905b9664 LR: 7fff905320c4 CTR: 
  REGS: c000e590fe80 TRAP: 0c00   Not tainted  (5.12.0-rc5upstream)
  MSR:  8280f033   CR: 28000242
    XER: 
  IRQMASK: 0
  GPR00: 0004 75fedf30 7fff906a7300 0001
  GPR04: 01002a7355b0 0002 0001 75fef616
  GPR08: 0001   
  GPR12:  7fff9073a160  
  GPR16:    
  GPR20:  7fff906a4ee0 0002 0001
  GPR24: 7fff906a0898  0002 01002a7355b0
  GPR28: 0002 7fff906a1790 01002a7355b0 0002
  NIP [7fff905b9664] 0x7fff905b9664
  LR [7fff905320c4] 0x7fff905320c4
  --- interrupt: c00




I will update the commit message.


  /**
   * setup_new_fdt_ppc64 - Update the flattened device-tree of the kernel
   *   being loaded.
@@ -1020,6 +1113,13 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt,

  }
  }
  +    /* Update cpus nodes information to account hotplug CPUs. */
+    if (image->type == KEXEC_TYPE_CRASH) {


Shouldn't this apply to regular kexec_file_load case as well? Yeah, 
there won't be a hang in regular kexec_file_load case but for 
correctness, that kernel should also not see stale CPU info in FDT?


Yes, it is better to update the fdt for both kexec and kdump.
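
For reference, a minimal sketch of the unconditional call (update_cpus_node()
being the new helper this patch introduces; the exact error handling shown
here is an assumption, not the final patch):

	/* Update cpus nodes information to account hotplug CPUs. */
	ret = update_cpus_node(fdt);   /* no KEXEC_TYPE_CRASH check */
	if (ret < 0)
		return ret;            /* assumed error path */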

Thanks for the review Hari.

- Sourabh Jain


[PATCH] powerpc/kdump: fix kdump kernel hangup issue with hot add CPUs

2021-04-16 Thread Sourabh Jain
With the kexec_file_load system call, when the system crashes on a
hot-added CPU the capture kernel hangs and fails to collect the vmcore.

 Kernel panic - not syncing: sysrq triggered crash
 CPU: 24 PID: 6065 Comm: echo Kdump: loaded Not tainted 5.12.0-rc5upstream #54
 Call Trace:
 [c000e590fac0] [c07b2400] dump_stack+0xc4/0x114 (unreliable)
 [c000e590fb00] [c0145290] panic+0x16c/0x41c
 [c000e590fba0] [c08892e0] sysrq_handle_crash+0x30/0x40
 [c000e590fc00] [c0889cdc] __handle_sysrq+0xcc/0x1f0
 [c000e590fca0] [c088a538] write_sysrq_trigger+0xd8/0x178
 [c000e590fce0] [c05e9b7c] proc_reg_write+0x10c/0x1b0
 [c000e590fd10] [c04f26d0] vfs_write+0xf0/0x330
 [c000e590fd60] [c04f2aec] ksys_write+0x7c/0x140
 [c000e590fdb0] [c0031ee0] system_call_exception+0x150/0x290
 [c000e590fe10] [c000ca5c] system_call_common+0xec/0x278
 --- interrupt: c00 at 0x7fff905b9664
 NIP:  7fff905b9664 LR: 7fff905320c4 CTR: 
 REGS: c000e590fe80 TRAP: 0c00   Not tainted  (5.12.0-rc5upstream)
 MSR:  8280f033   CR: 28000242
   XER: 
 IRQMASK: 0
 GPR00: 0004 75fedf30 7fff906a7300 0001
 GPR04: 01002a7355b0 0002 0001 75fef616
 GPR08: 0001   
 GPR12:  7fff9073a160  
 GPR16:    
 GPR20:  7fff906a4ee0 0002 0001
 GPR24: 7fff906a0898  0002 01002a7355b0
 GPR28: 0002 7fff906a1790 01002a7355b0 0002
 NIP [7fff905b9664] 0x7fff905b9664
 LR [7fff905320c4] 0x7fff905320c4
 --- interrupt: c00

When a system crashes on a CPU, the same CPU is used to boot the capture
kernel. On the capture kernel boot path there is a check which ensures
that the boot CPU is present in the fdt passed to it; if it is not, the
BUG() call leads to a system hang. We see the capture kernel hang when we
crash on a hot-added CPU because the capture kernel fdt does not have the
information about the newly added CPUs. Here is why.

When we prepare the fdt for the capture kernel we copy most of its
content, including the cpus node data, from the fdt passed to the primary
kernel, also referred to as initial_boot_params. When we hot add a CPU,
initial_boot_params does not get updated with the new CPU information.
Although the kdump service is re-run on a CPU hot-add event, the capture
kernel fdt is still prepared from initial_boot_params and therefore lacks
the cpus subnodes for the hot-added CPUs.

To ensure that the capture kernel fdt has the latest CPU information, we
rebuild the entire cpus node and its subnodes in the capture kernel fdt
whenever the kdump service is reloaded on a CPU hotplug event. The CPU
data is extracted from the of_root device node and written into the
capture kernel fdt while adding the additional nodes and properties needed
for the capture kernel.
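
To illustrate the approach, a rough sketch of what such an
update_cpus_node() helper could look like, built around the
add_node_prop() helper added by this patch; the exact libfdt calls and
error handling below are an approximation rather than the final patch:

	static int update_cpus_node(void *fdt)
	{
		struct device_node *cpus_node, *dn;
		int cpus_offset, cpus_subnode_offset, ret = 0;

		/* Drop the stale cpus node copied from initial_boot_params. */
		cpus_offset = fdt_path_offset(fdt, "/cpus");
		if (cpus_offset < 0 && cpus_offset != -FDT_ERR_NOTFOUND) {
			pr_err("Malformed device tree: error reading /cpus node: %s\n",
			       fdt_strerror(cpus_offset));
			return cpus_offset;
		}
		if (cpus_offset > 0) {
			ret = fdt_del_node(fdt, cpus_offset);
			if (ret < 0) {
				pr_err("Error deleting /cpus node: %s\n",
				       fdt_strerror(ret));
				return -EINVAL;
			}
		}

		/* Recreate /cpus from the live device tree (of_root). */
		cpus_offset = fdt_add_subnode(fdt, fdt_path_offset(fdt, "/"), "cpus");
		if (cpus_offset < 0) {
			pr_err("Error creating /cpus node: %s\n",
			       fdt_strerror(cpus_offset));
			return -EINVAL;
		}

		cpus_node = of_find_node_by_path("/cpus");
		ret = add_node_prop(fdt, cpus_offset, cpus_node);
		of_node_put(cpus_node);
		if (ret < 0)
			return ret;

		/* Copy every CPU subnode, including hot-added ones. */
		for_each_node_by_type(dn, "cpu") {
			cpus_subnode_offset = fdt_add_subnode(fdt, cpus_offset,
							      dn->full_name);
			if (cpus_subnode_offset < 0) {
				pr_err("Unable to add %s subnode: %s\n", dn->full_name,
				       fdt_strerror(cpus_subnode_offset));
				ret = cpus_subnode_offset;
				break;
			}
			ret = add_node_prop(fdt, cpus_subnode_offset, dn);
			if (ret < 0)
				break;
		}
		of_node_put(dn);
		return ret;
	}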

Fixes: 6ecd0163d360 ("powerpc/kexec_file: Add appropriate regions for memory reserve map")

Signed-off-by: Sourabh Jain 
---
 arch/powerpc/kexec/file_load_64.c | 100 ++
 1 file changed, 100 insertions(+)

diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index 02b9e4d0dc40..63a30f1ddc2c 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -960,6 +960,99 @@ unsigned int kexec_fdt_totalsize_ppc64(struct kimage 
*image)
return fdt_size;
 }
 
+/**
+ * add_node_prop - Read properties from a device node and add them
+ *                 to the fdt.
+ * @fdt:           Flattened device tree of the kernel
+ * @node_offset:   offset of the node to add a property at
+ * @np:            device node pointer
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+int add_node_prop(void *fdt, int node_offset, const struct device_node *np)
+{
+   int ret = 0;
+   struct property *pp;
+   unsigned long flags;
+
+   if (!np)
+   return -EINVAL;
+
+   raw_spin_lock_irqsave(&devtree_lock, flags);
+   for (pp = np->properties; pp; pp = pp->next) {
+   ret = fdt_setprop(fdt, node_offset, pp->name,
+ pp->value, pp->length);
+   if (ret < 0) {
+   pr_err("Unable to add %s property: %s\n",
+   pp->name, fdt_strerror(ret));
+   goto out;
+   }
+   }
+out:
+   raw_spin_unlock_irqrestore(&devtree_lock, flags);
+   return ret;
+}
+
+/**
+ * update_cpus_node - Update cpus node of flattened device-tree using of_root
+ * device node.
+ * 

[PATCH v6] powerpc/fadump: fix race between pstore write and fadump crash trigger

2020-07-12 Thread Sourabh Jain
When we enter the fadump crash path via system reset, we fail to update
the pstore.

On the system reset path we first update the pstore and then go for the
fadump crash. The problem is that when all the CPUs try to get the pstore
lock to initiate the pstore write, only one CPU acquires the lock and
proceeds with the pstore write. Since this is in NMI context, the CPUs
that fail to get the lock do not wait for their turn to write to the
pstore and simply proceed with the next operation, which is the fadump
crash. One of the CPUs that went down the fadump crash path triggers the
crash without waiting for the CPU holding the pstore lock to complete the
pstore update.

Timeline diagram depicting the sequence of events that leads to an
unsuccessful pstore update when we hit the fadump crash path via system
reset.

 12 3...  n   CPU Threads
 || | |
 || | |
 Reached to   -->|--->|>| --->|
 system reset|| | |
 path|| | |
 || | |
 Try to   -->|--->|>|>|
 acquire the || | |
 pstore lock || | |
 || | |
 || | |
 Got the  -->| +->| | |<-+
 pstore lock | |  | | |  |-->  Didn't get the
 | --+ lock and moving
 || | |ahead on fadump
 || | |crash path
 || | |
  Begins the  -->|| | |
  process to || | |<-- Got the chance to
  update the || | |trigger the crash
  pstore | -> | |... <-   |
 | |  | | |   |
 | |  | | |   |<-- Triggers the
 | |  | | |   |crash
 | |  | | |   |  ^
 | |  | | |   |  |
  Writing to  -->| |  | | |   |  |
  pstore | |  | | |   |  |
   |  |  |
   ^   |__|  |
   |   CPU Relax |
   | |
   +-+
  |
  v
Race: crash triggered before pstore
  update completes

To avoid this race condition a barrier is added on the crash_fadump path;
it prevents a CPU from triggering the crash until all the online CPUs have
completed their task.

The barrier makes sure all the secondary CPUs hit the crash_fadump
function before the crash is initiated. A timeout ensures that the primary
CPU (the one that initiates the crash) does not wait for the secondary
CPUs indefinitely.

Signed-off-by: Sourabh Jain 
---
 arch/powerpc/kernel/fadump.c | 24 
 1 file changed, 24 insertions(+)

---
Changelog:

v1 -> v3:
   - https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/208267.html

v3 -> v4:

   - Now the primary CPU (one who triggers dump) waits for all secondary
 CPUs to enter and then initiates the crash.

v4 -> v5:
- Fixed a build failure reported by kernel test robot 
  Now the cpus_in_crash variable is defined outside CONFIG_CMA
  config option.

v5 -> v6
- Changed a variable name cpus_in_crash -> cpus_in_fadump.
---

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 78ab9a6ee6ac..1858896d6809 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -32,11 +32,20 @@
 #include 
 #include 
 
+/*
+ * The CPU who acquired the lock to trigger the fadump crash should
+ * wait for other CPUs to enter.
+ *
+ * The timeout is in milliseconds.
+ */
+#define CRASH_TIMEOUT  500
+
 static struct fw_dump fw_dump;
 
 static void __init fadump_reserve_crash_area(u64 base);
 
 struct kobject *fadump_kobj;
+static atomic_t cpus_in_fadump;
 
 #ifndef CONFIG_PRESERVE_FA_DUMP
 static DEFINE_MUTEX(fadump_mutex);
@@ -668,8 +677,11 @@ early_param("fadump_reserve_mem", early_fadump_reserve_mem);
 
 void crash_fadump(struct pt_regs *regs, const char *str)
 {
+   unsigned int msecs;
struct fadump_crash_info_header *fdh = NULL;
int old_cpu, this_cpu;
+   /* Do not include first CPU */
+   unsigned int ncpus = num_online_cpus() - 1;
 
if (!should_fadump_crash())
return;
@@ -685,6 +697,8 @@ void crash_fadump(struct pt_regs *regs, const char *str)
old_cpu = cmpxchg(&crashing_cpu, -1, this_cpu);
 
if (old_cpu != -1) {
+   atomic_inc(&cpus_in_fadump);

[PATCH v5] powerpc/fadump: fix race between pstore write and fadump crash trigger

2020-06-17 Thread Sourabh Jain
When we enter the fadump crash path via system reset, we fail to update
the pstore.

On the system reset path we first update the pstore and then go for the
fadump crash. The problem is that when all the CPUs try to get the pstore
lock to initiate the pstore write, only one CPU acquires the lock and
proceeds with the pstore write. Since this is in NMI context, the CPUs
that fail to get the lock do not wait for their turn to write to the
pstore and simply proceed with the next operation, which is the fadump
crash. One of the CPUs that went down the fadump crash path triggers the
crash without waiting for the CPU holding the pstore lock to complete the
pstore update.

Timeline diagram depicting the sequence of events that leads to an
unsuccessful pstore update when we hit the fadump crash path via system
reset.

 12 3...  n   CPU Threads
 || | |
 || | |
 Reached to   -->|--->|>| --->|
 system reset|| | |
 path|| | |
 || | |
 Try to   -->|--->|>|>|
 acquire the || | |
 pstore lock || | |
 || | |
 || | |
 Got the  -->| +->| | |<-+
 pstore lock | |  | | |  |-->  Didn't get the
 | --+ lock and moving
 || | |ahead on fadump
 || | |crash path
 || | |
  Begins the  -->|| | |
  process to || | |<-- Got the chance to
  update the || | |trigger the crash
  pstore | -> | |... <-   |
 | |  | | |   |
 | |  | | |   |<-- Triggers the
 | |  | | |   |crash
 | |  | | |   |  ^
 | |  | | |   |  |
  Writing to  -->| |  | | |   |  |
  pstore | |  | | |   |  |
   |  |  |
   ^   |__|  |
   |   CPU Relax |
   | |
   +-+
  |
  v
Race: crash triggered before pstore
  update completes

To avoid this race condition a barrier is added on the crash_fadump path;
it prevents a CPU from triggering the crash until all the online CPUs have
completed their task.

The barrier makes sure all the secondary CPUs hit the crash_fadump
function before the crash is initiated. A timeout ensures that the primary
CPU (the one that initiates the crash) does not wait for the secondary
CPUs indefinitely.

Signed-off-by: Sourabh Jain 
---
 arch/powerpc/kernel/fadump.c | 25 +
 1 file changed, 25 insertions(+)

---
Changelog:

v1 -> v3:
   - https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/208267.html

v3 -> v4:

   - Now the primary CPU (one who triggers dump) waits for all secondary
 CPUs to enter and then initiates the crash.

v4 -> v5:
- Fixed a build failure reported by kernel test robot 
  Now the cpus_in_crash variable is defined outside CONFIG_CMA
  config option.
---

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index ff0114aeba9b..08dfa9d34096 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -32,10 +32,20 @@
 #include 
 #include 
 
+/*
+ * The CPU who acquired the lock to trigger the fadump crash should
+ * wait for other CPUs to enter.
+ *
+ * The timeout is in milliseconds.
+ */
+#define CRASH_TIMEOUT  500
+
 static struct fw_dump fw_dump;
 
 static void __init fadump_reserve_crash_area(u64 base);
 
+static atomic_t cpus_in_crash;
+
 #ifndef CONFIG_PRESERVE_FA_DUMP
 static DEFINE_MUTEX(fadump_mutex);
 struct fadump_mrange_info crash_mrange_info = { "crash", NULL, 0, 0, 0 };
@@ -594,8 +604,11 @@ early_param("fadump_reserve_mem", early_fadump_reserve_mem);
 
 void crash_fadump(struct pt_regs *regs, const char *str)
 {
+   unsigned int msecs;
struct fadump_crash_info_header *fdh = NULL;
int old_cpu, this_cpu;
+   /* Do not include first CPU */
+   unsigned int ncpus = num_online_cpus() - 1;
 
if (!should_fadump_crash())
return;
@@ -611,6 +624,8 @@ void crash_fadump(struct pt_regs *regs, const char *str)
old_cpu = cmpxchg(&crashing_cpu, -1, this_cpu);
 
if (old_cpu != -1) {
+   atomic_inc(&cpus_in_crash);
+
  

[PATCH] powerpc/fadump: update kernel logs before fadump crash begins

2020-06-05 Thread Sourabh Jain
When we hit the fadump crash via the panic path, the pstore update is
missing. This was observed after commit 8341f2f222d7 ("sysrq: Use panic()
to force a crash") changed sysrq-trigger to take the panic path instead
of the die path.

The PPC panic event handler addresses a system panic in two different
ways based on the system configuration. It first lets FADump (if
configured) handle the kernel panic, else it forwards the call to the
platform-specific panic function. The pstore update is missing only when
FADump handles the kernel panic; the platform-specific panic functions do
update the pstore by calling the panic_flush_kmsg_end function.

The simplest approach to handle this issue is to add a pstore update in
the PPC panic handler before FADump handles the panic. But this leads to
multiple pstore updates when FADump is not configured and the
platform-specific panic function serves the kernel panic.

Hence the function panic_flush_kmsg_end (used by the platform-specific
panic functions to update the kernel logs) is split into two functions:
one updates the pstore (called from the PPC panic event handler) and the
other flushes the kmsg to the console (called from the platform-specific
panic functions).

Signed-off-by: Sourabh Jain 
---
 arch/powerpc/include/asm/bug.h |  2 ++
 arch/powerpc/kernel/setup-common.c |  1 +
 arch/powerpc/kernel/traps.c | 12 +++-
 arch/powerpc/platforms/ps3/setup.c |  2 +-
 arch/powerpc/platforms/pseries/setup.c |  2 +-
 5 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index 338f36cd9934..9268551a69bc 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -118,6 +118,8 @@ extern void _exception_pkey(struct pt_regs *, unsigned long, int);
 extern void die(const char *, struct pt_regs *, long);
 extern bool die_will_crash(void);
 extern void panic_flush_kmsg_start(void);
+extern void panic_flush_kmsg_dump(void);
+extern void panic_flush_kmsg_console(void);
 extern void panic_flush_kmsg_end(void);
 #endif /* !__ASSEMBLY__ */
 
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 7f8c890360fe..2d546a9e8bb1 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -699,6 +699,7 @@ static int ppc_panic_event(struct notifier_block *this,
 * want interrupts to be hard disabled.
 */
hard_irq_disable();
+   panic_flush_kmsg_dump();
 
/*
 * If firmware-assisted dump has been registered then trigger
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 82a3438300fd..bb6bc19992b3 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -169,15 +169,25 @@ extern void panic_flush_kmsg_start(void)
bust_spinlocks(1);
 }
 
-extern void panic_flush_kmsg_end(void)
+extern void panic_flush_kmsg_dump(void)
 {
printk_safe_flush_on_panic();
kmsg_dump(KMSG_DUMP_PANIC);
+}
+
+extern void panic_flush_kmsg_console(void)
+{
bust_spinlocks(0);
debug_locks_off();
console_flush_on_panic(CONSOLE_FLUSH_PENDING);
 }
 
+extern void panic_flush_kmsg_end(void)
+{
+   panic_flush_kmsg_dump();
+   panic_flush_kmsg_console();
+}
+
 static unsigned long oops_begin(struct pt_regs *regs)
 {
int cpu;
diff --git a/arch/powerpc/platforms/ps3/setup.c b/arch/powerpc/platforms/ps3/setup.c
index b29368931c56..f96ba34284a1 100644
--- a/arch/powerpc/platforms/ps3/setup.c
+++ b/arch/powerpc/platforms/ps3/setup.c
@@ -101,7 +101,7 @@ static void ps3_panic(char *str)
printk("   System does not reboot automatically.\n");
printk("   Please press POWER button.\n");
printk("\n");
-   panic_flush_kmsg_end();
+   panic_flush_kmsg_console();
 
while(1)
lv1_pause(1);
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 0c8421dd01ab..66ecb88c4b8e 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -788,7 +788,7 @@ static void __init pSeries_setup_arch(void)
 
 static void pseries_panic(char *str)
 {
-   panic_flush_kmsg_end();
+   panic_flush_kmsg_console();
rtas_os_term(str);
 }
 
-- 
2.25.4



[PATCH v4] powerpc/fadump: fix race between pstore write and fadump crash trigger

2020-06-04 Thread Sourabh Jain
When we enter the fadump crash path via system reset, we fail to update
the pstore.

On the system reset path we first update the pstore and then go for the
fadump crash. The problem is that when all the CPUs try to get the pstore
lock to initiate the pstore write, only one CPU acquires the lock and
proceeds with the pstore write. Since this is in NMI context, the CPUs
that fail to get the lock do not wait for their turn to write to the
pstore and simply proceed with the next operation, which is the fadump
crash. One of the CPUs that went down the fadump crash path triggers the
crash without waiting for the CPU holding the pstore lock to complete the
pstore update.

Timeline diagram depicting the sequence of events that leads to an
unsuccessful pstore update when we hit the fadump crash path via system
reset.

 12 3...  n   CPU Threads
 || | |
 || | |
 Reached to   -->|--->|>| --->|
 system reset|| | |
 path|| | |
 || | |
 Try to   -->|--->|>|>|
 acquire the || | |
 pstore lock || | |
 || | |
 || | |
 Got the  -->| +->| | |<-+
 pstore lock | |  | | |  |-->  Didn't get the
 | --+ lock and moving
 || | |ahead on fadump
 || | |crash path
 || | |
  Begins the  -->|| | |
  process to || | |<-- Got the chance to
  update the || | |trigger the crash
  pstore | -> | |... <-   |
 | |  | | |   |
 | |  | | |   |<-- Triggers the
 | |  | | |   |crash
 | |  | | |   |  ^
 | |  | | |   |  |
  Writing to  -->| |  | | |   |  |
  pstore | |  | | |   |  |
   |  |  |
   ^   |__|  |
   |   CPU Relax |
   | |
   +-+
  |
  v
Race: crash triggered before pstore
  update completes

To avoid this race condition a barrier is added on the crash_fadump path;
it prevents a CPU from triggering the crash until all the online CPUs have
completed their task.

The barrier makes sure all the secondary CPUs hit the crash_fadump
function before the crash is initiated. A timeout ensures that the primary
CPU (the one that initiates the crash) does not wait for the secondary
CPUs indefinitely.

Signed-off-by: Sourabh Jain 
---
 arch/powerpc/kernel/fadump.c | 24 
 1 file changed, 24 insertions(+)

 ---
Changelog:

v1 -> v3:
   - https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/208267.html

v3 -> v4:

   - Now the primary CPU (one who triggers dump) waits for all secondary
 CPUs to enter and then initiates the crash.

 ---

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 59e60a9a9f5c..4953f3246220 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -32,6 +32,14 @@
 #include 
 #include 
 
+/*
+ * The CPU who acquired the lock to trigger the fadump crash should
+ * wait for other CPUs to enter.
+ *
+ * The timeout is in milliseconds.
+ */
+#define CRASH_TIMEOUT  500
+
 static struct fw_dump fw_dump;
 
 static void __init fadump_reserve_crash_area(u64 base);
@@ -46,6 +54,8 @@ struct fadump_mrange_info reserved_mrange_info = { "reserved", NULL, 0, 0, 0 };
 #ifdef CONFIG_CMA
 static struct cma *fadump_cma;
 
+static atomic_t cpus_in_crash;
+
 /*
  * fadump_cma_init() - Initialize CMA area from a fadump reserved memory
  *
@@ -596,8 +606,10 @@ early_param("fadump_reserve_mem", early_fadump_reserve_mem);
 
 void crash_fadump(struct pt_regs *regs, const char *str)
 {
+   unsigned int msecs;
struct fadump_crash_info_header *fdh = NULL;
int old_cpu, this_cpu;
unsigned int ncpus = num_online_cpus() - 1; /* Do not include first CPU */
 
if (!should_fadump_crash())
return;
@@ -613,6 +625,8 @@ void crash_fadump(struct pt_regs *regs, const char *str)
old_cpu = cmpxchg(&crashing_cpu, -1, this_cpu);
 
if (old_cpu != -1) {
+   atomic_inc(&cpus_in_crash);
+
/*
 * We can't loop here indefinitely. Wait as long as fadump