Re: [PATCH 1/2] kexec: fix KEXEC_FILE dependencies

2023-11-30 Thread Eric DeVolder


On 11/30/23 10:56, Andrew Morton wrote:

On Thu, 2 Nov 2023 16:03:18 +0800 Baoquan He  wrote:


CONFIG_KEXEC_FILE, but still get purgatory code built in which is
totally useless.

Not sure if I think too much over this.

I see your point here, and I would suggest changing the
CONFIG_ARCH_SUPPORTS_KEXEC_PURGATORY symbol to just indicate
the availability of the purgatory code for the arch, rather
than actually controlling the code itself. I already mentioned
this for s390, but riscv would need the same thing on top.

I think the change below should address your concern.

Since no new comment, do you mind spinning v2 to wrap all these up?

This patchset remains in mm-hotfixes-unstable from the previous -rc
cycle.  Eric, do you have any comments?  Arnd, do you plan on a v2?  If
not, should I merge v1?  If so, should I now add cc:stable?


My apologies, I lost this. I've looked at these changes, and I am in 
favor of these changes.


Furthermore, I ran the following through the Kconfig regression script, and 
did not find anything!


I believe the following patch represents the current discussion threads 
around Kconfig and KEXEC/CRASH.


Reviewed-by: Eric DeVolder 

Tested-by: Eric DeVolder 

Thanks!

eric

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 6f105ee4f3cf..1f11a62809f2 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -608,10 +608,10 @@ config ARCH_SUPPORTS_KEXEC
 def_bool PPC_BOOK3S || PPC_E500 || (44x && !SMP)

 config ARCH_SUPPORTS_KEXEC_FILE
-    def_bool PPC64 && CRYPTO=y && CRYPTO_SHA256=y
+    def_bool PPC64

 config ARCH_SUPPORTS_KEXEC_PURGATORY
-    def_bool KEXEC_FILE
+    def_bool y

 config ARCH_SELECTS_KEXEC_FILE
 def_bool y
diff --git a/arch/riscv/Kbuild b/arch/riscv/Kbuild
index d25ad1c19f88..ab181d187c23 100644
--- a/arch/riscv/Kbuild
+++ b/arch/riscv/Kbuild
@@ -5,7 +5,7 @@ obj-$(CONFIG_BUILTIN_DTB) += boot/dts/
 obj-y += errata/
 obj-$(CONFIG_KVM) += kvm/

-obj-$(CONFIG_ARCH_SUPPORTS_KEXEC_PURGATORY) += purgatory/
+obj-$(CONFIG_KEXEC_FILE) += purgatory/

 # for cleaning
 subdir- += boot
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 95a2a06acc6a..98857d76e458 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -702,9 +702,7 @@ config ARCH_SELECTS_KEXEC_FILE
 select KEXEC_ELF

 config ARCH_SUPPORTS_KEXEC_PURGATORY
-    def_bool KEXEC_FILE
-    depends on CRYPTO=y
-    depends on CRYPTO_SHA256=y
+    def_bool y

 config ARCH_SUPPORTS_CRASH_DUMP
 def_bool y
diff --git a/arch/riscv/kernel/elf_kexec.c b/arch/riscv/kernel/elf_kexec.c
index e60fbd8660c4..3ac341d296db 100644
--- a/arch/riscv/kernel/elf_kexec.c
+++ b/arch/riscv/kernel/elf_kexec.c
@@ -266,7 +266,7 @@ static void *elf_kexec_load(struct kimage *image, char *kernel_buf,

     cmdline = modified_cmdline;
 }

-#ifdef CONFIG_ARCH_SUPPORTS_KEXEC_PURGATORY
+#ifdef CONFIG_KEXEC_FILE
 /* Add purgatory to the image */
 kbuf.top_down = true;
 kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
@@ -280,7 +280,7 @@ static void *elf_kexec_load(struct kimage *image, char *kernel_buf,

                  sizeof(kernel_start), 0);
 if (ret)
     pr_err("Error update purgatory ret=%d\n", ret);
-#endif /* CONFIG_ARCH_SUPPORTS_KEXEC_PURGATORY */
+#endif /* CONFIG_KEXEC_FILE */

 /* Add the initrd to the image */
 if (initrd != NULL) {
diff --git a/arch/s390/Kbuild b/arch/s390/Kbuild
index a5d3503b353c..f2ce80b65551 100644
--- a/arch/s390/Kbuild
+++ b/arch/s390/Kbuild
@@ -7,7 +7,7 @@ obj-$(CONFIG_S390_HYPFS)    += hypfs/
 obj-$(CONFIG_APPLDATA_BASE)    += appldata/
 obj-y                += net/
 obj-$(CONFIG_PCI)        += pci/
-obj-$(CONFIG_ARCH_SUPPORTS_KEXEC_PURGATORY) += purgatory/
+obj-$(CONFIG_KEXEC_FILE) += purgatory/

 # for cleaning
 subdir- += boot tools
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 3bec98d20283..d5d8f99d1f25 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -254,13 +254,13 @@ config ARCH_SUPPORTS_KEXEC
 def_bool y

 config ARCH_SUPPORTS_KEXEC_FILE
-    def_bool CRYPTO && CRYPTO_SHA256 && CRYPTO_SHA256_S390
+    def_bool y

 config ARCH_SUPPORTS_KEXEC_SIG
 def_bool MODULE_SIG_FORMAT

 config ARCH_SUPPORTS_KEXEC_PURGATORY
-    def_bool KEXEC_FILE
+    def_bool y

 config ARCH_SUPPORTS_CRASH_DUMP
 def_bool y
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3762f41bb092..1566748f16c4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2072,7 +2072,7 @@ config ARCH_SUPPORTS_KEXEC
 def_bool y

 config ARCH_SUPPORTS_KEXEC_FILE
-    def_bool X86_64 && CRYPTO && CRYPTO_SHA256
+    def_bool X86_64

 config ARCH_SELECTS_KEXEC_FILE
 def_bool y
@@ -2080,7 +2080,7 @@ config ARCH_SELECTS_KEXEC_FILE
 select HAVE_IMA_KEXEC if IMA

 config ARCH_SUPPORTS_KEXEC_PURGATORY
-    def_bool KEXEC_FILE
+    def_bool y

 config ARCH_SUPPORTS_KEXEC_SIG
 def_bool y
diff --git a/kernel/Kconfig.kexec b/kernel/Kconfig.kexec
index 7aff28d

Re: [PATCH v2] kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP

2023-11-28 Thread Eric DeVolder



On 11/27/23 23:44, Baoquan He wrote:

Ignat Korchagin complained that a potential config regression was
introduced by commit 89cde455915f ("kexec: consolidate kexec and
crash options into kernel/Kconfig.kexec"). Before the commit,
CONFIG_CRASH_DUMP had no dependency on CONFIG_KEXEC. After the commit,
CRASH_DUMP selects KEXEC. That forces the system to have CONFIG_KEXEC=y
whenever CONFIG_CRASH_DUMP=y, which people may not want.

In Ignat's case, he sets CONFIG_CRASH_DUMP=y, CONFIG_KEXEC_FILE=y and
CONFIG_KEXEC=n because the kexec_load interface could be a security issue
if the kernel/initrd cannot be signed and verified.

CRASH_DUMP selects KEXEC because Eric, the author of the above commit,
received an LKP report of a build failure when posting an earlier version
of the patch. Please see the link below for details of the LKP report:

 
https://lore.kernel.org/all/3e8eecd1-a277-2cfb-690e-5de2eb7b9...@oracle.com/T/#u

In fact, that LKP report was triggered because arm's <asm/kexec.h> is
wrapped in a CONFIG_KEXEC ifdeffery scope. That is wrong. CONFIG_KEXEC
controls the enabling/disabling of the kexec_load interface, not the kexec
feature itself. Removing the wrongly added CONFIG_KEXEC ifdeffery scope in
arm's <asm/kexec.h> allows us to drop the select of KEXEC for CRASH_DUMP.
Meanwhile, change arch/arm/kernel/Makefile so that machine_kexec.o and
relocate_kernel.o depend on KEXEC_CORE.

Fixes: commit 89cde455915f ("kexec: consolidate kexec and crash options into kernel/Kconfig.kexec")
Reported-by: Ignat Korchagin 
Signed-off-by: Baoquan He 
---
  arch/arm/include/asm/kexec.h | 4 ----
  arch/arm/kernel/Makefile     | 2 +-
  kernel/Kconfig.kexec         | 1 -
  3 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/arm/include/asm/kexec.h b/arch/arm/include/asm/kexec.h
index e62832dcba76..a8287e7ab9d4 100644
--- a/arch/arm/include/asm/kexec.h
+++ b/arch/arm/include/asm/kexec.h
@@ -2,8 +2,6 @@
  #ifndef _ARM_KEXEC_H
  #define _ARM_KEXEC_H
  
-#ifdef CONFIG_KEXEC
-
  /* Maximum physical address we can use pages from */
  #define KEXEC_SOURCE_MEMORY_LIMIT (-1UL)
  /* Maximum address we can reach in physical address mode */
@@ -82,6 +80,4 @@ static inline struct page *boot_pfn_to_page(unsigned long boot_pfn)
  
  #endif /* __ASSEMBLY__ */
  
-#endif /* CONFIG_KEXEC */
-
  #endif /* _ARM_KEXEC_H */
diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
index d53f56d6f840..771264d4726a 100644
--- a/arch/arm/kernel/Makefile
+++ b/arch/arm/kernel/Makefile
@@ -59,7 +59,7 @@ obj-$(CONFIG_FUNCTION_TRACER) += entry-ftrace.o
  obj-$(CONFIG_DYNAMIC_FTRACE)  += ftrace.o insn.o patch.o
  obj-$(CONFIG_FUNCTION_GRAPH_TRACER)   += ftrace.o insn.o patch.o
  obj-$(CONFIG_JUMP_LABEL)  += jump_label.o insn.o patch.o
-obj-$(CONFIG_KEXEC)        += machine_kexec.o relocate_kernel.o
+obj-$(CONFIG_KEXEC_CORE)   += machine_kexec.o relocate_kernel.o
  # Main staffs in KPROBES are in arch/arm/probes/ .
  obj-$(CONFIG_KPROBES) += patch.o insn.o
  obj-$(CONFIG_OABI_COMPAT) += sys_oabi-compat.o
diff --git a/kernel/Kconfig.kexec b/kernel/Kconfig.kexec
index 7aff28ded2f4..1cc3b1c595d7 100644
--- a/kernel/Kconfig.kexec
+++ b/kernel/Kconfig.kexec
@@ -97,7 +97,6 @@ config CRASH_DUMP
depends on ARCH_SUPPORTS_KEXEC
select CRASH_CORE
select KEXEC_CORE
-   select KEXEC
help
  Generate crash dump after being started by kexec.
  This should be normally only set in special crash dump kernels


I have run this change against the kconfig regression script, and it did 
not find any differences!


Reviewed-by: Eric DeVolder 




___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2] drivers/base/cpu: crash data showing should depends on KEXEC_CORE

2023-11-28 Thread Eric DeVolder


On 11/27/23 23:52, Baoquan He wrote:

After commit 88a6f8994421 ("crash: memory and CPU hotplug sysfs
attributes"), on x86_64, if only the kernel configs below related to kdump
are set, compile errors are triggered.


CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_CRASH_DUMP=y
CONFIG_CRASH_HOTPLUG=y
--

--
drivers/base/cpu.c: In function ‘crash_hotplug_show’:
drivers/base/cpu.c:309:40: error: implicit declaration of function ‘crash_hotplug_cpu_support’; did you mean ‘crash_hotplug_show’? [-Werror=implicit-function-declaration]
  309 |         return sysfs_emit(buf, "%d\n", crash_hotplug_cpu_support());
      |                                        ^
      |                                        crash_hotplug_show
cc1: some warnings being treated as errors
--

CONFIG_KEXEC is used to enable the kexec_load interface, so making the
display of crash_notes/crash_notes_size/crash_hotplug depend on
CONFIG_KEXEC is incorrect. It should depend on KEXEC_CORE instead.

Fix it now.

Fixes: commit 88a6f8994421 ("crash: memory and CPU hotplug sysfs attributes")
Signed-off-by: Baoquan He 


Reviewed-by: Eric DeVolder 



---
  drivers/base/cpu.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 9ea22e165acd..548491de818e 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -144,7 +144,7 @@ static DEVICE_ATTR(release, S_IWUSR, NULL, cpu_release_store);
  #endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
  #endif /* CONFIG_HOTPLUG_CPU */
  
-#ifdef CONFIG_KEXEC
+#ifdef CONFIG_KEXEC_CORE
  #include <linux/kexec.h>
  
  static ssize_t crash_notes_show(struct device *dev,

@@ -189,14 +189,14 @@ static const struct attribute_group crash_note_cpu_attr_group = {
  #endif
  
  static const struct attribute_group *common_cpu_attr_groups[] = {
-#ifdef CONFIG_KEXEC
+#ifdef CONFIG_KEXEC_CORE
&crash_note_cpu_attr_group,
  #endif
NULL
  };
  
  static const struct attribute_group *hotplugable_cpu_attr_groups[] = {
-#ifdef CONFIG_KEXEC
+#ifdef CONFIG_KEXEC_CORE
&crash_note_cpu_attr_group,
  #endif
NULL
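
The pattern this patch touches can be sketched in plain C. This is only an
illustration, not kernel code: DEMO_CONFIG_KEXEC_CORE, struct demo_group and
the helper below are hypothetical stand-ins. The point is that the
definition of an object and every use of it must sit under the same
preprocessor symbol; if they are guarded by different symbols, some configs
see the use without the declaration and the build breaks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Kconfig symbol; remove to model "=n". */
#define DEMO_CONFIG_KEXEC_CORE 1

/* Hypothetical, simplified replacement for struct attribute_group. */
struct demo_group {
    const char *name;
};

#ifdef DEMO_CONFIG_KEXEC_CORE
static const struct demo_group demo_crash_note_group = { "crash_notes" };
#endif

/* NULL-terminated list; the optional entry uses the same guard as its
 * definition above, so the two can never get out of step. */
static const struct demo_group *demo_cpu_groups[] = {
#ifdef DEMO_CONFIG_KEXEC_CORE
    &demo_crash_note_group,
#endif
    NULL
};

/* Count the entries before the NULL terminator. */
static size_t demo_count_groups(const struct demo_group *const *groups)
{
    size_t n = 0;

    assert(groups != NULL);
    while (groups[n] != NULL)
        n++;
    return n;
}
```

With the symbol defined, demo_count_groups(demo_cpu_groups) sees one group;
with it removed, the list is empty but still compiles.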




Re: [PATCH 0/3] kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP

2023-11-25 Thread Eric DeVolder



On 11/23/23 01:36, Baoquan He wrote:

Ignat reported that a potential config regression was introduced by
commit 89cde455915f ("kexec: consolidate kexec and crash options
into kernel/Kconfig.kexec"). Please see the link below for more details:

https://lore.kernel.org/all/CALrw=nhprqqaqtp_jzfregrqemps8jbf8jqcv4ygqxyce-s...@mail.gmail.com/T/#u

Patch 1 fixes the regression by removing the incorrect CONFIG_KEXEC
ifdeffery scope added in arm's <asm/kexec.h>, then dropping the
select of KEXEC for CRASH_DUMP. This was tested and passed a cross
compile for arm.

Patch 2 fixes a build failure found when I tested patch 1 on x86_64; the
wrong CONFIG_KEXEC ifdeffery is replaced with CONFIG_KEXEC_CORE. Test
passed on x86_64.

Patch 3 removes an unnecessary 'select KEXEC' in the s390 arch. Removing
the select won't impact anything. Test passed on an ibm-z system.


I apologize for my delay in responding, I did not have a computer with 
me during my holiday travel.


I was able to re-run my Kconfig test script with this patch series (now 
that I'm running this on private resources, it takes half a day 8( ). 
The script only performs comparisons of the .config before (LHSB) and 
after (RHSB) the patch series; it does NOT do any building. At any rate, 
what that revealed was only differences in s390. That means that all 
other arches do not have any unintended side effects. The differences 
with patch3 applied look like:


FAIL: allnoconfig arch/s390/configs/kasan.config
LHSB {'CONFIG_CRASH_CORE': 'y', 'CONFIG_KEXEC_CORE': 'y', 
'CONFIG_KEXEC': 'y'}

RHSB {'CONFIG_KEXEC': 'n'}

The 'allnoconfig' and 'olddefconfig' targets failed for all s390 
defconfigs. The LHSB is the pre-patch values, and the RHSB is the 
post-patch values. So this states that CRASH_CORE and KEXEC_CORE were 
set previously, but now they are not. KEXEC obviously is being turned 
off intentionally.
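
For readers unfamiliar with this kind of check, the before/after comparison
can be sketched as follows. This is not the actual regression script; the
struct and function names are hypothetical, and an unset symbol is simply
treated as "n". The sketch counts how many named symbols differ between a
pre-patch (LHSB) and post-patch (RHSB) configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical representation of one .config entry. */
struct demo_symbol {
    const char *name;
    const char *value;
};

/* Look up a symbol; a symbol that is not listed counts as "n". */
static const char *demo_lookup(const struct demo_symbol *cfg, size_t nr,
                               const char *name)
{
    for (size_t i = 0; i < nr; i++)
        if (strcmp(cfg[i].name, name) == 0)
            return cfg[i].value;
    return "n";
}

/* Count the symbols in names[] whose value differs between the configs. */
static size_t demo_count_changes(const char *const *names, size_t nr_names,
                                 const struct demo_symbol *before,
                                 size_t nr_before,
                                 const struct demo_symbol *after,
                                 size_t nr_after)
{
    size_t changed = 0;

    assert(names != NULL);
    for (size_t i = 0; i < nr_names; i++) {
        const char *lhs = demo_lookup(before, nr_before, names[i]);
        const char *rhs = demo_lookup(after, nr_after, names[i]);

        if (strcmp(lhs, rhs) != 0)
            changed++;
    }
    return changed;
}
```

Fed with the s390 values shown above, all three symbols (CRASH_CORE,
KEXEC_CORE and KEXEC) are counted as changed.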


Hope this helps some.

Regards,

eric




Baoquan He (3):
   kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP
   drivers/base/cpu: crash data showing should depends on KEXEC_CORE
   s390/Kconfig: drop select of KEXEC

  arch/arm/include/asm/kexec.h | 4 ----
  arch/s390/Kconfig            | 1 -
  drivers/base/cpu.c           | 6 +++---
  kernel/Kconfig.kexec         | 1 -
  4 files changed, 3 insertions(+), 9 deletions(-)





Re: [PATCH v3 0/6] crashdump: Kernel handling of CPU and memory hot un/plug

2023-10-04 Thread Eric DeVolder




On 10/4/23 07:08, Simon Horman wrote:

On Wed, Sep 27, 2023 at 02:11:30PM -0400, Eric DeVolder wrote:

When the kdump service is loaded, if a CPU or memory is hot
un/plugged, the crash elfcorehdr, which describes the CPUs and memory
in the system, must also be updated, else the resulting vmcore is
inaccurate (eg. missing either CPU context or memory regions).

The current solution utilizes udev (eg. RHEL /usr/lib/udev/rules.d/
98-kexec.rules) to initiate an unload-then-reload of the *entire* kdump
image (eg. kernel, initrd, boot_params, purgatory and elfcorehdr) by
the userspace kexec utility. This occurs just so the elfcorehdr can
be updated with the latest list of CPUs and memory regions. In a
previous post I have outlined the significant performance problems
related to offloading this activity to userspace.

With the Linux kernel 6.6 commit below, the kernel now has the ability
to directly modify the elfcorehdr, eliminating the need to
unload-then-reload the entire kdump image when CPU or memory is hot
un/plugged or on/offlined.

  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d68b4b6f307d155475cce541f2aee938032ed22e

This kexec-tools patch series is for supporting hotplug with the
kexec_load() syscall; the kernel directly supports hotplug for the
kexec_file_load() syscall, requiring no userspace help.

There are two basic obstacles/requirements for the kexec-tools to
overcome in order to support kernel hotplug rewriting of the
elfcorehdr.

First, the buffer containing the elfcorehdr must be excluded from the
purgatory checksum/digest, which is computed at load time. Otherwise
kernel run-time changes to the elfcorehdr, as a result of hot un/plug,
would result in the checksum failing (specifically in purgatory at
panic kernel boot time), and kdump capture kernel failing to start.
To let the kernel know it is okay to modify the elfcorehdr, kexec
sets the KEXEC_UPDATE_ELFCOREHDR flag.

NOTE: The kernel specifically does *NOT* attempt to recompute the
checksum/digest as that would ultimately require patching the in-
memory purgatory image with the updated checksum. As that purgatory
image is already fully linked, it is a binary blob containing no ELF
information which would allow it to be re-linked or patched. Thus
excluding the elfcorehdr from the checksum/digests avoids all these
problems.

Second, the size of the elfcorehdr buffer must be large enough
to accommodate growth of the number of CPUs and/or memory regions.

To satisfy the first requirement, this patch series introduces the
--hotplug option to indicate to kexec-tools that kexec should exclude
the elfcorehdr buffer from the purgatory checksum/digest calculation
and set the KEXEC_UPDATE_ELFCOREHDR flag.

To satisfy the second requirement, the size is obtained from the
/sys/kernel/crash_elfcorehdr_size node (new with the kernel series
cited above).

To use this feature with kexec_load() syscall, invoke kexec with:

  kexec -c --hotplug ...

Thanks!
eric


Thanks Eric,

applied.


Excellent, thank you!
eric



[PATCH v3 6/6] crashdump/x86: set the elfcorehdr segment size for hotplug

2023-09-27 Thread Eric DeVolder
For hotplug, the elfcorehdr segment must be sized appropriately
to allow a growing number of CPUs or memory regions. Use the size
reported by the kernel via /sys/kernel/crash_elfcorehdr_size.

Signed-off-by: Eric DeVolder 
---
 kexec/arch/i386/crashdump-x86.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kexec/arch/i386/crashdump-x86.c b/kexec/arch/i386/crashdump-x86.c
index cb86ca7..a01031e 100644
--- a/kexec/arch/i386/crashdump-x86.c
+++ b/kexec/arch/i386/crashdump-x86.c
@@ -957,6 +957,14 @@ int load_crashdump_segments(struct kexec_info *info, char* mod_cmdline,
memsz = bufsz;
}
 
+   /* For hotplug support, override the minimum necessary size just
+* computed with the value from /sys/kernel/crash_elfcorehdr_size.
+* Properly align the size as well.
+*/
+   if (do_hotplug) {
+   memsz = _ALIGN(elfcorehdrsz, align);
+   }
+
/* Record the location of the elfcorehdr for hotplug handling */
info->elfcorehdr =
elfcorehdr = add_buffer(info, tmp, bufsz, memsz, align, min_base,
-- 
2.39.3
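
The sizing rule in this patch can be sketched on its own. The helpers below
are hypothetical (kexec-tools has its own _ALIGN macro; align_up here is an
assumed equivalent for power-of-two alignments): with hotplug enabled, the
segment's memory size is the kernel-reported size rounded up to the
alignment, otherwise the size computed for the current system is kept.

```c
#include <assert.h>

/* Hypothetical helper: round value up to a power-of-two alignment. */
static unsigned long align_up(unsigned long value, unsigned long align)
{
    /* The mask trick below only works for a non-zero power of two. */
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/* Choose the memory size of the elfcorehdr segment. */
static unsigned long elfcorehdr_memsz(unsigned long computed_memsz,
                                      unsigned long kernel_reported_sz,
                                      unsigned long align, int do_hotplug)
{
    /* With hotplug, leave room for CPUs and memory added later. */
    if (do_hotplug)
        return align_up(kernel_reported_sz, align);
    return computed_memsz;
}
```

The oversized segment is what gives the kernel room to rewrite the
elfcorehdr in place as CPUs or memory regions are added.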




[PATCH v3 4/6] crashdump: exclude elfcorehdr segment from digest for hotplug

2023-09-27 Thread Eric DeVolder
To allow direct modification of the elfcorehdr by the kernel, in
response to CPU and memory hot un/plug and/or online/offline events,
the buffer containing the elfcorehdr must be excluded from the
purgatory checksum/digest.

If the elfcorehdr is not excluded from the purgatory checksum/digest,
then at panic time, the checksum/digest check fails (due to the
elfcorehdr having been modified), and the kdump capture kernel does
not start.

Signed-off-by: Eric DeVolder 
---
 kexec/kexec.c | 8 ++++++++
 kexec/kexec.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/kexec/kexec.c b/kexec/kexec.c
index 0207608..fdb4c98 100644
--- a/kexec/kexec.c
+++ b/kexec/kexec.c
@@ -689,6 +689,14 @@ static void update_purgatory(struct kexec_info *info)
if (info->segment[i].mem == (void *)info->rhdr.rel_addr) {
continue;
}
+
+   /* Don't include elfcorehdr in the checksum, if hotplug
+* support enabled.
+*/
+   if (do_hotplug && (info->segment[i].mem == (void *)info->elfcorehdr)) {
+   continue;
+   }
+
sha256_update(&ctx, info->segment[i].buf,
  info->segment[i].bufsz);
nullsz = info->segment[i].memsz - info->segment[i].bufsz;
diff --git a/kexec/kexec.h b/kexec/kexec.h
index 487f707..1004aff 100644
--- a/kexec/kexec.h
+++ b/kexec/kexec.h
@@ -170,6 +170,7 @@ struct kexec_info {
int command_line_len;
 
int skip_checks;
+   unsigned long elfcorehdr;
 };
 
 struct arch_map_entry {
-- 
2.39.3
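
The effect of skipping the elfcorehdr segment can be shown with a small
stand-alone sketch. It is not the kexec-tools code: struct demo_segment is a
hypothetical simplification, and a 64-bit FNV-1a hash is swapped in for
SHA-256 purely to keep the example short. What it demonstrates is that once
the segment is excluded, later changes to its contents no longer alter the
digest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified segment descriptor. */
struct demo_segment {
    const unsigned char *buf;  /* data to be loaded */
    size_t bufsz;              /* number of bytes in buf */
    unsigned long mem;         /* destination address */
};

/* FNV-1a (64-bit), used here only as a stand-in for SHA-256. */
static uint64_t fnv1a_update(uint64_t h, const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Digest all segments, optionally skipping the one at elfcorehdr. */
static uint64_t demo_digest(const struct demo_segment *seg, size_t nr,
                            int do_hotplug, unsigned long elfcorehdr)
{
    uint64_t h = 0xcbf29ce484222325ULL;  /* FNV-1a offset basis */

    assert(seg != NULL || nr == 0);
    for (size_t i = 0; i < nr; i++) {
        /* Leave the elfcorehdr out when hotplug support is enabled. */
        if (do_hotplug && seg[i].mem == elfcorehdr)
            continue;
        h = fnv1a_update(h, seg[i].buf, seg[i].bufsz);
    }
    return h;
}
```

Without the exclusion the same modification changes the digest, which is
the check failure described in the commit message.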




[PATCH v3 3/6] crashdump: setup general hotplug support

2023-09-27 Thread Eric DeVolder
To allow direct modification of the elfcorehdr by the kernel, in
response to CPU and memory hot un/plug and/or online/offline events,
the following conditions must occur:

 - the elfcorehdr buffer must be excluded from the purgatory
   checksum/digest, and
 - the elfcorehdr segment must be large enough, and
 - the kernel must be notified that it can modify the elfcorehdr

Excluding the elfcorehdr buffer from the digest occurs in patch
"crashdump: exclude elfcorehdr segment from digest for hotplug".
If this is not done, a change to the elfcorehdr will cause the
purgatory check at panic time to fail, and kdump capture kernel
does not start.

For hotplug, the size of the elfcorehdr segment is obtained from the
kernel via the /sys/kernel/crash_elfcorehdr_size node.

The KEXEC_UPDATE_ELFCOREHDR flag indicates to the kernel that it can
make direct modifications to the elfcorehdr.

Signed-off-by: Eric DeVolder 
---
 kexec/kexec.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/kexec/kexec.c b/kexec/kexec.c
index d790748..0207608 100644
--- a/kexec/kexec.c
+++ b/kexec/kexec.c
@@ -1631,6 +1631,24 @@ int main(int argc, char *argv[])
die("--load-live-update can only be used with xen\n");
}
 
+   /* NOTE: Xen KEXEC_LIVE_UPDATE and KEXEC_UPDATE_ELFCOREHDR collide */
+   if (do_hotplug) {
+   const char *ces = "/sys/kernel/crash_elfcorehdr_size";
+   char *buf, *endptr = NULL;
+   off_t nread = 0;
+   buf = slurp_file_len(ces, sizeof(buf)-1, &nread);
+   if (buf) {
+   if (buf[nread-1] == '\n')
+   buf[nread-1] = '\0';
+   elfcorehdrsz = strtoul(buf, &endptr, 0);
+   }
+   if (!elfcorehdrsz || (endptr && *endptr != '\0'))
+   die("Path %s does not exist, the kernel needs CONFIG_CRASH_HOTPLUG\n", ces);
+   dbgprintf("ELFCOREHDR_SIZE %lu\n", elfcorehdrsz);
+   /* Indicate to the kernel it is ok to modify the elfcorehdr */
+   kexec_flags |= KEXEC_UPDATE_ELFCOREHDR;
+   }
+
fileind = optind;
/* Reset getopt for the next pass; called in other source modules */
opterr = 1;
-- 
2.39.3
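
The parsing step in this patch, turning the text of the
/sys/kernel/crash_elfcorehdr_size node into a number, can be sketched in
isolation. parse_size_text is a hypothetical helper, not a kexec-tools
function; it works on a string that has already been read, strips one
trailing newline, and rejects zero, empty or partially numeric input.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a size from sysfs-style text such as "33792\n".
 * Returns 1 and stores the value in *out on success, 0 otherwise.
 */
static int parse_size_text(const char *text, unsigned long *out)
{
    char copy[32];
    char *endptr = NULL;
    size_t len;
    unsigned long val;

    assert(text != NULL && out != NULL);
    len = strlen(text);
    if (len == 0 || len >= sizeof(copy))
        return 0;
    memcpy(copy, text, len + 1);

    /* sysfs values normally end with a newline; drop it. */
    if (copy[len - 1] == '\n')
        copy[len - 1] = '\0';

    /* Base 0 accepts decimal, octal and 0x-prefixed hex. */
    val = strtoul(copy, &endptr, 0);
    if (val == 0 || endptr == copy || *endptr != '\0')
        return 0;

    *out = val;
    return 1;
}
```

Treating a result of zero as an error mirrors the patch, where a missing or
unreadable node leaves the size at zero and kexec gives up.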




[PATCH v3 5/6] crashdump/x86: identify elfcorehdr segment for hotplug

2023-09-27 Thread Eric DeVolder
Identify the segment containing the elfcorehdr buffer so that
it can be excluded from the purgatory checksum/digest, if hotplug
support is in effect.

Signed-off-by: Eric DeVolder 
---
 kexec/arch/i386/crashdump-x86.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kexec/arch/i386/crashdump-x86.c b/kexec/arch/i386/crashdump-x86.c
index df1f24c..cb86ca7 100644
--- a/kexec/arch/i386/crashdump-x86.c
+++ b/kexec/arch/i386/crashdump-x86.c
@@ -956,6 +956,9 @@ int load_crashdump_segments(struct kexec_info *info, char* mod_cmdline,
} else {
memsz = bufsz;
}
+
+   /* Record the location of the elfcorehdr for hotplug handling */
+   info->elfcorehdr =
elfcorehdr = add_buffer(info, tmp, bufsz, memsz, align, min_base,
max_addr, -1);
dbgprintf("Created elf header segment at 0x%lx\n", elfcorehdr);
-- 
2.39.3




[PATCH v3 0/6] crashdump: Kernel handling of CPU and memory hot un/plug

2023-09-27 Thread Eric DeVolder
When the kdump service is loaded, if a CPU or memory is hot
un/plugged, the crash elfcorehdr, which describes the CPUs and memory
in the system, must also be updated, else the resulting vmcore is
inaccurate (eg. missing either CPU context or memory regions).

The current solution utilizes udev (eg. RHEL /usr/lib/udev/rules.d/
98-kexec.rules) to initiate an unload-then-reload of the *entire* kdump
image (eg. kernel, initrd, boot_params, purgatory and elfcorehdr) by
the userspace kexec utility. This occurs just so the elfcorehdr can
be updated with the latest list of CPUs and memory regions. In a
previous post I have outlined the significant performance problems
related to offloading this activity to userspace.

With the Linux kernel 6.6 commit below, the kernel now has the ability
to directly modify the elfcorehdr, eliminating the need to
unload-then-reload the entire kdump image when CPU or memory is hot
un/plugged or on/offlined.

 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d68b4b6f307d155475cce541f2aee938032ed22e

This kexec-tools patch series is for supporting hotplug with the
kexec_load() syscall; the kernel directly supports hotplug for the
kexec_file_load() syscall, requiring no userspace help.

There are two basic obstacles/requirements for the kexec-tools to
overcome in order to support kernel hotplug rewriting of the
elfcorehdr.

First, the buffer containing the elfcorehdr must be excluded from the
purgatory checksum/digest, which is computed at load time. Otherwise
kernel run-time changes to the elfcorehdr, as a result of hot un/plug,
would result in the checksum failing (specifically in purgatory at
panic kernel boot time), and kdump capture kernel failing to start.
To let the kernel know it is okay to modify the elfcorehdr, kexec
sets the KEXEC_UPDATE_ELFCOREHDR flag.

NOTE: The kernel specifically does *NOT* attempt to recompute the
checksum/digest as that would ultimately require patching the in-
memory purgatory image with the updated checksum. As that purgatory
image is already fully linked, it is a binary blob containing no ELF
information which would allow it to be re-linked or patched. Thus
excluding the elfcorehdr from the checksum/digests avoids all these
problems.

Second, the size of the elfcorehdr buffer must be large enough
to accommodate growth of the number of CPUs and/or memory regions.

To satisfy the first requirement, this patch series introduces the
--hotplug option to indicate to kexec-tools that kexec should exclude
the elfcorehdr buffer from the purgatory checksum/digest calculation
and set the KEXEC_UPDATE_ELFCOREHDR flag.

To satisfy the second requirement, the size is obtained from the
/sys/kernel/crash_elfcorehdr_size node (new with the kernel series
cited above).

To use this feature with kexec_load() syscall, invoke kexec with:

 kexec -c --hotplug ...

Thanks!
eric

---
v3: 27sep2023
 - Cite the merged Linux 6.6 commit that supports crash hotplug.
 - Removed the --elfcorehdrsz option, instead using the
   /sys/kernel/crash_elfcorehdr_size node from the new kernel
   crash hotplug feature.

v2: 3may2023
 http://lists.infradead.org/pipermail/kexec/2023-May/027049.html
 - Setting KEXEC_UPDATE_ELFCOREHDR flag
 - Utilizing /sys/kernel/crash_elfcorehdr_size info.

v1: 20oct2022
 http://lists.infradead.org/pipermail/kexec/2022-October/026032.html
 - Initial patch series

RFC:
 https://lore.kernel.org/lkml/b04ed259-dc5f-7f30-6661-c26f92d90...@oracle.com/
 s/vmcoreinfo/elfcorehdr/g
---


Eric DeVolder (6):
  kexec: define KEXEC_UPDATE_ELFCOREHDR
  crashdump: introduce the hotplug command line options
  crashdump: setup general hotplug support
  crashdump: exclude elfcorehdr segment from digest for hotplug
  crashdump/x86: identify elfcorehdr segment for hotplug
  crashdump/x86: set the elfcorehdr segment size for hotplug

 kexec/arch/i386/crashdump-x86.c | 11 +++++++++++
 kexec/kexec-syscall.h           |  1 +
 kexec/kexec.8                   |  6 ++++++
 kexec/kexec.c                   | 32 ++++++++++++++++++++++++++++++++
 kexec/kexec.h                   |  8 +++++++-
 5 files changed, 57 insertions(+), 1 deletion(-)

-- 
2.39.3




[PATCH v3 2/6] crashdump: introduce the hotplug command line options

2023-09-27 Thread Eric DeVolder
Introducing the --hotplug command line option, which is used to
indicate to the kernel that the kdump image is setup to permit
the kernel to directly modify the elfcorehdr in response to CPU
and memory hotplug and/or online/offline events.

This option is only meaningful for kexec_load() syscall. For the
kexec_file_load() syscall, this option is a no-op as the kernel
handles all aspects of loading the kdump image.

This is the command line processing and documentation.

Signed-off-by: Eric DeVolder 
---
 kexec/kexec.8 | 6 ++++++
 kexec/kexec.c | 6 ++++++
 kexec/kexec.h | 7 ++++++-
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/kexec/kexec.8 b/kexec/kexec.8
index 3a344c5..4400baf 100644
--- a/kexec/kexec.8
+++ b/kexec/kexec.8
@@ -132,6 +132,12 @@ in one call.
 Open a help file for
 .BR kexec .
 .TP
+.B \-\-hotplug
+Setup for kernel modification of the elfcorehdr. This option performs
+the steps needed to support kernel updates to the elfcorehdr in the
+presence of hot un/plug and/or on/offline events. This option only
+useful for KEXEC_LOAD syscall.
+.TP
 .B \-i\ (\-\-no-checks)
 Fast reboot, no memory integrity checks.
 .TP
diff --git a/kexec/kexec.c b/kexec/kexec.c
index 1edbd34..d790748 100644
--- a/kexec/kexec.c
+++ b/kexec/kexec.c
@@ -58,6 +58,8 @@
 
 unsigned long long mem_min = 0;
 unsigned long long mem_max = ULONG_MAX;
+unsigned long elfcorehdrsz = 0;
+int do_hotplug = 0;
 static unsigned long kexec_flags = 0;
 /* Flags for kexec file (fd) based syscall */
 static unsigned long kexec_file_flags = 0;
@@ -1069,6 +1071,7 @@ void usage(void)
   "  back to the compatibility syscall when file based\n"
   "  syscall is not supported or the kernel did not\n"
   "  understand the image (default)\n"
+  " --hotplug    Setup for kernel modification of elfcorehdr.\n"
   " -d, --debug  Enable debugging to help spot a failure.\n"
   " -S, --status Return 1 if the type (by default crash) is loaded,\n"
   "  0 if not.\n"
@@ -1579,6 +1582,9 @@ int main(int argc, char *argv[])
case OPT_PRINT_CKR_SIZE:
print_crashkernel_region_size();
return 0;
+   case OPT_HOTPLUG:
+   do_hotplug = 1;
+   break;
default:
break;
}
diff --git a/kexec/kexec.h b/kexec/kexec.h
index 0933389..487f707 100644
--- a/kexec/kexec.h
+++ b/kexec/kexec.h
@@ -232,7 +232,8 @@ extern int file_types;
 #define OPT_PRINT_CKR_SIZE 262
 #define OPT_LOAD_LIVE_UPDATE   263
 #define OPT_EXEC_LIVE_UPDATE   264
-#define OPT_MAX                265
+#define OPT_HOTPLUG            265
+#define OPT_MAX                266
 #define KEXEC_OPTIONS \
{ "help",   0, 0, OPT_HELP }, \
{ "version",0, 0, OPT_VERSION }, \
@@ -259,6 +260,7 @@ extern int file_types;
{ "debug",  0, 0, OPT_DEBUG }, \
{ "status", 0, 0, OPT_STATUS }, \
{ "print-ckr-size", 0, 0, OPT_PRINT_CKR_SIZE }, \
+   { "hotplug",0, 0, OPT_HOTPLUG }, \
 
 #define KEXEC_OPT_STR "h?vdfixyluet:pscaS"
 
@@ -297,6 +299,9 @@ extern int ifdown(void);
 extern char purgatory[];
 extern size_t purgatory_size;
 
+extern unsigned long elfcorehdrsz;
+extern int do_hotplug;
+
 #define BOOTLOADER "kexec"
 #define BOOTLOADER_VERSION PACKAGE_VERSION
 
-- 
2.39.3
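
How a long-only option such as --hotplug is typically wired up with
getopt_long() can be sketched as below. This is a minimal stand-alone
illustration, not the kexec-tools option table: DEMO_OPT_HOTPLUG and
parse_hotplug are hypothetical names, and -c is accepted but ignored.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Hypothetical option code, chosen above the single-character range. */
#define DEMO_OPT_HOTPLUG 256

/* Returns 1 if --hotplug was given, 0 if not, -1 on an unknown option. */
static int parse_hotplug(int argc, char *argv[])
{
    static const struct option options[] = {
        { "hotplug", no_argument, NULL, DEMO_OPT_HOTPLUG },
        { NULL, 0, NULL, 0 },
    };
    int do_hotplug = 0;
    int opt;

    assert(argc >= 1 && argv != NULL);
    optind = 1;  /* start scanning from the first argument */
    opterr = 0;  /* keep the sketch quiet on unknown options */

    while ((opt = getopt_long(argc, argv, "c", options, NULL)) != -1) {
        switch (opt) {
        case DEMO_OPT_HOTPLUG:
            do_hotplug = 1;
            break;
        case 'c':
            break;  /* accepted, but not acted on in this sketch */
        default:
            return -1;
        }
    }
    return do_hotplug;
}
```

Giving the option a code above 255 keeps it from clashing with any
single-character option, which is the same scheme the OPT_* defines in
kexec.h follow.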




[PATCH v3 1/6] kexec: define KEXEC_UPDATE_ELFCOREHDR

2023-09-27 Thread Eric DeVolder
The Linux kernel defines this flag to indicate that the kexec_load()'ed
image is setup so that the kernel may directly modify the elfcorehdr
(and not cause the purgatory digest checksum to fail) in response to
CPU or memory hot un/plug and/or on/offline events.

Define this flag to match/mirror the kernel flag.

Signed-off-by: Eric DeVolder 
---
 kexec/kexec-syscall.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kexec/kexec-syscall.h b/kexec/kexec-syscall.h
index 1e2d12f..2559bff 100644
--- a/kexec/kexec-syscall.h
+++ b/kexec/kexec-syscall.h
@@ -112,6 +112,7 @@ static inline long kexec_file_load(int kernel_fd, int initrd_fd,
 
 #define KEXEC_ON_CRASH             0x00000001
 #define KEXEC_PRESERVE_CONTEXT     0x00000002
+#define KEXEC_UPDATE_ELFCOREHDR    0x00000004
 #define KEXEC_ARCH_MASK            0xffff0000
 
 /* Flags for kexec file based system call */
-- 
2.39.3
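
How such a flag is combined with the others can be sketched as follows. The
DEMO_ constants mirror the values in the hunk above as I understand them and
should be treated as assumptions; build_flags is a hypothetical helper. The
low bits carry options, while the upper 16 bits are reserved for the
architecture field, so a new option bit must not overlap that mask.

```c
#include <assert.h>

/* Assumed values, mirroring the defines shown in the patch. */
#define DEMO_KEXEC_ON_CRASH           0x00000001UL
#define DEMO_KEXEC_PRESERVE_CONTEXT   0x00000002UL
#define DEMO_KEXEC_UPDATE_ELFCOREHDR  0x00000004UL
#define DEMO_KEXEC_ARCH_MASK          0xffff0000UL

/* Build the option part of the kexec_load() flags word. */
static unsigned long build_flags(int on_crash, int do_hotplug)
{
    unsigned long flags = 0;

    if (on_crash)
        flags |= DEMO_KEXEC_ON_CRASH;
    if (do_hotplug)
        flags |= DEMO_KEXEC_UPDATE_ELFCOREHDR;

    /* Option bits must stay clear of the architecture field. */
    assert((flags & DEMO_KEXEC_ARCH_MASK) == 0);
    return flags;
}
```

A crash load with hotplug support would then carry both the on-crash bit
and the new update bit.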




Re: [PATCH v3] Crash: add lock to serialize crash hotplug handling

2023-09-26 Thread Eric DeVolder




On 9/26/23 15:50, Andrew Morton wrote:

On Tue, 26 Sep 2023 20:09:05 +0800 Baoquan He  wrote:


Eric reported that handling of the corresponding crash hotplug event can
easily fail when many memory hotplug events are notified in a short
period. They fail because they cannot take __kexec_lock.


I'm assuming that this failure is sufficiently likely so as to justify a
-stable backport of the fix.  Please let me know if this is incorrect.


Andrew,
Correct, this is sufficiently likely to happen.
Thanks,
eric



Re: [PATCH v3] Crash: add lock to serialize crash hotplug handling

2023-09-26 Thread Eric DeVolder




On 9/26/23 07:09, Baoquan He wrote:

Eric reported that handling of the corresponding crash hotplug event can
easily fail when many memory hotplug events are notified in a short
period. They fail because they cannot take __kexec_lock.

===
[   78.714569] Fallback order for Node 0: 0
[   78.714575] Built 1 zonelists, mobility grouping on.  Total pages: 1817886
[   78.717133] Policy zone: Normal
[   78.724423] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
[   78.727207] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
[   80.056643] PEFILE: Unsigned PE binary
===

The memory hotplug events are notified very quickly and very many,
while the handling of crash hotplug is much slower relatively. So the
atomic variable __kexec_lock and kexec_trylock() can't guarantee the
serialization of crash hotplug handling.

Here, add a new mutex lock __crash_hotplug_lock to serialize crash
hotplug handling specifically. This doesn't impact the usage of
__kexec_lock.

Signed-off-by: Baoquan He 


I've run this patch in my regression environment and I do not see any
lock failures! And I've done this with a variety of DIMM sizes up to 8GiB in
order to vary the "size of the swarm". Both with kexec_load and kexec_file_load.

Tested-by: Eric DeVolder 
Reviewed-by: Eric DeVolder 

---
v2->v3:
  - crash_check_update_elfcorehdr() need take __crash_hotplug_lock
too because there's tiny racing window when kexec_load interface
is taken. Eric pointed out this.
v1->v2:
  - Move mutex lock definition into CONFIG_CRASH_HOTPLUG ifdeffery
scope in kernel/crash_core.c because the lock is only needed and
used in that scope. Suggested by Eric.
  kernel/crash_core.c | 17 +
  1 file changed, 17 insertions(+)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 03a7932cde0a..2f675ef045d4 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -739,6 +739,17 @@ subsys_initcall(crash_notes_memory_init);
 #undef pr_fmt
 #define pr_fmt(fmt) "crash hp: " fmt
 
+/*
+ * Different than kexec/kdump loading/unloading/jumping/shrinking which
+ * usually rarely happen, there will be many crash hotplug events notified
+ * during one short period, e.g one memory board is hot added and memory
+ * regions are online. So mutex lock  __crash_hotplug_lock is used to
+ * serialize the crash hotplug handling specifically.
+ */
+DEFINE_MUTEX(__crash_hotplug_lock);
+#define crash_hotplug_lock() mutex_lock(&__crash_hotplug_lock)
+#define crash_hotplug_unlock() mutex_unlock(&__crash_hotplug_lock)
+
 /*
  * This routine utilized when the crash_hotplug sysfs node is read.
  * It reflects the kernel's ability/permission to update the crash
@@ -748,9 +759,11 @@ int crash_check_update_elfcorehdr(void)
 {
 	int rc = 0;
 
+	crash_hotplug_lock();
 	/* Obtain lock while reading crash information */
 	if (!kexec_trylock()) {
 		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
+		crash_hotplug_unlock();
 		return 0;
 	}
 	if (kexec_crash_image) {
@@ -761,6 +774,7 @@ int crash_check_update_elfcorehdr(void)
 	}
 	/* Release lock now that update complete */
 	kexec_unlock();
+	crash_hotplug_unlock();
 
 	return rc;
 }
@@ -783,9 +797,11 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
 {
 	struct kimage *image;
 
+	crash_hotplug_lock();
 	/* Obtain lock while changing crash information */
 	if (!kexec_trylock()) {
 		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
+		crash_hotplug_unlock();
 		return;
 	}
 
@@ -852,6 +868,7 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
 out:
 	/* Release lock now that update complete */
 	kexec_unlock();
+	crash_hotplug_unlock();
 }
 
 static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v)




Re: [PATCH v2] Crash: add lock to serialize crash hotplug handling

2023-09-25 Thread Eric DeVolder




On 9/24/23 22:07, Baoquan He wrote:

Eric reported that handling of crash hotplug events can easily fail when
many memory hotplug events are notified in a short period. The handling
fails because __kexec_lock cannot be taken.

===
[   78.714569] Fallback order for Node 0: 0
[   78.714575] Built 1 zonelists, mobility grouping on.  Total pages: 1817886
[   78.717133] Policy zone: Normal
[   78.724423] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
[   78.727207] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
[   80.056643] PEFILE: Unsigned PE binary
===

The memory hotplug events are notified very quickly and very many,
while the handling of crash hotplug is much slower relatively. So the
atomic variable __kexec_lock and kexec_trylock() can't guarantee the
serialization of crash hotplug handling.

Here, add a new mutex lock __crash_hotplug_lock to serialize crash
hotplug handling specifically. This doesn't impact the usage of
__kexec_lock.

Signed-off-by: Baoquan He 
---
v1->v2:
  - Move mutex lock definition into CONFIG_CRASH_HOTPLUG ifdeffery
scope in kernel/crash_core.c because the lock is only needed and
used in that scope. Suggested by Eric.

  kernel/crash_core.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 03a7932cde0a..5951d6366b72 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -739,6 +739,17 @@ subsys_initcall(crash_notes_memory_init);
 #undef pr_fmt
 #define pr_fmt(fmt) "crash hp: " fmt
 
+/*
+ * Different than kexec/kdump loading/unloading/jumping/shrinking which
+ * usually rarely happen, there will be many crash hotplug events notified
+ * during one short period, e.g one memory board is hot added and memory
+ * regions are online. So mutex lock  __crash_hotplug_lock is used to
+ * serialize the crash hotplug handling specifically.
+ */
+DEFINE_MUTEX(__crash_hotplug_lock);
+#define crash_hotplug_lock() mutex_lock(&__crash_hotplug_lock)
+#define crash_hotplug_unlock() mutex_unlock(&__crash_hotplug_lock)
+
 /*
  * This routine utilized when the crash_hotplug sysfs node is read.
  * It reflects the kernel's ability/permission to update the crash
@@ -783,9 +794,11 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
 {
 	struct kimage *image;
 
+	crash_hotplug_lock();
 	/* Obtain lock while changing crash information */
 	if (!kexec_trylock()) {
 		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
+		crash_hotplug_unlock();
 		return;
 	}
 
@@ -852,6 +865,7 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
 out:
 	/* Release lock now that update complete */
 	kexec_unlock();
+	crash_hotplug_unlock();
 }
 
 static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v)


The crash_check_update_elfcorehdr() also has kexec_trylock() and needs similar
treatment. Userspace (i.e. udev rule processing) and kernel (crash hotplug
infrastructure) need to be protected/serialized from one another.
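
As a rough userspace illustration of that point, the sketch below has both
entry points take an outer mutex before trying the inner try-only lock, and
drop the mutex again on the failure path. It is only a model under stated
assumptions: a pthread mutex and a C11 atomic stand in for
__crash_hotplug_lock and __kexec_lock, and every name ending in _sketch, as
well as crash_image_loaded and elfcorehdr_updates, is hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

atomic_int kexec_lock_word;	/* stand-in for __kexec_lock: 0 free, 1 held */
pthread_mutex_t hotplug_mutex = PTHREAD_MUTEX_INITIALIZER;
bool crash_image_loaded = true;	/* stand-in for kexec_crash_image != NULL */
int elfcorehdr_updates;		/* counts simulated elfcorehdr rewrites */

/* Try-only lock: succeeds immediately or fails immediately. */
bool kexec_trylock_sketch(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&kexec_lock_word, &expected, 1);
}

void kexec_unlock_sketch(void)
{
	atomic_store(&kexec_lock_word, 0);
}

/* Models the sysfs read path (udev rule processing). */
int check_update_elfcorehdr_sketch(void)
{
	int rc = 0;

	pthread_mutex_lock(&hotplug_mutex);	/* waits for other hotplug work */
	if (!kexec_trylock_sketch()) {
		pthread_mutex_unlock(&hotplug_mutex);
		return 0;
	}
	if (crash_image_loaded)
		rc = 1;
	kexec_unlock_sketch();
	pthread_mutex_unlock(&hotplug_mutex);
	return rc;
}

/* Models the kernel hotplug handler path. */
bool handle_hotplug_event_sketch(void)
{
	pthread_mutex_lock(&hotplug_mutex);
	if (!kexec_trylock_sketch()) {
		pthread_mutex_unlock(&hotplug_mutex);
		return false;
	}
	if (crash_image_loaded)
		elfcorehdr_updates++;
	kexec_unlock_sketch();
	pthread_mutex_unlock(&hotplug_mutex);
	return true;
}
```

In this model the two paths can no longer make each other's trylock fail,
because only one of them is inside the mutex at a time; the trylock can still
fail if an unrelated holder (for example a kexec syscall) has the inner lock.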

Eric



Re: [PATCH] Crash: add lock to serialize crash hotplug handling

2023-09-23 Thread Eric DeVolder




On 9/22/23 18:54, Baoquan He wrote:

Eric reported that handling of crash hotplug events can easily fail when
many memory hotplug events are notified in a short period. The handling
fails because __kexec_lock cannot be taken.

===
[   78.714569] Fallback order for Node 0: 0
[   78.714575] Built 1 zonelists, mobility grouping on.  Total pages: 1817886
[   78.717133] Policy zone: Normal
[   78.724423] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
[   78.727207] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
[   80.056643] PEFILE: Unsigned PE binary
===

The memory hotplug events are notified very quickly and very many,
while the handling of crash hotplug is much slower relatively. So the
atomic variable __kexec_lock and kexec_trylock() can't guarantee the
serialization of crash hotplug handling.

Here, add a new mutex lock __crash_hotplug_lock to serialize crash
hotplug handling specifically. This doesn't impact the usage of
__kexec_lock.

Signed-off-by: Baoquan He 
---
  kernel/crash_core.c |  3 +++
  kernel/kexec_core.c |  1 +
  kernel/kexec_internal.h | 11 +++
  3 files changed, 15 insertions(+)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 03a7932cde0a..e8851724a530 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -783,9 +783,11 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
 {
 	struct kimage *image;
 
+	crash_hotplug_lock();
 	/* Obtain lock while changing crash information */
 	if (!kexec_trylock()) {
 		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
+		crash_hotplug_unlock();
 		return;
 	}
 
@@ -852,6 +854,7 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
 out:
 	/* Release lock now that update complete */
 	kexec_unlock();
+	crash_hotplug_unlock();
 }
 


The crash_check_update_elfcorehdr() also has kexec_trylock() and needs similar 
treatment.


 static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v)
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 9dc728982d79..b95a73f35d9a 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -48,6 +48,7 @@
  #include "kexec_internal.h"
  
  atomic_t __kexec_lock = ATOMIC_INIT(0);

+DEFINE_MUTEX(__crash_hotplug_lock);
  
  /* Flag to indicate we are going to kexec a new kernel */

  bool kexec_in_progress = false;
diff --git a/kernel/kexec_internal.h b/kernel/kexec_internal.h
index 74da1409cd14..1db31625ef20 100644
--- a/kernel/kexec_internal.h
+++ b/kernel/kexec_internal.h
@@ -28,6 +28,17 @@ static inline void kexec_unlock(void)
atomic_set_release(&__kexec_lock, 0);
  }
  
+/*
+ * Different than kexec/kdump loading/unloading/crash or kexec jumping/shrinking
+ * which usually rarely happen, there will be many crash hotplug events notified
+ * during one short period, e.g one memory board is hot added and memory regions
+ * are online. So mutex lock  __crash_hotplug_lock is used to serialize the crash
+ * hotplug handling specificially.
+ * */
+extern struct mutex __crash_hotplug_lock;
+#define crash_hotplug_lock() mutex_lock(&__crash_hotplug_lock)
+#define crash_hotplug_unlock() mutex_unlock(&__crash_hotplug_lock)
+
  #ifdef CONFIG_KEXEC_FILE
  #include 
  void kimage_file_post_load_cleanup(struct kimage *image);


The new content for kexec_internal.h and kexec_core.c could/should probably be
moved into crash_core.c, within the CONFIG_CRASH_HOTPLUG?

eric



Re: [PATCH] kexec: change locking mechanism to a mutex

2023-09-22 Thread Eric DeVolder




On 9/22/23 11:28, Valentin Schneider wrote:

On 21/09/23 17:59, Eric DeVolder wrote:

The design decision to use the atomic lock is described in the comment
from kexec_internal.h, cited above. However, examining the code of
__crash_kexec():

 if (kexec_trylock()) {
 if (kexec_crash_image) {
 ...
 }
 kexec_unlock();
 }

reveals that the use of kexec_trylock() here is actually a "best effort"
due to the atomic lock.  This atomic lock, prior to crash hotplug,
would almost always be assured (another kexec syscall could hold the lock
and prevent this, but that is about it).

So at the point where the capture kernel would be invoked, if the lock
is not obtained, then kdump doesn't occur.

It is possible to instead use a mutex with proper waiting, and utilize
mutex_trylock() as the "best effort" in __crash_kexec(). The use of a
mutex then avoids all the lock acquisition problems that were revealed
by the crash hotplug activity.



@Dave thanks for the Cc, I'd have missed this otherwise.


Prior to the atomic thingie, we actually had a mutex and did
mutex_trylock() in __crash_kexec(). I'm a bit confused as this looks like a
revert of
   05c6257433b7 ("panic, kexec: make __crash_kexec() NMI safe")
with just the helpers kept in - this doesn't seem to address any of the
original issues regarding NMIs?

Sebastian raised some good points in [1] regarding these issues.
The main hurdle pointed out there is, if we end up in the slowpath during
the unlock, then we can end up acquiring the ->wait_lock which isn't NMI
safe.

This is even worse on PREEMPT_RT, as both trylock and the unlock can end up
acquiring the ->wait_lock.

[1]: https://lore.kernel.org/all/yqyz%2fuf14qkyt...@linutronix.de/


Having reviewed the references, it would seem that Baoquan's approach of a new
lock to handle the hotplug activity is the way to go?
Eric



Re: [PATCH] kexec: change locking mechanism to a mutex

2023-09-22 Thread Eric DeVolder




On 9/22/23 03:06, Baoquan He wrote:

On 09/22/23 at 11:36am, Dave Young wrote:

[Cced Valentin Schneider as he added the trylocks]

On Fri, 22 Sept 2023 at 06:04, Eric DeVolder  wrote:


Scaled up testing has revealed that the kexec_trylock()
implementation leads to failures within the crash hotplug
infrastructure due to the inability to acquire the lock,
specifically the message:

  crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate

When hotplug events occur, the crash hotplug infrastructure first
attempts to obtain the lock via the kexec_trylock(). However, the
implementation either acquires the lock, or fails and returns; there
is no waiting on the lock. Here is the comment/explanation from
kernel/kexec_internal.h:kexec_trylock():

  * Whatever is used to serialize accesses to the kexec_crash_image needs to be
  * NMI safe, as __crash_kexec() can happen during nmi_panic(), so here we use a
  * "simple" atomic variable that is acquired with a cmpxchg().

While this in theory can happen for either CPU or memory hotplug,
this problem is most prone to occur for memory hotplug.

When memory is hot plugged, the memory is converted into smaller
128MiB memblocks (typically). As each memblock is processed, a
kernel thread and a udev event thread are created. The udev thread
tries for the lock via the reading of the sysfs node
/sys/devices/system/memory/crash_hotplug, and the kernel
worker thread tries for the lock upon entering the crash hotplug
infrastructure.

These threads then compete for the kexec lock.

For example, a 1GiB DIMM is converted into 8 memblocks, each
spawning two threads for a total of 16 threads that create a small
"swarm" all trying to acquire the lock. The larger the DIMM, the
more the memblocks and the larger the swarm.

At the root of the problem is the atomic lock behind kexec_trylock();
it works well for low lock traffic; ie loading/unloading a capture
kernel, things that happen basically once. But with the introduction
of crash hotplug, the traffic through the lock increases significantly,
and more importantly in bursts occurring at roughly the same time. Thus
there is a need to wait on the lock.


Yeah, the atomic __kexec_lock is used to lock the door of operation on
kimage. Among kexec/kdump kernel load/unload/shrink/jumping, once any one
is in progress, the later attempt doesn't make sense. And these events are
rare.

Crash hotplug events are different; there will be many during one period.
The main problem you are encountering is the concurrent handling of hotplug
events, right? I wonder if we can define another mutex lock to serialize
the handling of hotplug events, like below. Just a prototype to state my
thought.


I've tried this patch (with slight change) against my regression setup and it 
works as well.
Eric



diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 03a7932cde0a..39b9a57a4177 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -783,6 +783,7 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
 {
 	struct kimage *image;
 
+	crash_hotplug_lock();
 	/* Obtain lock while changing crash information */
 	if (!kexec_trylock()) {
 		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
@@ -852,6 +853,7 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
 out:
 	/* Release lock now that update complete */
 	kexec_unlock();
+	crash_hotplug_unlock();
 }
 
 static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 9dc728982d79..b95a73f35d9a 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -48,6 +48,7 @@
  #include "kexec_internal.h"
  
  atomic_t __kexec_lock = ATOMIC_INIT(0);

+DEFINE_MUTEX(__crash_hotplug_lock);
  
  /* Flag to indicate we are going to kexec a new kernel */

  bool kexec_in_progress = false;
diff --git a/kernel/kexec_internal.h b/kernel/kexec_internal.h
index 74da1409cd14..32cb890bb059 100644
--- a/kernel/kexec_internal.h
+++ b/kernel/kexec_internal.h
@@ -28,6 +28,10 @@ static inline void kexec_unlock(void)
atomic_set_release(&__kexec_lock, 0);
  }
  
+extern struct mutex __crash_hotplug_lock;

+#define crash_hotplug_lock() mutex_lock(&__crash_hotplug_lock)
+#define crash_hotplug_unlock() mutex_unlock(&__crash_hotplug_lock)
+
  #ifdef CONFIG_KEXEC_FILE
  #include 
  void kimage_file_post_load_cleanup(struct kimage *image);





Re: [PATCH] kexec: change locking mechanism to a mutex

2023-09-21 Thread Eric DeVolder




On 9/21/23 19:26, Andrew Morton wrote:

On Thu, 21 Sep 2023 17:59:38 -0400 Eric DeVolder wrote:


Scaled up testing has revealed that the kexec_trylock()
implementation leads to failures within the crash hotplug
infrastructure due to the inability to acquire the lock,
specifically the message:

  crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate

When hotplug events occur, the crash hotplug infrastructure first
attempts to obtain the lock via the kexec_trylock(). However, the
implementation either acquires the lock, or fails and returns; there
is no waiting on the lock. Here is the comment/explanation from
kernel/kexec_internal.h:kexec_trylock():

  * Whatever is used to serialize accesses to the kexec_crash_image needs to be
  * NMI safe, as __crash_kexec() can happen during nmi_panic(), so here we use a
  * "simple" atomic variable that is acquired with a cmpxchg().

While this in theory can happen for either CPU or memory hotplug,
this problem is most prone to occur for memory hotplug.

When memory is hot plugged, the memory is converted into smaller
128MiB memblocks (typically). As each memblock is processed, a
kernel thread and a udev event thread are created. The udev thread
tries for the lock via the reading of the sysfs node
/sys/devices/system/memory/crash_hotplug, and the kernel
worker thread tries for the lock upon entering the crash hotplug
infrastructure.

These threads then compete for the kexec lock.

For example, a 1GiB DIMM is converted into 8 memblocks, each
spawning two threads for a total of 16 threads that create a small
"swarm" all trying to acquire the lock. The larger the DIMM, the
more the memblocks and the larger the swarm.

At the root of the problem is the atomic lock behind kexec_trylock();
it works well for low lock traffic; ie loading/unloading a capture
kernel, things that happen basically once. But with the introduction
of crash hotplug, the traffic through the lock increases significantly,
and more importantly in bursts occurring at roughly the same time. Thus
there is a need to wait on the lock.

A possible workaround is to simply retry the lock, say up to N times.
There is, of course, the problem of determining a value of N that works for
all implementations, and for all the other call sites of kexec_trylock().
Not ideal.

The design decision to use the atomic lock is described in the comment
from kexec_internal.h, cited above. However, examining the code of
__crash_kexec():

 if (kexec_trylock()) {
 if (kexec_crash_image) {
 ...
 }
 kexec_unlock();
 }

reveals that the use of kexec_trylock() here is actually a "best effort"
due to the atomic lock.  This atomic lock, prior to crash hotplug,
would almost always be assured (another kexec syscall could hold the lock
and prevent this, but that is about it).

So at the point where the capture kernel would be invoked, if the lock
is not obtained, then kdump doesn't occur.

It is possible to instead use a mutex with proper waiting, and utilize
mutex_trylock() as the "best effort" in __crash_kexec(). The use of a
mutex then avoids all the lock acquisition problems that were revealed
by the crash hotplug activity.

Convert the atomic lock to a mutex.

...

--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -47,7 +47,7 @@
  #include 
  #include "kexec_internal.h"
  
-atomic_t __kexec_lock = ATOMIC_INIT(0);

+DEFINE_MUTEX(__kexec_lock);
  
  /* Flag to indicate we are going to kexec a new kernel */

  bool kexec_in_progress = false;
@@ -1057,7 +1057,7 @@ void __noclone __crash_kexec(struct pt_regs *regs)
 * of memory the xchg(_crash_image) would be
 * sufficient.  But since I reuse the memory...
 */
-   if (kexec_trylock()) {
+   if (mutex_trylock(&__kexec_lock)) {
if (kexec_crash_image) {
struct pt_regs fixed_regs;


What's happening here?  If someone else held the lock we silently fail
to run the kexec?  Shouldn't we at least alert the user to what just
happened?



Yes, I believe it would silently "fail" and not run the kexec kernel.
I do not have a good feel to know if logging is going to be functional,
and reliable, at this point in time (on a panic path)...
eric



Re: [PATCH] kexec: change locking mechanism to a mutex

2023-09-21 Thread Eric DeVolder




On 9/21/23 19:22, Andrew Morton wrote:

On Thu, 21 Sep 2023 17:59:38 -0400 Eric DeVolder wrote:


Scaled up testing has revealed that the kexec_trylock()
implementation leads to failures within the crash hotplug
infrastructure due to the inability to acquire the lock,
specifically the message:

...

Convert the atomic lock to a mutex.



Do you think this problem is serious enough to warrant a backport into
-stable kernels?


I do not, since it is the lock traffic created by the crash hotplug infrastructure that
reveals the weak locking mechanism. Until crash hotplug shows up in a stable kernel, it should
not be an issue; there isn't anything else that easily exercises the lock to reveal the problem.


eric



[PATCH] kexec: change locking mechanism to a mutex

2023-09-21 Thread Eric DeVolder
Scaled up testing has revealed that the kexec_trylock()
implementation leads to failures within the crash hotplug
infrastructure due to the inability to acquire the lock,
specifically the message:

 crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate

When hotplug events occur, the crash hotplug infrastructure first
attempts to obtain the lock via the kexec_trylock(). However, the
implementation either acquires the lock, or fails and returns; there
is no waiting on the lock. Here is the comment/explanation from
kernel/kexec_internal.h:kexec_trylock():

 * Whatever is used to serialize accesses to the kexec_crash_image needs to be
 * NMI safe, as __crash_kexec() can happen during nmi_panic(), so here we use a
 * "simple" atomic variable that is acquired with a cmpxchg().
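
To make the "acquires or fails, never waits" behavior concrete, here is a
small userspace sketch of a cmpxchg-style try-only lock. It is an
illustration built on C11 atomics, not the kernel's code; lock_word,
trylock_sketch() and unlock_sketch() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

atomic_int lock_word;	/* 0 = free, 1 = held */

/*
 * Attempt to move lock_word from 0 to 1 in a single atomic step.
 * There is no wait queue and no spinning: a contended caller simply
 * gets 'false' back and has to decide what to do next.
 */
bool trylock_sketch(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(&lock_word, &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/* Release the lock with a plain atomic store. */
void unlock_sketch(void)
{
	atomic_store_explicit(&lock_word, 0, memory_order_release);
}
```

A second trylock_sketch() call while the lock is held returns false right
away, which is the situation the "kexec_trylock() failed" message reports.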

While this in theory can happen for either CPU or memory hotplug,
this problem is most prone to occur for memory hotplug.

When memory is hot plugged, the memory is converted into smaller
128MiB memblocks (typically). As each memblock is processed, a
kernel thread and a udev event thread are created. The udev thread
tries for the lock via the reading of the sysfs node
/sys/devices/system/memory/crash_hotplug, and the kernel
worker thread tries for the lock upon entering the crash hotplug
infrastructure.

These threads then compete for the kexec lock.

For example, a 1GiB DIMM is converted into 8 memblocks, each
spawning two threads for a total of 16 threads that create a small
"swarm" all trying to acquire the lock. The larger the DIMM, the
more the memblocks and the larger the swarm.
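
The swarm size follows directly from the DIMM and memblock sizes. The helper
below is a hypothetical illustration of that arithmetic only (two threads per
memblock, as described above); swarm_threads() and the MIB/GIB macros are not
kernel names.

```c
#include <stdint.h>

#define MIB (UINT64_C(1) << 20)
#define GIB (UINT64_C(1) << 30)

/*
 * Number of threads contending for the lock when one DIMM is hot added:
 * one kernel worker plus one udev event thread per memblock.
 */
uint64_t swarm_threads(uint64_t dimm_bytes, uint64_t memblock_bytes)
{
	if (memblock_bytes == 0)
		return 0;	/* avoid dividing by zero on bad input */

	return (dimm_bytes / memblock_bytes) * 2;
}
```

For a 1GiB DIMM with 128MiB memblocks this gives 8 memblocks and 16 threads;
an 8GiB DIMM gives 64 memblocks and 128 threads.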

At the root of the problem is the atomic lock behind kexec_trylock();
it works well for low lock traffic; ie loading/unloading a capture
kernel, things that happen basically once. But with the introduction
of crash hotplug, the traffic through the lock increases significantly,
and more importantly in bursts occurring at roughly the same time. Thus
there is a need to wait on the lock.

A possible workaround is to simply retry the lock, say up to N times.
There is, of course, the problem of determining a value of N that works for
all implementations, and for all the other call sites of kexec_trylock().
Not ideal.
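
For reference, the retry workaround would look roughly like the userspace
sketch below. It is an illustration of the idea being rejected, not proposed
code; the names are hypothetical and the lock is modeled with a C11 atomic.

```c
#include <stdatomic.h>
#include <stdbool.h>

atomic_int lock_word;	/* 0 = free, 1 = held */

bool trylock_sketch(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&lock_word, &expected, 1);
}

void unlock_sketch(void)
{
	atomic_store(&lock_word, 0);
}

/*
 * Retry the try-only lock up to max_attempts times. Whatever value is
 * picked for max_attempts, a sufficiently long lock hold or a larger
 * burst of contenders can still exhaust it.
 */
bool trylock_retry_sketch(unsigned int max_attempts)
{
	for (unsigned int i = 0; i < max_attempts; i++) {
		if (trylock_sketch())
			return true;
	}
	return false;
}
```

The loop only narrows the failure window; it does not close it, which is why
a lock that can actually be waited on is preferable.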

The design decision to use the atomic lock is described in the comment
from kexec_internal.h, cited above. However, examining the code of
__crash_kexec():

if (kexec_trylock()) {
if (kexec_crash_image) {
...
}
kexec_unlock();
}

reveals that the use of kexec_trylock() here is actually a "best effort"
due to the atomic lock.  This atomic lock, prior to crash hotplug,
would almost always be assured (another kexec syscall could hold the lock
and prevent this, but that is about it).

So at the point where the capture kernel would be invoked, if the lock
is not obtained, then kdump doesn't occur.

It is possible to instead use a mutex with proper waiting, and utilize
mutex_trylock() as the "best effort" in __crash_kexec(). The use of a
mutex then avoids all the lock acquisition problems that were revealed
by the crash hotplug activity.

Convert the atomic lock to a mutex.

Signed-off-by: Eric DeVolder 
---
 kernel/crash_core.c | 10 ++
 kernel/kexec.c  |  3 +--
 kernel/kexec_core.c | 13 +
 kernel/kexec_file.c |  3 +--
 kernel/kexec_internal.h | 12 +++-
 5 files changed, 12 insertions(+), 29 deletions(-)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 03a7932cde0a..9a8378fbdafa 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -749,10 +749,7 @@ int crash_check_update_elfcorehdr(void)
 	int rc = 0;
 
 	/* Obtain lock while reading crash information */
-	if (!kexec_trylock()) {
-		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
-		return 0;
-	}
+	kexec_lock();
 	if (kexec_crash_image) {
 		if (kexec_crash_image->file_mode)
 			rc = 1;
@@ -784,10 +781,7 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
 	struct kimage *image;
 
 	/* Obtain lock while changing crash information */
-	if (!kexec_trylock()) {
-		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
-		return;
-	}
+	kexec_lock();
 
 	/* Check kdump is not loaded */
 	if (!kexec_crash_image)
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 107f355eac10..a2f687900bb5 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -96,8 +96,7 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
 * crash kernels we need a serialization here to prevent multiple crash
 * kernels from attempting to load simultaneously.
 */
-   if (!kexec_trylock())
-   return -EB

Re: [PATCH v28 0/8] crash: Kernel handling of CPU and memory hot un/plug

2023-08-20 Thread Eric DeVolder



On 8/14/23 17:33, Andrew Morton wrote:

On Mon, 14 Aug 2023 17:44:38 -0400 Eric DeVolder wrote:


This series is dependent upon "refactor Kconfig to consolidate
KEXEC and CRASH options".
  https://lore.kernel.org/lkml/20230712161545.87870-1-eric.devol...@oracle.com/

Once the kdump service is loaded, if changes to CPUs or memory occur,
either by hot un/plug or off/onlining, the crash elfcorehdr must also
be updated.


Thanks, I updated branch mm-nonmm-unstable to this version.


Andrew,
So far only one issue has popped up. I've posted the following patch to akpm to
solve that issue. Please apply this patch on-top/with this v28 series.

 [PATCH] x86/crash: correct unused function build error

The thread on this issue is here:
https://lore.kernel.org/lkml/08fc20ef-854d-404a-b2f2-75941eeeccf8@paulmck-laptop/

If you'd rather I post a v29, I'll happily do so.
Thank you!
eric



Re: [PATCH v27 2/8] crash: add generic infrastructure for crash hotplug support

2023-08-15 Thread Eric DeVolder



On 8/12/23 05:47, Sourabh Jain wrote:

Hello Eric,

On 11/08/23 22:36, Eric DeVolder wrote:

To support crash hotplug, a mechanism is needed to update the crash
elfcorehdr upon CPU or memory changes (eg. hot un/plug or off/
onlining). The crash elfcorehdr describes the CPUs and memory to
be written into the vmcore.

To track CPU changes, callbacks are registered with the cpuhp
mechanism via cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN). The
crash hotplug elfcorehdr update has no explicit ordering requirement
(relative to other cpuhp states), so meets the criteria for
utilizing CPUHP_BP_PREPARE_DYN. CPUHP_BP_PREPARE_DYN is a dynamic
state and avoids the need to introduce a new state for crash
hotplug. Also, CPUHP_BP_PREPARE_DYN is the last state in the PREPARE
group, just prior to the STARTING group, which is very close to the
CPU starting up in a plug/online situation, or stopping in a unplug/
offline situation. This minimizes the window of time during an
actual plug/online or unplug/offline situation in which the
elfcorehdr would be inaccurate. Note that for a CPU being unplugged
or offlined, the CPU will still be present in the list of CPUs
generated by crash_prepare_elf64_headers(). However, there is no
need to explicitly omit the CPU, see justification in
'crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()'.

To track memory changes, a notifier is registered to capture the
memblock MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().

The CPU callbacks and memory notifiers invoke crash_handle_hotplug_event()
which performs needed tasks and then dispatches the event to the
architecture specific arch_crash_handle_hotplug_event() to update the
elfcorehdr with the current state of CPUs and memory. During the
process, the kexec_lock is held.

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
  include/linux/crash_core.h |   9 +++
  include/linux/kexec.h  |  11 +++
  kernel/Kconfig.kexec   |  31 
  kernel/crash_core.c    | 142 +
  kernel/kexec_core.c    |   6 ++
  5 files changed, 199 insertions(+)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index de62a722431e..e14345cc7a22 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -84,4 +84,13 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
  int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
  unsigned long long *crash_size, unsigned long long *crash_base);
+#define KEXEC_CRASH_HP_NONE    0
+#define KEXEC_CRASH_HP_ADD_CPU    1
+#define KEXEC_CRASH_HP_REMOVE_CPU    2
+#define KEXEC_CRASH_HP_ADD_MEMORY    3
+#define KEXEC_CRASH_HP_REMOVE_MEMORY    4
+#define KEXEC_CRASH_HP_INVALID_CPU    -1U
+
+struct kimage;
+
  #endif /* LINUX_CRASH_CORE_H */
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 811a90e09698..b9903dd48e24 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -33,6 +33,7 @@ extern note_buf_t __percpu *crash_notes;
  #include 
  #include 
  #include 
+#include 
  #include 
  /* Verify architecture specific macros are defined */
@@ -360,6 +361,12 @@ struct kimage {
  struct purgatory_info purgatory_info;
  #endif
+#ifdef CONFIG_CRASH_HOTPLUG
+    int hp_action;
+    int elfcorehdr_index;
+    bool elfcorehdr_updated;
+#endif
+
  #ifdef CONFIG_IMA_KEXEC
  /* Virtual address of IMA measurement buffer for kexec syscall */
  void *ima_buffer;
@@ -490,6 +497,10 @@ static inline int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, g
  static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) { }
  #endif
+#ifndef arch_crash_handle_hotplug_event
+static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
+#endif
+


Shouldn't the above function be declared under CONFIG_CRASH_HOTPLUG?

Thanks,
Sourabh


There are no compiler warnings/errors, because the function is declared
static inline.
Most of the other functions defined in a similar way in this file are not
guarded by CONFIG ifdefs either, so I'm inclined to leave it this way.
Thanks!
eric


  #else /* !CONFIG_KEXEC_CORE */
  struct pt_regs;
  struct task_struct;
diff --git a/kernel/Kconfig.kexec b/kernel/Kconfig.kexec
index ff72e45cfaef..d0a9a5392035 100644
--- a/kernel/Kconfig.kexec
+++ b/kernel/Kconfig.kexec
@@ -113,4 +113,35 @@ config CRASH_DUMP
    For s390, this option also enables zfcpdump.
    See also 
+config CRASH_HOTPLUG
+    bool "Update the crash elfcorehdr on system configuration changes"
+    default y
+    depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)
+    depends on ARCH_SUPPORTS_CRASH_HOTPLUG
+    help
+  Enable direct update to the crash elfcorehdr (which contains
+  the list of CPUs and memory regions to be dumped upon a crash)
+  in response to hot plu

[PATCH v28 6/8] crash: hotplug support for kexec_load()

2023-08-14 Thread Eric DeVolder
The hotplug support for kexec_load() requires changes to the
userspace kexec-tools and a little extra help from the kernel.

Given a kdump capture kernel loaded via kexec_load(), and a
subsequent hotplug event, the crash hotplug handler finds the
elfcorehdr and rewrites it to reflect the hotplug change.
That is the desired outcome, however, at kernel panic time,
the purgatory integrity check fails (because the elfcorehdr
changed), and the capture kernel does not boot and no vmcore
is generated.

Therefore, the userspace kexec-tools/kexec must indicate to the
kernel that the elfcorehdr can be modified (because the kexec
excluded the elfcorehdr from the digest, and sized the elfcorehdr
memory buffer appropriately).

To facilitate hotplug support with kexec_load():
 - a new kexec flag KEXEC_UPDATE_ELFCOREHDR indicates that it is
   safe for the kernel to modify the kexec_load()'d elfcorehdr
 - the /sys/kernel/crash_elfcorehdr_size node communicates the
   preferred size of the elfcorehdr memory buffer
 - The sysfs crash_hotplug nodes (ie.
   /sys/devices/system/[cpu|memory]/crash_hotplug) dynamically
   take into account kexec_file_load() vs kexec_load() and
   KEXEC_UPDATE_ELFCOREHDR.
   This is critical so that the udev rule processing of crash_hotplug
   is all that is needed to determine if the userspace unload-then-load
   of the kdump image is to be skipped, or not. The proposed udev
   rule change looks like:
   # The kernel updates the crash elfcorehdr for CPU and memory changes
   SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
   SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

The table below indicates the behavior of kexec_load()'d kdump image
updates (with the new udev crash_hotplug rule in place):

 Kernel |Kexec
 -------+-----+----
 Old    |Old  |New
        |  a  | a
 -------+-----+----
 New    |  a  | b
 -------+-----+----

where kexec 'old' and 'new' indicate whether kexec-tools has the needed
modifications for the crash hotplug feature, and kernel 'old' and
'new' indicate whether the kernel supports this crash hotplug feature.

Behavior 'a' indicates the unload-then-reload of the entire kdump
image. For the kexec 'old' column, the unload-then-reload occurs
due to the missing flag KEXEC_UPDATE_ELFCOREHDR. An 'old' kernel
(with 'new' kexec) does not present the crash_hotplug sysfs node,
which leads to the unload-then-reload of the kdump image.

Behavior 'b' indicates the desired optimized behavior of the kernel
directly modifying the elfcorehdr and avoiding the unload-then-reload
of the kdump image.

If the udev rule is not updated with the crash_hotplug node check, then
regardless of whether the kernel or kexec is new or old, the kdump
image continues to be unloaded and then reloaded on hotplug changes.

To fully support crash hotplug feature, there needs to be a rollout
of kernel, kexec-tools and udev rule changes. However, the order of
the rollout of these pieces does not matter; kexec_load()'d kdump
images still function for hotplug as-is.
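
The behavior table and the udev note above can be restated as a small decision function. This is only a sketch: kdump_update_behavior() and its parameters are hypothetical names for illustration, not kernel or kexec-tools code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns 'b' when the kernel updates the elfcorehdr in place, or 'a'
 * when userspace falls back to unload-then-reload of the kdump image.
 */
char kdump_update_behavior(bool kernel_new, bool kexec_new,
			   bool udev_rule_updated)
{
	/* All three pieces must be rolled out for the optimized path. */
	if (kernel_new && kexec_new && udev_rule_updated)
		return 'b';

	/* Any missing piece keeps the existing reload behavior. */
	return 'a';
}
```

Because every incomplete combination falls back to 'a', the order in which the kernel, kexec-tools and udev rule are rolled out does not matter.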

Suggested-by: Hari Bathini 
Signed-off-by: Eric DeVolder 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 arch/x86/include/asm/kexec.h | 11 +++
 arch/x86/kernel/crash.c  | 27 +++
 include/linux/kexec.h| 14 --
 include/uapi/linux/kexec.h   |  1 +
 kernel/Kconfig.kexec |  4 
 kernel/crash_core.c  | 31 +++
 kernel/kexec.c   |  5 +
 kernel/ksysfs.c  | 15 +++
 8 files changed, 102 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 9143100ea3ea..3be6a98751f0 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -214,14 +214,17 @@ void arch_crash_handle_hotplug_event(struct kimage *image);
 #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
 
 #ifdef CONFIG_HOTPLUG_CPU
-static inline int crash_hotplug_cpu_support(void) { return 1; }
-#define crash_hotplug_cpu_support crash_hotplug_cpu_support
+int arch_crash_hotplug_cpu_support(void);
+#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
 #endif
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-static inline int crash_hotplug_memory_support(void) { return 1; }
-#define crash_hotplug_memory_support crash_hotplug_memory_support
+int arch_crash_hotplug_memory_support(void);
+#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
 #endif
+
+unsigned int arch_crash_get_elfcorehdr_size(void);
+#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 4b6cebceec68..1900efcdf1bc 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -429,6 +429,33 @@ int crash_load_segments(struct kimage *image)
 #undef pr_fmt
 #define pr_fmt(fmt) "cras

[PATCH v28 5/8] x86/crash: add x86 crash hotplug support

2023-08-14 Thread Eric DeVolder
When CPU or memory is hot un/plugged, or off/onlined, the crash
elfcorehdr, which describes the CPUs and memory in the system,
must also be updated.

A new elfcorehdr is generated from the available CPUs and memory
and replaces the existing elfcorehdr. The segment containing the
elfcorehdr is identified at run-time in
crash_core:crash_handle_hotplug_event().

No modifications to purgatory (see 'kexec: exclude elfcorehdr
from the segment digest') or boot_params (as the elfcorehdr=
capture kernel command line parameter pointer remains unchanged
and correct) are needed, just elfcorehdr.

For kexec_file_load(), the elfcorehdr segment size is based on
NR_CPUS and CRASH_MAX_MEMORY_RANGES in order to accommodate a
growing number of CPU and memory resources.

For kexec_load(), the userspace kexec utility needs to size the
elfcorehdr segment in the same/similar manner.

To accommodate kexec_load() syscall in the absence of
kexec_file_load() syscall support, prepare_elf_headers() and
dependents are moved outside of CONFIG_KEXEC_FILE.
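
As a rough illustration of the sizing described above, the sketch below computes an elfcorehdr buffer size from a CPU count and a maximum number of memory ranges. The helper names and the sample values in the note that follows are assumptions. The two extra program headers (vmcoreinfo note and kernel text mapping) and the 4096-byte alignment follow crash_prepare_elf64_headers(). The 64-byte and 56-byte figures are the standard sizes of Elf64_Ehdr and Elf64_Phdr.

```c
#include <assert.h>
#include <stddef.h>

#define ELF64_EHDR_SIZE        64u   /* sizeof(Elf64_Ehdr) */
#define ELF64_PHDR_SIZE        56u   /* sizeof(Elf64_Phdr) */
#define ELF_CORE_HEADER_ALIGN  4096u

/* Round sz up to the next multiple of align (align is a power of two). */
size_t align_up_sketch(size_t sz, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (sz + align - 1) & ~(align - 1);
}

/* Hypothetical helper: worst-case elfcorehdr size for hotplug growth. */
size_t elfcorehdr_size_sketch(size_t nr_cpus, size_t max_mem_ranges)
{
	/* One PT_NOTE per CPU, one PT_LOAD per range, plus 2 extras. */
	size_t nr_phdr = 2 + nr_cpus + max_mem_ranges;

	return ELF64_EHDR_SIZE + nr_phdr * ELF64_PHDR_SIZE;
}
```

For example, 8 CPUs and 100 memory ranges need 64 + 110 * 56 = 6224 bytes, which rounds up to 8192 once aligned.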

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 arch/x86/Kconfig |   3 +
 arch/x86/include/asm/kexec.h |  15 +
 arch/x86/kernel/crash.c  | 105 ---
 3 files changed, 116 insertions(+), 7 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7082fc10b346..ffc95c3d6abd 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2069,6 +2069,9 @@ config ARCH_SUPPORTS_KEXEC_JUMP
 config ARCH_SUPPORTS_CRASH_DUMP
def_bool X86_64 || (X86_32 && HIGHMEM)
 
+config ARCH_SUPPORTS_CRASH_HOTPLUG
+   def_bool y
+
 config PHYSICAL_START
hex "Physical address where the kernel is loaded" if (EXPERT || CRASH_DUMP)
default "0x100"
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 5b77bbc28f96..9143100ea3ea 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -209,6 +209,21 @@ typedef void crash_vmclear_fn(void);
 extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
 extern void kdump_nmi_shootdown_cpus(void);
 
+#ifdef CONFIG_CRASH_HOTPLUG
+void arch_crash_handle_hotplug_event(struct kimage *image);
+#define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
+
+#ifdef CONFIG_HOTPLUG_CPU
+static inline int crash_hotplug_cpu_support(void) { return 1; }
+#define crash_hotplug_cpu_support crash_hotplug_cpu_support
+#endif
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+static inline int crash_hotplug_memory_support(void) { return 1; }
+#define crash_hotplug_memory_support crash_hotplug_memory_support
+#endif
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_X86_KEXEC_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index cdd92ab43cda..4b6cebceec68 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -158,8 +158,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
crash_save_cpu(regs, safe_smp_processor_id());
 }
 
-#ifdef CONFIG_KEXEC_FILE
-
+#if defined(CONFIG_KEXEC_FILE) || defined(CONFIG_CRASH_DUMP)
 static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
 {
unsigned int *nr_ranges = arg;
@@ -231,7 +230,7 @@ static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg)
 
 /* Prepare elf headers. Return addr and size */
 static int prepare_elf_headers(struct kimage *image, void **addr,
-   unsigned long *sz)
+   unsigned long *sz, unsigned long *nr_mem_ranges)
 {
struct crash_mem *cmem;
int ret;
@@ -249,6 +248,9 @@ static int prepare_elf_headers(struct kimage *image, void **addr,
if (ret)
goto out;
 
+   /* Return the computed number of memory ranges, for hotplug usage */
+   *nr_mem_ranges = cmem->nr_ranges;
+
/* By default prepare 64bit headers */
ret =  crash_prepare_elf64_headers(cmem, IS_ENABLED(CONFIG_X86_64), addr, sz);
 
@@ -256,7 +258,9 @@ static int prepare_elf_headers(struct kimage *image, void **addr,
vfree(cmem);
return ret;
 }
+#endif
 
+#ifdef CONFIG_KEXEC_FILE
 static int add_e820_entry(struct boot_params *params, struct e820_entry *entry)
 {
unsigned int nr_e820_entries;
@@ -371,18 +375,42 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
 int crash_load_segments(struct kimage *image)
 {
int ret;
+   unsigned long pnum = 0;
struct kexec_buf kbuf = { .image = image, .buf_min = 0,
  .buf_max = ULONG_MAX, .top_down = false };
 
/* Prepare elf headers and add a segment */
-   ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz);
+   ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz, &pnum);
if (ret)
return ret;
 
-   image->elf_headers = kbuf.buffer;
-   image->elf_headers_sz = k

[PATCH v28 0/8] crash: Kernel handling of CPU and memory hot un/plug

2023-08-14 Thread Eric DeVolder
.
 - Per David Hansen, converted to use of kmap_local_page().
 - Per Baoquan He, replaced use of __weak with the kexec technique.

v9: 13jun2022
 https://lkml.org/lkml/2022/6/13/3382
 https://lore.kernel.org/lkml/20220613224240.79400-1-eric.devol...@oracle.com/
 - Rebased to 5.18.0
 - Per Sourabh, moved crash_prepare_elf64_headers() into common
   crash_core.c to avoid compile issues with kexec_load only path.
 - Per David Hildenbrand, replaced mutex_trylock() with mutex_lock().
 - Changed the __weak arch_crash_handle_hotplug_event() to utilize
   WARN_ONCE() instead of WARN(). Fix some formatting issues.
 - Per Sourabh, introduced sysfs attribute crash_hotplug for memory
   and CPUs; for use by userspace (udev) to determine if the kernel
   performs crash hot un/plug support.
 - Per Sourabh, moved the code detecting the elfcorehdr segment from
   arch/x86 into crash_core:handle_hotplug_event() so both kexec_load
   and kexec_file_load can benefit.
 - Updated userspace kexec-tools kexec utility to reflect change to
   using CRASH_MAX_MEMORY_RANGES and get_nr_cpus().
 - Updated the new proposed udev rules to reflect using the sysfs
   attributes crash_hotplug.

v8: 5may2022
 https://lkml.org/lkml/2022/5/5/1133
 https://lore.kernel.org/lkml/20220505184603.1548-1-eric.devol...@oracle.com/
 - Per Borislav Petkov, eliminated CONFIG_CRASH_HOTPLUG in favor
   of CONFIG_HOTPLUG_CPU || CONFIG_MEMORY_HOTPLUG, ie a new define
   is not needed. Also use of IS_ENABLED() rather than #ifdef's.
   Renamed crash_hotplug_handler() to handle_hotplug_event().
   And other corrections.
 - Per Baoquan, minimized the parameters to the arch_crash_
   handle_hotplug_event() to hp_action and cpu.
 - Introduce KEXEC_CRASH_HP_INVALID_CPU definition, per Baoquan.
 - Per Sourabh Jain, renamed and repurposed CRASH_HOTPLUG_ELFCOREHDR_SZ
   to CONFIG_CRASH_MAX_MEMORY_RANGES, mirroring kexec-tools change
   by David Hildenbrand. Folded this patch into the x86
   kexec_file_load support patch.

v7: 13apr2022
 https://lkml.org/lkml/2022/4/13/850
 https://lore.kernel.org/lkml/20220413164237.20845-1-eric.devol...@oracle.com/
 - Resolved parameter usage to crash_hotplug_handler(), per Baoquan.

v6: 1apr2022
 https://lkml.org/lkml/2022/4/1/1203
 https://lore.kernel.org/lkml/20220401183040.1624-1-eric.devol...@oracle.com/
 - Reword commit messages and some comment cleanup per Baoquan.
 - Changed elf_index to elfcorehdr_index for clarity.
 - Minor code changes per Baoquan.

v5: 3mar2022
 https://lkml.org/lkml/2022/3/3/674
 https://lore.kernel.org/lkml/20220303162725.49640-1-eric.devol...@oracle.com/
 - Reworded description of CRASH_HOTPLUG_ELFCOREHDR_SZ, per
   David Hildenbrand.
 - Refactored slightly a few patches per Baoquan recommendation.

v4: 9feb2022
 https://lkml.org/lkml/2022/2/9/1406
 https://lore.kernel.org/lkml/20220209195706.51522-1-eric.devol...@oracle.com/
 - Refactored patches per Baoquan suggestions.
 - A few corrections, per Baoquan.

v3: 10jan2022
 https://lkml.org/lkml/2022/1/10/1212
 https://lore.kernel.org/lkml/20220110195727.1682-1-eric.devol...@oracle.com/
 - Rebasing per Baoquan He request.
 - Changed memory notifier per David Hildenbrand.
 - Providing example kexec userspace change in cover letter.

RFC v2: 7dec2021
 https://lkml.org/lkml/2021/12/7/1088
 https://lore.kernel.org/lkml/20211207195204.1582-1-eric.devol...@oracle.com/
 - Acting upon Baoquan He suggestion of removing elfcorehdr from
   the purgatory list of segments, removed purgatory code from
   patchset, and it is significantly simpler now.

RFC v1: 18nov2021
 https://lkml.org/lkml/2021/11/18/845
 https://lore.kernel.org/lkml/2028174948.37435-1-eric.devol...@oracle.com/
 - working patchset demonstrating kernel handling of hotplug
   updates to x86 elfcorehdr for kexec_file_load

RFC: 14dec2020
 https://lkml.org/lkml/2020/12/14/532
 https://lore.kernel.org/lkml/b04ed259-dc5f-7f30-6661-c26f92d90...@oracle.com/
 NOTE: s/vmcoreinfo/elfcorehdr/g
 - proposed concept of allowing kernel to handle hotplug update
   of elfcorehdr
---


Eric DeVolder (8):
  crash: move a few code bits to setup support of crash hotplug
  crash: add generic infrastructure for crash hotplug support
  kexec: exclude elfcorehdr from the segment digest
  crash: memory and CPU hotplug sysfs attributes
  x86/crash: add x86 crash hotplug support
  crash: hotplug support for kexec_load()
  crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
  x86/crash: optimize CPU changes

 .../ABI/testing/sysfs-devices-memory  |   8 +
 .../ABI/testing/sysfs-devices-system-cpu  |   8 +
 .../admin-guide/mm/memory-hotplug.rst |   8 +
 Documentation/core-api/cpu_hotplug.rst|  18 +
 arch/x86/Kconfig  |   3 +
 arch/x86/include/asm/kexec.h  |  18 +
 arch/x86/kernel/crash.c   | 142 ++-
 drivers/base/cpu.c|  13 +
 drivers/base/memory.c  

[PATCH v28 1/8] crash: move a few code bits to setup support of crash hotplug

2023-08-14 Thread Eric DeVolder
The crash hotplug support leans on the work for the kexec_file_load()
syscall. To also support the kexec_load() syscall, a few bits of code
need to be moved outside of CONFIG_KEXEC_FILE. As such, these bits are
moved out of kexec_file.c and into a common location crash_core.c.

In addition, struct crash_mem and crash_notes were moved to new
locales so that PROC_KCORE, which sets CRASH_CORE alone, builds
correctly.

No functionality change intended.

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 include/linux/crash_core.h |  20 
 include/linux/kexec.h  |  15 ---
 kernel/crash_core.c| 218 +
 kernel/kexec_core.c|  37 ---
 kernel/kexec_file.c| 181 --
 5 files changed, 238 insertions(+), 233 deletions(-)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index de62a722431e..1e48b1d96404 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -28,6 +28,8 @@
 VMCOREINFO_BYTES)
 
 typedef u32 note_buf_t[CRASH_CORE_NOTE_BYTES/4];
+/* Per cpu memory for storing cpu states in case of system crash. */
+extern note_buf_t __percpu *crash_notes;
 
 void crash_update_vmcoreinfo_safecopy(void *ptr);
 void crash_save_vmcoreinfo(void);
@@ -84,4 +86,22 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
 int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
unsigned long long *crash_size, unsigned long long *crash_base);
 
+/* Alignment required for elf header segment */
+#define ELF_CORE_HEADER_ALIGN   4096
+
+struct crash_mem {
+   unsigned int max_nr_ranges;
+   unsigned int nr_ranges;
+   struct range ranges[];
+};
+
+extern int crash_exclude_mem_range(struct crash_mem *mem,
+  unsigned long long mstart,
+  unsigned long long mend);
+extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
+  void **addr, unsigned long *sz);
+
+struct kimage;
+struct kexec_segment;
+
 #endif /* LINUX_CRASH_CORE_H */
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 22b5cd24f581..fb4350db33ff 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -230,21 +230,6 @@ static inline int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
 }
 #endif
 
-/* Alignment required for elf header segment */
-#define ELF_CORE_HEADER_ALIGN   4096
-
-struct crash_mem {
-   unsigned int max_nr_ranges;
-   unsigned int nr_ranges;
-   struct range ranges[];
-};
-
-extern int crash_exclude_mem_range(struct crash_mem *mem,
-  unsigned long long mstart,
-  unsigned long long mend);
-extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
-  void **addr, unsigned long *sz);
-
 #ifndef arch_kexec_apply_relocations_add
 /*
  * arch_kexec_apply_relocations_add - apply relocations of type RELA
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 90ce1dfd591c..336083fba623 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -18,6 +19,9 @@
 
 #include "kallsyms_internal.h"
 
+/* Per cpu memory for storing cpu states in case of system crash. */
+note_buf_t __percpu *crash_notes;
+
 /* vmcoreinfo stuff */
 unsigned char *vmcoreinfo_data;
 size_t vmcoreinfo_size;
@@ -314,6 +318,187 @@ static int __init parse_crashkernel_dummy(char *arg)
 }
 early_param("crashkernel", parse_crashkernel_dummy);
 
+int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
+ void **addr, unsigned long *sz)
+{
+   Elf64_Ehdr *ehdr;
+   Elf64_Phdr *phdr;
+   unsigned long nr_cpus = num_possible_cpus(), nr_phdr, elf_sz;
+   unsigned char *buf;
+   unsigned int cpu, i;
+   unsigned long long notes_addr;
+   unsigned long mstart, mend;
+
+   /* extra phdr for vmcoreinfo ELF note */
+   nr_phdr = nr_cpus + 1;
+   nr_phdr += mem->nr_ranges;
+
+   /*
+* kexec-tools creates an extra PT_LOAD phdr for kernel text mapping
+* area (for example, ffffffff80000000 - ffffffffa0000000 on x86_64).
+* I think this is required by tools like gdb. So same physical
+* memory will be mapped in two ELF headers. One will contain kernel
+* text virtual addresses and other will have __va(physical) addresses.
+*/
+
+   nr_phdr++;
+   elf_sz = sizeof(Elf64_Ehdr) + nr_phdr * sizeof(Elf64_Phdr);
+   elf_sz = ALIGN(elf_sz, ELF_CORE_HEADER_ALIGN);
+
+   buf = vzalloc(elf_sz);
+   if (!buf)
+   return -ENOMEM;
+
+   ehdr = (Elf64_Ehdr *)buf;
+  

[PATCH v28 8/8] x86/crash: optimize CPU changes

2023-08-14 Thread Eric DeVolder
crash_prepare_elf64_headers() writes into the elfcorehdr an ELF
PT_NOTE for all possible CPUs. As such, subsequent changes to CPUs
(ie. hot un/plug, online/offline) do not need to rewrite the elfcorehdr.

The kimage->file_mode term covers kdump images loaded via the
kexec_file_load() syscall. Since crash_prepare_elf64_headers()
wrote the initial elfcorehdr, no update to the elfcorehdr is
needed for CPU changes.

The kimage->elfcorehdr_updated term covers kdump images loaded via
the kexec_load() syscall. At least one memory or CPU change must occur
to cause crash_prepare_elf64_headers() to rewrite the elfcorehdr.
Afterwards, no update to the elfcorehdr is needed for CPU changes.

This code is intentionally *NOT* hoisted into
crash_handle_hotplug_event() as it would prevent the arch-specific
handler from running for CPU changes. This would break PPC, for
example, which needs to update other information besides the
elfcorehdr, on CPU changes.
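
The early-return condition this patch adds can be restated as a predicate. This is a sketch only: elfcorehdr_needs_rewrite() is a hypothetical name, while the action codes and the two image terms (file_mode, elfcorehdr_updated) come from the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Action codes from include/linux/crash_core.h in this series. */
#define KEXEC_CRASH_HP_NONE           0
#define KEXEC_CRASH_HP_ADD_CPU        1
#define KEXEC_CRASH_HP_REMOVE_CPU     2
#define KEXEC_CRASH_HP_ADD_MEMORY     3
#define KEXEC_CRASH_HP_REMOVE_MEMORY  4

/*
 * Returns true when the x86 handler still has to regenerate the
 * elfcorehdr, false when the CPU change is already covered because
 * all possible CPUs were described by an earlier write.
 */
bool elfcorehdr_needs_rewrite(bool file_mode, bool elfcorehdr_updated,
			      int hp_action)
{
	bool cpu_change = hp_action == KEXEC_CRASH_HP_ADD_CPU ||
			  hp_action == KEXEC_CRASH_HP_REMOVE_CPU;
	bool kernel_wrote_hdr = file_mode || elfcorehdr_updated;

	return !(kernel_wrote_hdr && cpu_change);
}
```

Memory changes always return true, and a kexec_load()'d image returns true for its first CPU change, matching the two cases described above.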

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 arch/x86/kernel/crash.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 1900efcdf1bc..86d2ca80b9b2 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -469,6 +469,16 @@ void arch_crash_handle_hotplug_event(struct kimage *image)
unsigned long mem, memsz;
unsigned long elfsz = 0;
 
+   /*
+* As crash_prepare_elf64_headers() has already described all
+* possible CPUs, there is no need to update the elfcorehdr
+* for additional CPU changes.
+*/
+   if ((image->file_mode || image->elfcorehdr_updated) &&
+   ((image->hp_action == KEXEC_CRASH_HP_ADD_CPU) ||
+   (image->hp_action == KEXEC_CRASH_HP_REMOVE_CPU)))
+   return;
+
/*
 * Create the new elfcorehdr reflecting the changes to CPU and/or
 * memory resources.
-- 
2.31.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v28 3/8] kexec: exclude elfcorehdr from the segment digest

2023-08-14 Thread Eric DeVolder
When a crash kernel is loaded via the kexec_file_load() syscall, the
kernel places the various segments (ie crash kernel, crash initrd,
boot_params, elfcorehdr, purgatory, etc) in memory. For those
architectures that utilize purgatory, a hash digest of the segments
is calculated for integrity checking. The digest is embedded into
the purgatory image prior to placing in memory.

Updates to the elfcorehdr in response to CPU and memory changes
would cause the purgatory integrity checking to fail (at crash time,
and no vmcore created). Therefore, the elfcorehdr segment is
explicitly excluded from the purgatory digest, enabling updates to
the elfcorehdr while also avoiding the need to recompute the hash
digest and reload purgatory.
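
The effect of the exclusion can be illustrated in user space. In the sketch below, struct segment_sketch and digest_segments_sketch() are hypothetical, and a 64-bit FNV-1a hash stands in for the SHA-256 digest that purgatory actually verifies. The point is only that a skipped segment can change without changing the digest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kexec_segment. */
struct segment_sketch {
	const unsigned char *buf;
	size_t len;
};

/*
 * Digest all segments except the one at skip_index. Pass an index
 * >= nr to include every segment. FNV-1a is used for brevity only.
 */
uint64_t digest_segments_sketch(const struct segment_sketch *seg, size_t nr,
				size_t skip_index)
{
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV-1a offset basis */
	size_t i, k;

	assert(seg != NULL || nr == 0);
	for (i = 0; i < nr; i++) {
		/* Exclude the elfcorehdr segment by its segment index. */
		if (i == skip_index)
			continue;
		for (k = 0; k < seg[i].len; k++) {
			h ^= seg[i].buf[k];
			h *= 0x100000001b3ULL;	/* FNV-1a prime */
		}
	}
	return h;
}
```

Rewriting the skipped segment leaves the digest unchanged, so an integrity check over the remaining segments still passes.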

Suggested-by: Baoquan He 
Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 kernel/kexec_file.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 453b7a513540..e2ec9d7b9a1f 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -726,6 +726,12 @@ static int kexec_calculate_store_digests(struct kimage *image)
for (j = i = 0; i < image->nr_segments; i++) {
struct kexec_segment *ksegment;
 
+#ifdef CONFIG_CRASH_HOTPLUG
+   /* Exclude elfcorehdr segment to allow future changes via hotplug */
+   if (i == image->elfcorehdr_index)
+   continue;
+#endif
+
ksegment = &image->segment[i];
/*
 * Skip purgatory as it will be modified once we put digest
-- 
2.31.1




[PATCH v28 4/8] crash: memory and CPU hotplug sysfs attributes

2023-08-14 Thread Eric DeVolder
Introduce the crash_hotplug attribute for memory and CPUs for
use by userspace.  These attributes directly facilitate the udev
rule for managing userspace re-loading of the crash kernel upon
hot un/plug changes.

For memory, expose the crash_hotplug attribute to the
/sys/devices/system/memory directory. For example:

 # udevadm info --attribute-walk /sys/devices/system/memory/memory81
  looking at device '/devices/system/memory/memory81':
KERNEL=="memory81"
SUBSYSTEM=="memory"
DRIVER==""
ATTR{online}=="1"
ATTR{phys_device}=="0"
ATTR{phys_index}=="0051"
ATTR{removable}=="1"
ATTR{state}=="online"
ATTR{valid_zones}=="Movable"

  looking at parent device '/devices/system/memory':
KERNELS=="memory"
SUBSYSTEMS==""
DRIVERS==""
ATTRS{auto_online_blocks}=="offline"
ATTRS{block_size_bytes}=="800"
ATTRS{crash_hotplug}=="1"

For CPUs, expose the crash_hotplug attribute to the
/sys/devices/system/cpu directory. For example:

 # udevadm info --attribute-walk /sys/devices/system/cpu/cpu0
  looking at device '/devices/system/cpu/cpu0':
KERNEL=="cpu0"
SUBSYSTEM=="cpu"
DRIVER=="processor"
ATTR{crash_notes}=="277c38600"
ATTR{crash_notes_size}=="368"
ATTR{online}=="1"

  looking at parent device '/devices/system/cpu':
KERNELS=="cpu"
SUBSYSTEMS==""
DRIVERS==""
ATTRS{crash_hotplug}=="1"
ATTRS{isolated}==""
ATTRS{kernel_max}=="8191"
ATTRS{nohz_full}=="  (null)"
ATTRS{offline}=="4-7"
ATTRS{online}=="0-3"
ATTRS{possible}=="0-7"
ATTRS{present}=="0-3"

With these sysfs attributes in place, it is possible to efficiently
instruct the udev rule to skip crash kernel reloading for kernels
configured with crash hotplug support.

For example, the following is the proposed udev rule change for RHEL
system 98-kexec.rules (as the first lines of the rule file):

 # The kernel updates the crash elfcorehdr for CPU and memory changes
 SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
 SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

When examined in the context of 98-kexec.rules, the above rules
test if crash_hotplug is set, and if so, the userspace initiated
unload-then-reload of the crash kernel is skipped.

CPU and memory checks are separated in accordance with
CONFIG_HOTPLUG_CPU and CONFIG_MEMORY_HOTPLUG kernel config options.
If an architecture supports, for example, memory hotplug but not
CPU hotplug, then the /sys/devices/system/memory/crash_hotplug
attribute file is present, but the /sys/devices/system/cpu/crash_hotplug
attribute file will NOT be present. Thus the udev rule skips
userspace processing of memory hot un/plug events, but the udev
rule will evaluate false for CPU events, thus allowing userspace to
process CPU hot un/plug events (ie the unload-then-reload of the kdump
capture kernel).
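
The check that the udev rule performs can be approximated from userspace in C. This is a minimal sketch: crash_hotplug_enabled() is a hypothetical helper, and it assumes the attribute file holds "1" when the kernel handles the update itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Returns 1 if the attribute file exists and starts with '1', meaning
 * userspace can skip the unload-then-reload of the kdump image.
 * Returns 0 if the file is absent or holds any other value.
 */
int crash_hotplug_enabled(const char *attr_path)
{
	FILE *f;
	int c;

	assert(attr_path != NULL);
	f = fopen(attr_path, "r");
	if (f == NULL)
		return 0;	/* attribute not present on this kernel */

	c = fgetc(f);
	fclose(f);
	return c == '1';
}
```

A caller would pass /sys/devices/system/cpu/crash_hotplug for CPU events and /sys/devices/system/memory/crash_hotplug for memory events. A missing file is treated the same as "0", which mirrors the udev rule evaluating false.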

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 Documentation/ABI/testing/sysfs-devices-memory |  8 
 .../ABI/testing/sysfs-devices-system-cpu   |  8 
 .../admin-guide/mm/memory-hotplug.rst  |  8 
 Documentation/core-api/cpu_hotplug.rst | 18 ++
 drivers/base/cpu.c | 13 +
 drivers/base/memory.c  | 13 +
 include/linux/kexec.h  |  8 
 7 files changed, 76 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-memory b/Documentation/ABI/testing/sysfs-devices-memory
index d8b0f80b9e33..a95e0f17c35a 100644
--- a/Documentation/ABI/testing/sysfs-devices-memory
+++ b/Documentation/ABI/testing/sysfs-devices-memory
@@ -110,3 +110,11 @@ Description:
link is created for memory section 9 on node0.
 
/sys/devices/system/node/node0/memory9 -> ../../memory/memory9
+
+What:  /sys/devices/system/memory/crash_hotplug
+Date:  Aug 2023
+Contact:   Linux kernel mailing list 
+Description:
+   (RO) indicates whether or not the kernel directly supports
+   modifying the crash elfcorehdr for memory hot un/plug and/or
+   on/offline changes.
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 77942eedf4f6..b52564de2b18 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -687,3 +687,11 @@ Description:
(RO) the list of CPUs that are isolated

[PATCH v28 7/8] crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()

2023-08-14 Thread Eric DeVolder
The function crash_prepare_elf64_headers() generates the elfcorehdr
which describes the CPUs and memory in the system for the crash kernel.
In particular, it writes out ELF PT_NOTEs for the CPUs and ELF PT_LOADs
for the memory regions in the system.

With respect to the CPUs, the current implementation utilizes
for_each_present_cpu() which means that as CPUs are added and removed,
the elfcorehdr must again be updated to reflect the new set of CPUs.

The reasoning behind the move to use for_each_possible_cpu(), is:

- At kernel boot time, all percpu crash_notes are allocated for all
  possible CPUs; that is, crash_notes are not allocated dynamically
  when CPUs are plugged/unplugged. Thus the crash_notes for each
  possible CPU are always available.

- The crash_prepare_elf64_headers() creates an ELF PT_NOTE per CPU.
  Changing to for_each_possible_cpu() is valid as the crash_notes
  pointed to by each CPU PT_NOTE are present and always valid.

Furthermore, examining a common crash processing path of:

 kernel panic -> crash kernel -> makedumpfile -> 'crash' analyzer
          elfcorehdr    /proc/vmcore       vmcore

reveals how the ELF CPU PT_NOTEs are utilized:

- Upon panic, each CPU is sent an IPI and shuts itself down, recording
 its state in its crash_notes. When all CPUs are shutdown, the
 crash kernel is launched with a pointer to the elfcorehdr.

- The crash kernel via linux/fs/proc/vmcore.c does not examine or
 use the contents of the PT_NOTEs, it exposes them via /proc/vmcore.

- The makedumpfile utility uses /proc/vmcore and reads the CPU
 PT_NOTEs to craft a nr_cpus variable, which is reported in a
 header but otherwise generally unused. Makedumpfile creates the
 vmcore.

- The 'crash' dump analyzer does not appear to reference the CPU
 PT_NOTEs. Instead it looks-up the cpu_[possible|present|online]_mask
 symbols and directly examines those structure contents from vmcore
 memory. From that information it is able to determine which CPUs
 are present and online, and locate the corresponding crash_notes.
 Said differently, it appears that 'crash' analyzer does not rely
 on the ELF PT_NOTEs for CPUs; rather it obtains the information
 directly via kernel symbols and the memory within the vmcore.

(There may be other vmcore generating and analysis tools that do use
these PT_NOTEs, but 'makedumpfile' and 'crash' seem to be the most
common solution.)

This results in the benefit of having all CPUs described in the
elfcorehdr, and therefore reducing the need to re-generate the
elfcorehdr on CPU changes, at the small expense of an additional
56 bytes per PT_NOTE for not-present-but-possible CPUs.
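
As a worked example of that cost, the sketch below computes the extra elfcorehdr bytes. possible_cpu_overhead_sketch() is a hypothetical helper. The 56-byte figure is the size of one Elf64_Phdr. The CPU counts in the note that follows are an assumed sample.

```c
#include <assert.h>
#include <stddef.h>

#define ELF64_PHDR_SIZE 56u	/* sizeof(Elf64_Phdr), one PT_NOTE per CPU */

/* Extra bytes spent describing possible-but-not-present CPUs. */
size_t possible_cpu_overhead_sketch(unsigned int nr_possible,
				    unsigned int nr_present)
{
	assert(nr_possible >= nr_present);
	return (size_t)(nr_possible - nr_present) * ELF64_PHDR_SIZE;
}
```

With 8 possible and 4 present CPUs, the additional cost is 4 * 56 = 224 bytes.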

On systems where kexec_file_load() syscall is utilized, all the above
is valid. On systems where kexec_load() syscall is utilized, there
may be the need for the elfcorehdr to be regenerated once. The reason
being that some archs only populate the 'present' CPUs from the
/sys/devices/system/cpu entries, which the userspace 'kexec' utility
uses to generate the userspace-supplied elfcorehdr. In this situation,
one memory or CPU change will rewrite the elfcorehdr via the
crash_prepare_elf64_headers() function and now all possible CPUs will
be described, just as with kexec_file_load() syscall.

Suggested-by: Sourabh Jain 
Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 kernel/crash_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 34dc7bddfd77..7b87db9973a5 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -367,8 +367,8 @@ int crash_prepare_elf64_headers(struct crash_mem *mem, int 
need_kernel_map,
ehdr->e_ehsize = sizeof(Elf64_Ehdr);
ehdr->e_phentsize = sizeof(Elf64_Phdr);
 
-   /* Prepare one phdr of type PT_NOTE for each present CPU */
-   for_each_present_cpu(cpu) {
+   /* Prepare one phdr of type PT_NOTE for each possible CPU */
+   for_each_possible_cpu(cpu) {
phdr->p_type = PT_NOTE;
notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
phdr->p_offset = phdr->p_paddr = notes_addr;
-- 
2.31.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v28 2/8] crash: add generic infrastructure for crash hotplug support

2023-08-14 Thread Eric DeVolder
To support crash hotplug, a mechanism is needed to update the crash
elfcorehdr upon CPU or memory changes (eg. hot un/plug or off/
onlining). The crash elfcorehdr describes the CPUs and memory to
be written into the vmcore.

To track CPU changes, callbacks are registered with the cpuhp
mechanism via cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN). The
crash hotplug elfcorehdr update has no explicit ordering requirement
(relative to other cpuhp states), so meets the criteria for
utilizing CPUHP_BP_PREPARE_DYN. CPUHP_BP_PREPARE_DYN is a dynamic
state and avoids the need to introduce a new state for crash
hotplug. Also, CPUHP_BP_PREPARE_DYN is the last state in the PREPARE
group, just prior to the STARTING group, which is very close to the
CPU starting up in a plug/online situation, or stopping in a unplug/
offline situation. This minimizes the window of time during an
actual plug/online or unplug/offline situation in which the
elfcorehdr would be inaccurate. Note that for a CPU being unplugged
or offlined, the CPU will still be present in the list of CPUs
generated by crash_prepare_elf64_headers(). However, there is no
need to explicitly omit the CPU, see justification in
'crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()'.

To track memory changes, a notifier is registered to capture the
memblock MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().

The CPU callbacks and memory notifiers invoke crash_handle_hotplug_event()
which performs needed tasks and then dispatches the event to the
architecture specific arch_crash_handle_hotplug_event() to update the
elfcorehdr with the current state of CPUs and memory. During the
process, the kexec_lock is held.
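
The flow described above can be modelled in userspace roughly as follows.
This is only a sketch under stated assumptions: struct kimage_model, the
boolean lock stand-in and both *_model() functions are hypothetical, and
the real handler operates on the loaded crash image rather than on a
caller-supplied pointer. Only the KEXEC_CRASH_HP_* values are taken from
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values as defined by the patch in include/linux/crash_core.h. */
#define KEXEC_CRASH_HP_NONE          0
#define KEXEC_CRASH_HP_ADD_CPU       1
#define KEXEC_CRASH_HP_REMOVE_CPU    2
#define KEXEC_CRASH_HP_ADD_MEMORY    3
#define KEXEC_CRASH_HP_REMOVE_MEMORY 4

/* Simplified stand-in for struct kimage: hotplug-related fields only. */
struct kimage_model {
	int hp_action;
	bool elfcorehdr_updated;
	int arch_calls;		/* model-only counter, not in the kernel */
};

static bool kexec_lock_held;	/* stand-in for the kexec_lock */

/* Stand-in for arch_crash_handle_hotplug_event(). */
static void arch_handler_model(struct kimage_model *image)
{
	assert(kexec_lock_held);	/* the arch code runs under the lock */
	image->arch_calls++;
}

/* Model of the generic step: take the lock, record the event, dispatch. */
static void crash_handle_hotplug_event_model(struct kimage_model *image,
					     int hp_action)
{
	if (!image)		/* assumed: nothing to do without a crash image */
		return;

	kexec_lock_held = true;
	image->hp_action = hp_action;
	arch_handler_model(image);
	image->hp_action = KEXEC_CRASH_HP_NONE;
	image->elfcorehdr_updated = true;
	kexec_lock_held = false;
}
```

Keeping the generic part small leaves the architecture handler free to
decide what, if anything, needs rewriting for a given event.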

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 include/linux/crash_core.h |   7 ++
 include/linux/kexec.h  |  11 +++
 kernel/Kconfig.kexec   |  31 
 kernel/crash_core.c| 142 +
 kernel/kexec_core.c|   6 ++
 5 files changed, 197 insertions(+)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index 1e48b1d96404..0c06561bf5ff 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -104,4 +104,11 @@ extern int crash_prepare_elf64_headers(struct crash_mem 
*mem, int need_kernel_ma
 struct kimage;
 struct kexec_segment;
 
+#define KEXEC_CRASH_HP_NONE            0
+#define KEXEC_CRASH_HP_ADD_CPU         1
+#define KEXEC_CRASH_HP_REMOVE_CPU      2
+#define KEXEC_CRASH_HP_ADD_MEMORY      3
+#define KEXEC_CRASH_HP_REMOVE_MEMORY   4
+#define KEXEC_CRASH_HP_INVALID_CPU     -1U
+
 #endif /* LINUX_CRASH_CORE_H */
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index fb4350db33ff..df395f888915 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -33,6 +33,7 @@ extern note_buf_t __percpu *crash_notes;
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* Verify architecture specific macros are defined */
@@ -345,6 +346,12 @@ struct kimage {
struct purgatory_info purgatory_info;
 #endif
 
+#ifdef CONFIG_CRASH_HOTPLUG
+   int hp_action;
+   int elfcorehdr_index;
+   bool elfcorehdr_updated;
+#endif
+
 #ifdef CONFIG_IMA_KEXEC
/* Virtual address of IMA measurement buffer for kexec syscall */
void *ima_buffer;
@@ -475,6 +482,10 @@ static inline int arch_kexec_post_alloc_pages(void *vaddr, 
unsigned int pages, g
 static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) 
{ }
 #endif
 
+#ifndef arch_crash_handle_hotplug_event
+static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
+#endif
+
 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
diff --git a/kernel/Kconfig.kexec b/kernel/Kconfig.kexec
index ff72e45cfaef..d0a9a5392035 100644
--- a/kernel/Kconfig.kexec
+++ b/kernel/Kconfig.kexec
@@ -113,4 +113,35 @@ config CRASH_DUMP
  For s390, this option also enables zfcpdump.
  See also 
 
+config CRASH_HOTPLUG
+   bool "Update the crash elfcorehdr on system configuration changes"
+   default y
+   depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)
+   depends on ARCH_SUPPORTS_CRASH_HOTPLUG
+   help
+ Enable direct update to the crash elfcorehdr (which contains
+ the list of CPUs and memory regions to be dumped upon a crash)
+ in response to hot plug/unplug or online/offline of CPUs or
+ memory. This is a much more advanced approach than userspace
+ attempting that.
+
+ If unsure, say Y.
+
+config CRASH_MAX_MEMORY_RANGES
+   int "Specify the maximum number of memory regions for the elfcorehdr"
+   default 8192
+   depends on CRASH_HOTPLUG
+   help
+ For the kexec_file_load() syscall path, specify the maximum number of
+ memory regions that the elfcorehdr bu

[PATCH v27 2/8] crash: add generic infrastructure for crash hotplug support

2023-08-11 Thread Eric DeVolder
To support crash hotplug, a mechanism is needed to update the crash
elfcorehdr upon CPU or memory changes (eg. hot un/plug or off/
onlining). The crash elfcorehdr describes the CPUs and memory to
be written into the vmcore.

To track CPU changes, callbacks are registered with the cpuhp
mechanism via cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN). The
crash hotplug elfcorehdr update has no explicit ordering requirement
(relative to other cpuhp states), so meets the criteria for
utilizing CPUHP_BP_PREPARE_DYN. CPUHP_BP_PREPARE_DYN is a dynamic
state and avoids the need to introduce a new state for crash
hotplug. Also, CPUHP_BP_PREPARE_DYN is the last state in the PREPARE
group, just prior to the STARTING group, which is very close to the
CPU starting up in a plug/online situation, or stopping in a unplug/
offline situation. This minimizes the window of time during an
actual plug/online or unplug/offline situation in which the
elfcorehdr would be inaccurate. Note that for a CPU being unplugged
or offlined, the CPU will still be present in the list of CPUs
generated by crash_prepare_elf64_headers(). However, there is no
need to explicitly omit the CPU, see justification in
'crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()'.

To track memory changes, a notifier is registered to capture the
memblock MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().
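
As a rough illustration of the memory side, the notifier only has to
translate the two events into the matching hotplug action. The sketch
below is a userspace model: enum mem_event_model and
hp_action_for_mem_event() are hypothetical names, and the enum values are
placeholders rather than the kernel's MEM_* constants.

```c
#include <assert.h>

/* Values as defined by the patch in include/linux/crash_core.h. */
#define KEXEC_CRASH_HP_NONE          0
#define KEXEC_CRASH_HP_ADD_MEMORY    3
#define KEXEC_CRASH_HP_REMOVE_MEMORY 4

/* Placeholder event codes standing in for the memory notifier events. */
enum mem_event_model {
	MEM_ONLINE_MODEL,
	MEM_OFFLINE_MODEL,
	MEM_OTHER_MODEL,	/* any event the crash code does not care about */
};

/* Map a memory notifier event to the crash hotplug action to dispatch. */
static int hp_action_for_mem_event(enum mem_event_model ev)
{
	switch (ev) {
	case MEM_ONLINE_MODEL:
		return KEXEC_CRASH_HP_ADD_MEMORY;
	case MEM_OFFLINE_MODEL:
		return KEXEC_CRASH_HP_REMOVE_MEMORY;
	default:
		return KEXEC_CRASH_HP_NONE;	/* ignored by the model */
	}
}
```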

The CPU callbacks and memory notifiers invoke crash_handle_hotplug_event()
which performs needed tasks and then dispatches the event to the
architecture specific arch_crash_handle_hotplug_event() to update the
elfcorehdr with the current state of CPUs and memory. During the
process, the kexec_lock is held.

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 include/linux/crash_core.h |   9 +++
 include/linux/kexec.h  |  11 +++
 kernel/Kconfig.kexec   |  31 
 kernel/crash_core.c| 142 +
 kernel/kexec_core.c|   6 ++
 5 files changed, 199 insertions(+)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index de62a722431e..e14345cc7a22 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -84,4 +84,13 @@ int parse_crashkernel_high(char *cmdline, unsigned long long 
system_ram,
 int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
unsigned long long *crash_size, unsigned long long *crash_base);
 
+#define KEXEC_CRASH_HP_NONE            0
+#define KEXEC_CRASH_HP_ADD_CPU         1
+#define KEXEC_CRASH_HP_REMOVE_CPU      2
+#define KEXEC_CRASH_HP_ADD_MEMORY      3
+#define KEXEC_CRASH_HP_REMOVE_MEMORY   4
+#define KEXEC_CRASH_HP_INVALID_CPU     -1U
+
+struct kimage;
+
 #endif /* LINUX_CRASH_CORE_H */
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 811a90e09698..b9903dd48e24 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -33,6 +33,7 @@ extern note_buf_t __percpu *crash_notes;
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* Verify architecture specific macros are defined */
@@ -360,6 +361,12 @@ struct kimage {
struct purgatory_info purgatory_info;
 #endif
 
+#ifdef CONFIG_CRASH_HOTPLUG
+   int hp_action;
+   int elfcorehdr_index;
+   bool elfcorehdr_updated;
+#endif
+
 #ifdef CONFIG_IMA_KEXEC
/* Virtual address of IMA measurement buffer for kexec syscall */
void *ima_buffer;
@@ -490,6 +497,10 @@ static inline int arch_kexec_post_alloc_pages(void *vaddr, 
unsigned int pages, g
 static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) 
{ }
 #endif
 
+#ifndef arch_crash_handle_hotplug_event
+static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
+#endif
+
 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
diff --git a/kernel/Kconfig.kexec b/kernel/Kconfig.kexec
index ff72e45cfaef..d0a9a5392035 100644
--- a/kernel/Kconfig.kexec
+++ b/kernel/Kconfig.kexec
@@ -113,4 +113,35 @@ config CRASH_DUMP
  For s390, this option also enables zfcpdump.
  See also 
 
+config CRASH_HOTPLUG
+   bool "Update the crash elfcorehdr on system configuration changes"
+   default y
+   depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)
+   depends on ARCH_SUPPORTS_CRASH_HOTPLUG
+   help
+ Enable direct update to the crash elfcorehdr (which contains
+ the list of CPUs and memory regions to be dumped upon a crash)
+ in response to hot plug/unplug or online/offline of CPUs or
+ memory. This is a much more advanced approach than userspace
+ attempting that.
+
+ If unsure, say Y.
+
+config CRASH_MAX_MEMORY_RANGES
+   int "Specify the maximum number of memory regions for the elfcorehdr"
+   default 8192
+   depends on CRASH_HOTPLUG
+   help
+

[PATCH v27 0/8] crash: Kernel handling of CPU and memory hot un/plug

2023-08-11 Thread Eric DeVolder
382
 https://lore.kernel.org/lkml/20220613224240.79400-1-eric.devol...@oracle.com/
 - Rebased to 5.18.0
 - Per Sourabh, moved crash_prepare_elf64_headers() into common
   crash_core.c to avoid compile issues with kexec_load only path.
 - Per David Hildenbrand, replaced mutex_trylock() with mutex_lock().
 - Changed the __weak arch_crash_handle_hotplug_event() to utilize
   WARN_ONCE() instead of WARN(). Fix some formatting issues.
 - Per Sourabh, introduced sysfs attribute crash_hotplug for memory
   and CPUs; for use by userspace (udev) to determine if the kernel
   performs crash hot un/plug support.
 - Per Sourabh, moved the code detecting the elfcorehdr segment from
   arch/x86 into crash_core:handle_hotplug_event() so both kexec_load
   and kexec_file_load can benefit.
 - Updated userspace kexec-tools kexec utility to reflect change to
   using CRASH_MAX_MEMORY_RANGES and get_nr_cpus().
 - Updated the new proposed udev rules to reflect using the sysfs
   attributes crash_hotplug.

v8: 5may2022
 https://lkml.org/lkml/2022/5/5/1133
 https://lore.kernel.org/lkml/20220505184603.1548-1-eric.devol...@oracle.com/
 - Per Borislav Petkov, eliminated CONFIG_CRASH_HOTPLUG in favor
   of CONFIG_HOTPLUG_CPU || CONFIG_MEMORY_HOTPLUG, ie a new define
   is not needed. Also use of IS_ENABLED() rather than #ifdef's.
   Renamed crash_hotplug_handler() to handle_hotplug_event().
   And other corrections.
 - Per Baoquan, minimized the parameters to the arch_crash_
   handle_hotplug_event() to hp_action and cpu.
 - Introduce KEXEC_CRASH_HP_INVALID_CPU definition, per Baoquan.
 - Per Sourabh Jain, renamed and repurposed CRASH_HOTPLUG_ELFCOREHDR_SZ
   to CONFIG_CRASH_MAX_MEMORY_RANGES, mirroring kexec-tools change
   by David Hildenbrand. Folded this patch into the x86
   kexec_file_load support patch.

v7: 13apr2022
 https://lkml.org/lkml/2022/4/13/850
 https://lore.kernel.org/lkml/20220413164237.20845-1-eric.devol...@oracle.com/
 - Resolved parameter usage to crash_hotplug_handler(), per Baoquan.

v6: 1apr2022
 https://lkml.org/lkml/2022/4/1/1203
 https://lore.kernel.org/lkml/20220401183040.1624-1-eric.devol...@oracle.com/
 - Reword commit messages and some comment cleanup per Baoquan.
 - Changed elf_index to elfcorehdr_index for clarity.
 - Minor code changes per Baoquan.

v5: 3mar2022
 https://lkml.org/lkml/2022/3/3/674
 https://lore.kernel.org/lkml/20220303162725.49640-1-eric.devol...@oracle.com/
 - Reworded description of CRASH_HOTPLUG_ELFCOREHDR_SZ, per
   David Hildenbrand.
 - Refactored slightly a few patches per Baoquan recommendation.

v4: 9feb2022
 https://lkml.org/lkml/2022/2/9/1406
 https://lore.kernel.org/lkml/20220209195706.51522-1-eric.devol...@oracle.com/
 - Refactored patches per Baoquan suggestions.
 - A few corrections, per Baoquan.

v3: 10jan2022
 https://lkml.org/lkml/2022/1/10/1212
 https://lore.kernel.org/lkml/20220110195727.1682-1-eric.devol...@oracle.com/
 - Rebasing per Baoquan He request.
 - Changed memory notifier per David Hildenbrand.
 - Providing example kexec userspace change in cover letter.

RFC v2: 7dec2021
 https://lkml.org/lkml/2021/12/7/1088
 https://lore.kernel.org/lkml/20211207195204.1582-1-eric.devol...@oracle.com/
 - Acting upon Baoquan He suggestion of removing elfcorehdr from
   the purgatory list of segments, removed purgatory code from
   patchset, and it is significantly simpler now.

RFC v1: 18nov2021
 https://lkml.org/lkml/2021/11/18/845
 https://lore.kernel.org/lkml/2028174948.37435-1-eric.devol...@oracle.com/
 - working patchset demonstrating kernel handling of hotplug
   updates to x86 elfcorehdr for kexec_file_load

RFC: 14dec2020
 https://lkml.org/lkml/2020/12/14/532
 https://lore.kernel.org/lkml/b04ed259-dc5f-7f30-6661-c26f92d90...@oracle.com/
 NOTE: s/vmcoreinfo/elfcorehdr/g
 - proposed concept of allowing kernel to handle hotplug update
   of elfcorehdr
---


Eric DeVolder (8):
  crash: move a few code bits to setup support of crash hotplug
  crash: add generic infrastructure for crash hotplug support
  kexec: exclude elfcorehdr from the segment digest
  crash: memory and CPU hotplug sysfs attributes
  x86/crash: add x86 crash hotplug support
  crash: hotplug support for kexec_load()
  crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
  x86/crash: optimize CPU changes

 .../ABI/testing/sysfs-devices-memory  |   8 +
 .../ABI/testing/sysfs-devices-system-cpu  |   8 +
 .../admin-guide/mm/memory-hotplug.rst |   8 +
 Documentation/core-api/cpu_hotplug.rst|  18 +
 arch/x86/Kconfig  |   3 +
 arch/x86/include/asm/kexec.h  |  18 +
 arch/x86/kernel/crash.c   | 140 ++-
 drivers/base/cpu.c|  13 +
 drivers/base/memory.c |  13 +
 include/linux/crash_core.h|   9 +
 include/linux/kexec.h |  63 +++-
 include/uapi/linux/kexec.h|   1 +
 kernel/Kconfig

[PATCH v27 8/8] x86/crash: optimize CPU changes

2023-08-11 Thread Eric DeVolder
crash_prepare_elf64_headers() writes into the elfcorehdr an ELF
PT_NOTE for all possible CPUs. As such, subsequent changes to CPUs
(ie. hot un/plug, online/offline) do not need to rewrite the elfcorehdr.

The kimage->file_mode term covers kdump images loaded via the
kexec_file_load() syscall. Since crash_prepare_elf64_headers()
wrote the initial elfcorehdr, no update to the elfcorehdr is
needed for CPU changes.

The kimage->elfcorehdr_updated term covers kdump images loaded via
the kexec_load() syscall. At least one memory or CPU change must occur
to cause crash_prepare_elf64_headers() to rewrite the elfcorehdr.
Afterwards, no update to the elfcorehdr is needed for CPU changes.

This code is intentionally *NOT* hoisted into
crash_handle_hotplug_event() as it would prevent the arch-specific
handler from running for CPU changes. This would break PPC, for
example, which needs to update other information besides the
elfcorehdr, on CPU changes.
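
The two terms combine into a single early-exit test. A standalone sketch
of that predicate, assuming a cut-down stand-in for struct kimage; the
names kimage_model and can_skip_elfcorehdr_update() are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Values as defined in include/linux/crash_core.h by this series. */
#define KEXEC_CRASH_HP_ADD_CPU       1
#define KEXEC_CRASH_HP_REMOVE_CPU    2
#define KEXEC_CRASH_HP_ADD_MEMORY    3

/* Cut-down stand-in for struct kimage. */
struct kimage_model {
	bool file_mode;			/* loaded via kexec_file_load() */
	bool elfcorehdr_updated;	/* kernel already rewrote the elfcorehdr */
	int hp_action;
};

/* True when a CPU event needs no elfcorehdr rewrite. */
static bool can_skip_elfcorehdr_update(const struct kimage_model *image)
{
	bool all_cpus_described = image->file_mode || image->elfcorehdr_updated;
	bool cpu_event = image->hp_action == KEXEC_CRASH_HP_ADD_CPU ||
			 image->hp_action == KEXEC_CRASH_HP_REMOVE_CPU;

	return all_cpus_described && cpu_event;
}
```

Memory events never satisfy the predicate, so they always reach the
elfcorehdr regeneration path.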

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 arch/x86/kernel/crash.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index caf22bcb61af..18d2a18d1073 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -467,6 +467,16 @@ void arch_crash_handle_hotplug_event(struct kimage *image)
unsigned long mem, memsz;
unsigned long elfsz = 0;
 
+   /*
+* As crash_prepare_elf64_headers() has already described all
+* possible CPUs, there is no need to update the elfcorehdr
+* for additional CPU changes.
+*/
+   if ((image->file_mode || image->elfcorehdr_updated) &&
+   ((image->hp_action == KEXEC_CRASH_HP_ADD_CPU) ||
+   (image->hp_action == KEXEC_CRASH_HP_REMOVE_CPU)))
+   return;
+
/*
 * Create the new elfcorehdr reflecting the changes to CPU and/or
 * memory resources.
-- 
2.31.1




[PATCH v27 7/8] crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()

2023-08-11 Thread Eric DeVolder
The function crash_prepare_elf64_headers() generates the elfcorehdr
which describes the CPUs and memory in the system for the crash kernel.
In particular, it writes out ELF PT_NOTEs for the CPUs and ELF PT_LOADs
for the memory regions in the system.

With respect to the CPUs, the current implementation utilizes
for_each_present_cpu() which means that as CPUs are added and removed,
the elfcorehdr must again be updated to reflect the new set of CPUs.

The reasoning behind the move to use for_each_possible_cpu(), is:

- At kernel boot time, all percpu crash_notes are allocated for all
  possible CPUs; that is, crash_notes are not allocated dynamically
  when CPUs are plugged/unplugged. Thus the crash_notes for each
  possible CPU are always available.

- The crash_prepare_elf64_headers() creates an ELF PT_NOTE per CPU.
  Changing to for_each_possible_cpu() is valid as the crash_notes
  pointed to by each CPU PT_NOTE are present and always valid.
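
To make the difference concrete, the loop can be modelled in userspace
with a plain bitmask in place of the kernel cpumask. This is a sketch
only: prepare_cpu_notes_model() and MAX_CPUS_MODEL are hypothetical, and
the kernel additionally fills in the physical address of each CPU's
crash_notes, which is omitted here.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CPUS_MODEL 64	/* model limit: one bit per CPU in a uint64_t */

/*
 * Emit one PT_NOTE program header per CPU set in cpu_mask.
 * Returns the number of headers written (at most cap).
 */
static size_t prepare_cpu_notes_model(uint64_t cpu_mask, Elf64_Phdr *phdrs,
				      size_t cap)
{
	size_t n = 0;

	for (unsigned int cpu = 0; cpu < MAX_CPUS_MODEL && n < cap; cpu++) {
		if (!(cpu_mask & (UINT64_C(1) << cpu)))
			continue;
		/* Only the type is set; address and size fields stay zero. */
		phdrs[n] = (Elf64_Phdr){ .p_type = PT_NOTE };
		n++;
	}
	return n;
}
```

With a present mask of 0x0f and a possible mask of 0xff, the first yields
four headers and the second eight, so onlining CPUs 4-7 later changes
nothing in the possible-mask layout.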

Furthermore, examining a common crash processing path of:

 kernel panic -> crash kernel -> makedumpfile -> 'crash' analyzer
   elfcorehdr  /proc/vmcore vmcore

reveals how the ELF CPU PT_NOTEs are utilized:

- Upon panic, each CPU is sent an IPI and shuts itself down, recording
 its state in its crash_notes. When all CPUs are shutdown, the
 crash kernel is launched with a pointer to the elfcorehdr.

- The crash kernel via linux/fs/proc/vmcore.c does not examine or
 use the contents of the PT_NOTEs, it exposes them via /proc/vmcore.

- The makedumpfile utility uses /proc/vmcore and reads the CPU
 PT_NOTEs to craft a nr_cpus variable, which is reported in a
 header but otherwise generally unused. Makedumpfile creates the
 vmcore.

- The 'crash' dump analyzer does not appear to reference the CPU
 PT_NOTEs. Instead it looks up the cpu_[possible|present|online]_mask
 symbols and directly examines those structure contents from vmcore
 memory. From that information it is able to determine which CPUs
 are present and online, and locate the corresponding crash_notes.
 Said differently, it appears that 'crash' analyzer does not rely
 on the ELF PT_NOTEs for CPUs; rather it obtains the information
 directly via kernel symbols and the memory within the vmcore.

(There may be other vmcore generating and analysis tools that do use
these PT_NOTEs, but 'makedumpfile' and 'crash' seem to be the most
common solution.)

This results in the benefit of having all CPUs described in the
elfcorehdr, and therefore reducing the need to re-generate the
elfcorehdr on CPU changes, at the small expense of an additional
56 bytes per PT_NOTE for not-present-but-possible CPUs.

On systems where kexec_file_load() syscall is utilized, all the above
is valid. On systems where kexec_load() syscall is utilized, there
may be the need for the elfcorehdr to be regenerated once. The reason
being that some archs only populate the 'present' CPUs from the
/sys/devices/system/cpu entries, which the userspace 'kexec' utility
uses to generate the userspace-supplied elfcorehdr. In this situation,
one memory or CPU change will rewrite the elfcorehdr via the
crash_prepare_elf64_headers() function and now all possible CPUs will
be described, just as with kexec_file_load() syscall.

Suggested-by: Sourabh Jain 
Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 kernel/crash_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index fa918176d46d..7378b501fada 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -364,8 +364,8 @@ int crash_prepare_elf64_headers(struct crash_mem *mem, int 
need_kernel_map,
ehdr->e_ehsize = sizeof(Elf64_Ehdr);
ehdr->e_phentsize = sizeof(Elf64_Phdr);
 
-   /* Prepare one phdr of type PT_NOTE for each present CPU */
-   for_each_present_cpu(cpu) {
+   /* Prepare one phdr of type PT_NOTE for each possible CPU */
+   for_each_possible_cpu(cpu) {
phdr->p_type = PT_NOTE;
notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
phdr->p_offset = phdr->p_paddr = notes_addr;
-- 
2.31.1




[PATCH v27 5/8] x86/crash: add x86 crash hotplug support

2023-08-11 Thread Eric DeVolder
When CPU or memory is hot un/plugged, or off/onlined, the crash
elfcorehdr, which describes the CPUs and memory in the system,
must also be updated.

A new elfcorehdr is generated from the available CPUs and memory
and replaces the existing elfcorehdr. The segment containing the
elfcorehdr is identified at run-time in
crash_core:crash_handle_hotplug_event().

No modifications to purgatory (see 'kexec: exclude elfcorehdr
from the segment digest') or boot_params (as the elfcorehdr=
capture kernel command line parameter pointer remains unchanged
and correct) are needed, just elfcorehdr.

For kexec_file_load(), the elfcorehdr segment size is based on
NR_CPUS and CRASH_MAX_MEMORY_RANGES in order to accommodate a
growing number of CPU and memory resources.
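
A back-of-the-envelope version of that sizing, as a userspace sketch. The
function name elfcorehdr_bytes_model() is hypothetical, and the two extra
program headers (assumed here to cover VMCOREINFO and the kernel text
mapping) are an assumption of the model rather than something stated in
this message:

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/*
 * Worst-case elfcorehdr size: one ELF header plus one program header per
 * CPU, per memory range, and for the two assumed fixed entries.
 */
static size_t elfcorehdr_bytes_model(size_t nr_cpus, size_t max_mem_ranges)
{
	size_t pnum = 2 + nr_cpus + max_mem_ranges;

	return sizeof(Elf64_Ehdr) + pnum * sizeof(Elf64_Phdr);
}
```

With 8 CPUs and the default of 8192 memory ranges the model reserves
roughly 450 KiB, which shows that the memory range limit dominates the
segment size.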

For kexec_load(), the userspace kexec utility needs to size the
elfcorehdr segment in the same/similar manner.

To accommodate kexec_load() syscall in the absence of
kexec_file_load() syscall support, prepare_elf_headers() and
dependents are moved outside of CONFIG_KEXEC_FILE.

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 arch/x86/Kconfig |   3 +
 arch/x86/include/asm/kexec.h |  15 +
 arch/x86/kernel/crash.c  | 103 ---
 3 files changed, 114 insertions(+), 7 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7082fc10b346..ffc95c3d6abd 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2069,6 +2069,9 @@ config ARCH_SUPPORTS_KEXEC_JUMP
 config ARCH_SUPPORTS_CRASH_DUMP
def_bool X86_64 || (X86_32 && HIGHMEM)
 
+config ARCH_SUPPORTS_CRASH_HOTPLUG
+   def_bool y
+
 config PHYSICAL_START
hex "Physical address where the kernel is loaded" if (EXPERT || 
CRASH_DUMP)
default "0x100"
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 5b77bbc28f96..9143100ea3ea 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -209,6 +209,21 @@ typedef void crash_vmclear_fn(void);
 extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
 extern void kdump_nmi_shootdown_cpus(void);
 
+#ifdef CONFIG_CRASH_HOTPLUG
+void arch_crash_handle_hotplug_event(struct kimage *image);
+#define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
+
+#ifdef CONFIG_HOTPLUG_CPU
+static inline int crash_hotplug_cpu_support(void) { return 1; }
+#define crash_hotplug_cpu_support crash_hotplug_cpu_support
+#endif
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+static inline int crash_hotplug_memory_support(void) { return 1; }
+#define crash_hotplug_memory_support crash_hotplug_memory_support
+#endif
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_X86_KEXEC_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index cdd92ab43cda..c70a111c44fa 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -158,8 +158,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
crash_save_cpu(regs, safe_smp_processor_id());
 }
 
-#ifdef CONFIG_KEXEC_FILE
-
 static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
 {
unsigned int *nr_ranges = arg;
@@ -231,7 +229,7 @@ static int prepare_elf64_ram_headers_callback(struct 
resource *res, void *arg)
 
 /* Prepare elf headers. Return addr and size */
 static int prepare_elf_headers(struct kimage *image, void **addr,
-   unsigned long *sz)
+   unsigned long *sz, unsigned long 
*nr_mem_ranges)
 {
struct crash_mem *cmem;
int ret;
@@ -249,6 +247,9 @@ static int prepare_elf_headers(struct kimage *image, void 
**addr,
if (ret)
goto out;
 
+   /* Return the computed number of memory ranges, for hotplug usage */
+   *nr_mem_ranges = cmem->nr_ranges;
+
/* By default prepare 64bit headers */
ret =  crash_prepare_elf64_headers(cmem, IS_ENABLED(CONFIG_X86_64), 
addr, sz);
 
@@ -257,6 +258,7 @@ static int prepare_elf_headers(struct kimage *image, void 
**addr,
return ret;
 }
 
+#ifdef CONFIG_KEXEC_FILE
 static int add_e820_entry(struct boot_params *params, struct e820_entry *entry)
 {
unsigned int nr_e820_entries;
@@ -371,18 +373,42 @@ int crash_setup_memmap_entries(struct kimage *image, 
struct boot_params *params)
 int crash_load_segments(struct kimage *image)
 {
int ret;
+   unsigned long pnum = 0;
struct kexec_buf kbuf = { .image = image, .buf_min = 0,
  .buf_max = ULONG_MAX, .top_down = false };
 
/* Prepare elf headers and add a segment */
-   ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz);
+   ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz, &pnum);
if (ret)
return ret;
 
-   image->elf_headers = kbuf.buffer;
-   image->elf_headers_sz = kbuf.bufsz;
+   image->elf_headers  = kbuf.buffer;
+   image->elf_he

[PATCH v27 3/8] kexec: exclude elfcorehdr from the segment digest

2023-08-11 Thread Eric DeVolder
When a crash kernel is loaded via the kexec_file_load() syscall, the
kernel places the various segments (ie crash kernel, crash initrd,
boot_params, elfcorehdr, purgatory, etc) in memory. For those
architectures that utilize purgatory, a hash digest of the segments
is calculated for integrity checking. The digest is embedded into
the purgatory image prior to placing in memory.

Updates to the elfcorehdr in response to CPU and memory changes
would cause the purgatory integrity checking to fail (at crash time,
and no vmcore created). Therefore, the elfcorehdr segment is
explicitly excluded from the purgatory digest, enabling updates to
the elfcorehdr while also avoiding the need to recompute the hash
digest and reload purgatory.
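
The effect of the exclusion can be demonstrated with a toy digest. In the
sketch below a 64-bit FNV-1a hash stands in for the SHA-256 digest used
by the kernel, and struct segment_model and digest_segments_model() are
hypothetical names; the model compares the loop's segment index against
the elfcorehdr index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for a kexec segment: a buffer and its length. */
struct segment_model {
	const unsigned char *buf;
	size_t len;
};

/*
 * Hash every segment except the one at elfcorehdr_index.
 * FNV-1a is used purely for illustration; it is not the kernel's digest.
 */
static uint64_t digest_segments_model(const struct segment_model *segs,
				      size_t nr, size_t elfcorehdr_index)
{
	uint64_t h = UINT64_C(0xcbf29ce484222325);	/* FNV offset basis */

	for (size_t i = 0; i < nr; i++) {
		if (i == elfcorehdr_index)
			continue;	/* excluded: may change after load */
		for (size_t k = 0; k < segs[i].len; k++) {
			h ^= segs[i].buf[k];
			h *= UINT64_C(0x100000001b3);	/* FNV prime */
		}
	}
	return h;
}
```

Rewriting the excluded segment leaves the digest unchanged, while any
change to another segment is still detected.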

Suggested-by: Baoquan He 
Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 kernel/kexec_file.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 453b7a513540..e2ec9d7b9a1f 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -726,6 +726,12 @@ static int kexec_calculate_store_digests(struct kimage 
*image)
for (j = i = 0; i < image->nr_segments; i++) {
struct kexec_segment *ksegment;
 
+#ifdef CONFIG_CRASH_HOTPLUG
+   /* Exclude elfcorehdr segment to allow future changes via 
hotplug */
+   if (j == image->elfcorehdr_index)
+   continue;
+#endif
+
ksegment = &image->segment[i];
/*
 * Skip purgatory as it will be modified once we put digest
-- 
2.31.1




[PATCH v27 4/8] crash: memory and CPU hotplug sysfs attributes

2023-08-11 Thread Eric DeVolder
Introduce the crash_hotplug attribute for memory and CPUs for
use by userspace.  These attributes directly facilitate the udev
rule for managing userspace re-loading of the crash kernel upon
hot un/plug changes.

For memory, expose the crash_hotplug attribute to the
/sys/devices/system/memory directory. For example:

 # udevadm info --attribute-walk /sys/devices/system/memory/memory81
  looking at device '/devices/system/memory/memory81':
KERNEL=="memory81"
SUBSYSTEM=="memory"
DRIVER==""
ATTR{online}=="1"
ATTR{phys_device}=="0"
ATTR{phys_index}=="0051"
ATTR{removable}=="1"
ATTR{state}=="online"
ATTR{valid_zones}=="Movable"

  looking at parent device '/devices/system/memory':
KERNELS=="memory"
SUBSYSTEMS==""
DRIVERS==""
ATTRS{auto_online_blocks}=="offline"
ATTRS{block_size_bytes}=="800"
ATTRS{crash_hotplug}=="1"

For CPUs, expose the crash_hotplug attribute to the
/sys/devices/system/cpu directory. For example:

 # udevadm info --attribute-walk /sys/devices/system/cpu/cpu0
  looking at device '/devices/system/cpu/cpu0':
KERNEL=="cpu0"
SUBSYSTEM=="cpu"
DRIVER=="processor"
ATTR{crash_notes}=="277c38600"
ATTR{crash_notes_size}=="368"
ATTR{online}=="1"

  looking at parent device '/devices/system/cpu':
KERNELS=="cpu"
SUBSYSTEMS==""
DRIVERS==""
ATTRS{crash_hotplug}=="1"
ATTRS{isolated}==""
ATTRS{kernel_max}=="8191"
ATTRS{nohz_full}=="  (null)"
ATTRS{offline}=="4-7"
ATTRS{online}=="0-3"
ATTRS{possible}=="0-7"
ATTRS{present}=="0-3"

With these sysfs attributes in place, it is possible to efficiently
instruct the udev rule to skip crash kernel reloading for kernels
configured with crash hotplug support.

For example, the following is the proposed udev rule change for RHEL
system 98-kexec.rules (as the first lines of the rule file):

 # The kernel updates the crash elfcorehdr for CPU and memory changes
 SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
 SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

When examined in the context of 98-kexec.rules, the above rules
test if crash_hotplug is set, and if so, the userspace initiated
unload-then-reload of the crash kernel is skipped.

CPU and memory checks are separated in accordance with
CONFIG_HOTPLUG_CPU and CONFIG_MEMORY_HOTPLUG kernel config options.
If an architecture supports, for example, memory hotplug but not
CPU hotplug, then the /sys/devices/system/memory/crash_hotplug
attribute file is present, but the /sys/devices/system/cpu/crash_hotplug
attribute file will NOT be present. Thus the udev rule skips
userspace processing of memory hot un/plug events, but the udev
rule will evaluate false for CPU events, thus allowing userspace to
process CPU hot un/plug events (ie the unload-then-reload of the kdump
capture kernel).
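
The same decision can be expressed outside of udev. A minimal sketch,
assuming the attribute reads as a single '1' when the kernel handles the
update itself; read_crash_hotplug() and userspace_must_reload() are
hypothetical helpers, not part of kexec-tools:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Returns 1 if the attribute file starts with '1', 0 for any other
 * content, and -1 if the file is absent or unreadable.
 */
static int read_crash_hotplug(const char *path)
{
	FILE *f = fopen(path, "r");
	int c;

	if (!f)
		return -1;
	c = fgetc(f);
	fclose(f);
	return c == '1' ? 1 : 0;
}

/* Userspace reloads unless the kernel reports that it handles the event. */
static int userspace_must_reload(const char *path)
{
	return read_crash_hotplug(path) != 1;
}
```

Calling userspace_must_reload("/sys/devices/system/cpu/crash_hotplug")
and the memory equivalent separately mirrors the per-subsystem split in
the udev rules above: a missing attribute file falls back to the
unload-then-reload path.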

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 Documentation/ABI/testing/sysfs-devices-memory |  8 
 .../ABI/testing/sysfs-devices-system-cpu   |  8 
 .../admin-guide/mm/memory-hotplug.rst  |  8 
 Documentation/core-api/cpu_hotplug.rst | 18 ++
 drivers/base/cpu.c | 13 +
 drivers/base/memory.c  | 13 +
 include/linux/kexec.h  |  8 
 7 files changed, 76 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-memory 
b/Documentation/ABI/testing/sysfs-devices-memory
index d8b0f80b9e33..a95e0f17c35a 100644
--- a/Documentation/ABI/testing/sysfs-devices-memory
+++ b/Documentation/ABI/testing/sysfs-devices-memory
@@ -110,3 +110,11 @@ Description:
link is created for memory section 9 on node0.
 
/sys/devices/system/node/node0/memory9 -> ../../memory/memory9
+
+What:  /sys/devices/system/memory/crash_hotplug
+Date:  Aug 2023
+Contact:   Linux kernel mailing list 
+Description:
+   (RO) indicates whether or not the kernel directly supports
+   modifying the crash elfcorehdr for memory hot un/plug and/or
+   on/offline changes.
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu 
b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 77942eedf4f6..b52564de2b18 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -687,3 +687,11 @@ Description:
(RO) the list of CPUs that are isolated

[PATCH v27 1/8] crash: move a few code bits to setup support of crash hotplug

2023-08-11 Thread Eric DeVolder
The crash hotplug support leans on the work for the kexec_file_load()
syscall. To also support the kexec_load() syscall, a few bits of code
need to be moved outside of CONFIG_KEXEC_FILE. As such, these bits are
moved out of kexec_file.c and into a common location crash_core.c.

No functionality change intended.

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 include/linux/kexec.h |  30 +++
 kernel/crash_core.c   | 182 ++
 kernel/kexec_file.c   | 181 -
 3 files changed, 197 insertions(+), 196 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 22b5cd24f581..811a90e09698 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -105,6 +105,21 @@ struct compat_kexec_segment {
 };
 #endif
 
+/* Alignment required for elf header segment */
+#define ELF_CORE_HEADER_ALIGN   4096
+
+struct crash_mem {
+   unsigned int max_nr_ranges;
+   unsigned int nr_ranges;
+   struct range ranges[];
+};
+
+extern int crash_exclude_mem_range(struct crash_mem *mem,
+  unsigned long long mstart,
+  unsigned long long mend);
+extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
+  void **addr, unsigned long *sz);
+
 #ifdef CONFIG_KEXEC_FILE
 struct purgatory_info {
/*
@@ -230,21 +245,6 @@ static inline int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
 }
 #endif
 
-/* Alignment required for elf header segment */
-#define ELF_CORE_HEADER_ALIGN   4096
-
-struct crash_mem {
-   unsigned int max_nr_ranges;
-   unsigned int nr_ranges;
-   struct range ranges[];
-};
-
-extern int crash_exclude_mem_range(struct crash_mem *mem,
-  unsigned long long mstart,
-  unsigned long long mend);
-extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
-  void **addr, unsigned long *sz);
-
 #ifndef arch_kexec_apply_relocations_add
 /*
  * arch_kexec_apply_relocations_add - apply relocations of type RELA
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 90ce1dfd591c..b7c30b748a16 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -314,6 +315,187 @@ static int __init parse_crashkernel_dummy(char *arg)
 }
 early_param("crashkernel", parse_crashkernel_dummy);
 
+int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
+ void **addr, unsigned long *sz)
+{
+   Elf64_Ehdr *ehdr;
+   Elf64_Phdr *phdr;
+   unsigned long nr_cpus = num_possible_cpus(), nr_phdr, elf_sz;
+   unsigned char *buf;
+   unsigned int cpu, i;
+   unsigned long long notes_addr;
+   unsigned long mstart, mend;
+
+   /* extra phdr for vmcoreinfo ELF note */
+   nr_phdr = nr_cpus + 1;
+   nr_phdr += mem->nr_ranges;
+
+   /*
+* kexec-tools creates an extra PT_LOAD phdr for kernel text mapping
+* area (for example, ffffffff80000000 - ffffffffa0000000 on x86_64).
+* I think this is required by tools like gdb. So same physical
+* memory will be mapped in two ELF headers. One will contain kernel
+* text virtual addresses and other will have __va(physical) addresses.
+*/
+
+   nr_phdr++;
+   elf_sz = sizeof(Elf64_Ehdr) + nr_phdr * sizeof(Elf64_Phdr);
+   elf_sz = ALIGN(elf_sz, ELF_CORE_HEADER_ALIGN);
+
+   buf = vzalloc(elf_sz);
+   if (!buf)
+   return -ENOMEM;
+
+   ehdr = (Elf64_Ehdr *)buf;
+   phdr = (Elf64_Phdr *)(ehdr + 1);
+   memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
+   ehdr->e_ident[EI_CLASS] = ELFCLASS64;
+   ehdr->e_ident[EI_DATA] = ELFDATA2LSB;
+   ehdr->e_ident[EI_VERSION] = EV_CURRENT;
+   ehdr->e_ident[EI_OSABI] = ELF_OSABI;
+   memset(ehdr->e_ident + EI_PAD, 0, EI_NIDENT - EI_PAD);
+   ehdr->e_type = ET_CORE;
+   ehdr->e_machine = ELF_ARCH;
+   ehdr->e_version = EV_CURRENT;
+   ehdr->e_phoff = sizeof(Elf64_Ehdr);
+   ehdr->e_ehsize = sizeof(Elf64_Ehdr);
+   ehdr->e_phentsize = sizeof(Elf64_Phdr);
+
+   /* Prepare one phdr of type PT_NOTE for each present CPU */
+   for_each_present_cpu(cpu) {
+   phdr->p_type = PT_NOTE;
+   notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
+   phdr->p_offset = phdr->p_paddr = notes_addr;
+   phdr->p_filesz = phdr->p_memsz = sizeof(note_buf_t);
+   (ehdr->e_phnum)++;
+   phdr++;
+   }
+
+   /* Prepare one PT_NOTE header for vmcoreinfo */
+   phdr->p_type = PT_NOTE;
+   

[PATCH v27 6/8] crash: hotplug support for kexec_load()

2023-08-11 Thread Eric DeVolder
The hotplug support for kexec_load() requires changes to the
userspace kexec-tools and a little extra help from the kernel.

Given a kdump capture kernel loaded via kexec_load(), and a
subsequent hotplug event, the crash hotplug handler finds the
elfcorehdr and rewrites it to reflect the hotplug change.
That is the desired outcome; however, at kernel panic time,
the purgatory integrity check fails (because the elfcorehdr
changed), and the capture kernel does not boot and no vmcore
is generated.

Therefore, the userspace kexec-tools/kexec must indicate to the
kernel that the elfcorehdr can be modified (because the kexec
excluded the elfcorehdr from the digest, and sized the elfcorehdr
memory buffer appropriately).

To facilitate hotplug support with kexec_load():
 - a new kexec flag KEXEC_UPDATE_ELFCOREHDR indicates that it is
   safe for the kernel to modify the kexec_load()'d elfcorehdr
 - the /sys/kernel/crash_elfcorehdr_size node communicates the
   preferred size of the elfcorehdr memory buffer
 - The sysfs crash_hotplug nodes (ie.
   /sys/devices/system/[cpu|memory]/crash_hotplug) dynamically
   take into account kexec_file_load() vs kexec_load() and
   KEXEC_UPDATE_ELFCOREHDR.
   This is critical so that the udev rule processing of crash_hotplug
   is all that is needed to determine if the userspace unload-then-load
   of the kdump image is to be skipped, or not. The proposed udev
   rule change looks like:
   # The kernel updates the crash elfcorehdr for CPU and memory changes
   SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
   SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

The table below indicates the behavior of kexec_load()'d kdump image
updates (with the new udev crash_hotplug rule in place):

        |     Kexec
 Kernel |  Old  |  New
 -------+-------+-------
  Old   |   a   |   a
 -------+-------+-------
  New   |   a   |   b
 -------+-------+-------

where kexec 'old' and 'new' indicate whether kexec-tools has the needed
modifications for the crash hotplug feature, and kernel 'old' and
'new' indicate whether the kernel supports this crash hotplug feature.

Behavior 'a' indicates the unload-then-reload of the entire kdump
image. For the kexec 'old' column, the unload-then-reload occurs
due to the missing flag KEXEC_UPDATE_ELFCOREHDR. An 'old' kernel
(with 'new' kexec) does not present the crash_hotplug sysfs node,
which leads to the unload-then-reload of the kdump image.

Behavior 'b' indicates the desired optimized behavior of the kernel
directly modifying the elfcorehdr and avoiding the unload-then-reload
of the kdump image.

If the udev rule is not updated with the crash_hotplug node check, then
regardless of whether the kernel or kexec is new or old, the kdump
image continues to be unloaded and then reloaded on hotplug changes.

To fully support the crash hotplug feature, there needs to be a rollout
of kernel, kexec-tools and udev rule changes. However, the order of
the rollout of these pieces does not matter; kexec_load()'d kdump
images still function for hotplug as-is.
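
As a sketch only, the behavior table and the udev rule caveat above can
be written as a small C model. Everything here (the enum, the function
names, the idea of passing three booleans) is hypothetical and exists
only to restate the table; it is not code from this series.

```c
#include <assert.h>
#include <stdbool.h>

/* 'a' and 'b' are the behavior labels used in the table above. */
enum kdump_update {
    KDUMP_RELOAD = 'a',        /* userspace unload-then-reload */
    KDUMP_KERNEL_UPDATE = 'b'  /* kernel rewrites the elfcorehdr */
};

/*
 * Model of a kexec_load()'d kdump image. The crash_hotplug sysfs node
 * reads 1 only when the kernel has the feature and kexec passed the
 * KEXEC_UPDATE_ELFCOREHDR flag; the optimized path additionally needs
 * the udev rule that checks that node.
 */
enum kdump_update kdump_hotplug_behavior(bool kernel_new, bool kexec_new,
                                         bool udev_rule_new)
{
    bool crash_hotplug = kernel_new && kexec_new;

    if (udev_rule_new && crash_hotplug)
        return KDUMP_KERNEL_UPDATE;
    return KDUMP_RELOAD;
}

/* Checks the model against the four cells of the table (new udev rule). */
void kdump_table_selfcheck(void)
{
    assert(kdump_hotplug_behavior(false, false, true) == KDUMP_RELOAD);
    assert(kdump_hotplug_behavior(false, true, true) == KDUMP_RELOAD);
    assert(kdump_hotplug_behavior(true, false, true) == KDUMP_RELOAD);
    assert(kdump_hotplug_behavior(true, true, true) == KDUMP_KERNEL_UPDATE);
}
```

With udev_rule_new set to false the model always returns 'a', which is
why the three pieces can be rolled out in any order.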

Suggested-by: Hari Bathini 
Signed-off-by: Eric DeVolder 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 arch/x86/include/asm/kexec.h | 11 +++
 arch/x86/kernel/crash.c  | 27 +++
 include/linux/kexec.h| 14 --
 include/uapi/linux/kexec.h   |  1 +
 kernel/Kconfig.kexec |  4 
 kernel/crash_core.c  | 31 +++
 kernel/kexec.c   |  5 +
 kernel/ksysfs.c  | 15 +++
 8 files changed, 102 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 9143100ea3ea..3be6a98751f0 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -214,14 +214,17 @@ void arch_crash_handle_hotplug_event(struct kimage *image);
 #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
 
 #ifdef CONFIG_HOTPLUG_CPU
-static inline int crash_hotplug_cpu_support(void) { return 1; }
-#define crash_hotplug_cpu_support crash_hotplug_cpu_support
+int arch_crash_hotplug_cpu_support(void);
+#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
 #endif
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-static inline int crash_hotplug_memory_support(void) { return 1; }
-#define crash_hotplug_memory_support crash_hotplug_memory_support
+int arch_crash_hotplug_memory_support(void);
+#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
 #endif
+
+unsigned int arch_crash_get_elfcorehdr_size(void);
+#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index c70a111c44fa..caf22bcb61af 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -427,6 +427,33 @@ int crash_load_segments(struct kimage *image)
 #undef pr_fmt
 #define pr_fmt(fmt) "cras

[PATCH v26 7/8] crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()

2023-08-04 Thread Eric DeVolder
The function crash_prepare_elf64_headers() generates the elfcorehdr
which describes the CPUs and memory in the system for the crash kernel.
In particular, it writes out ELF PT_LOADs for memory regions and
PT_NOTEs for the CPUs in the system.

With respect to the CPUs, the current implementation utilizes
for_each_present_cpu() which means that as CPUs are added and removed,
the elfcorehdr must again be updated to reflect the new set of CPUs.

The reasoning behind the move to for_each_possible_cpu() is:

- At kernel boot time, all percpu crash_notes are allocated for all
  possible CPUs; that is, crash_notes are not allocated dynamically
  when CPUs are plugged/unplugged. Thus the crash_notes for each
  possible CPU are always available.

- The crash_prepare_elf64_headers() creates an ELF PT_NOTE per CPU.
  Changing to for_each_possible_cpu() is valid as the crash_notes
  pointed to by each CPU PT_NOTE are present and always valid.

Furthermore, examining a common crash processing path of:

 kernel panic -> crash kernel -> makedumpfile -> 'crash' analyzer
   elfcorehdr  /proc/vmcore vmcore

reveals how the ELF CPU PT_NOTEs are utilized:

- Upon panic, each CPU is sent an IPI and shuts itself down, recording
 its state in its crash_notes. When all CPUs are shut down, the
 crash kernel is launched with a pointer to the elfcorehdr.

- The crash kernel via linux/fs/proc/vmcore.c does not examine or
 use the contents of the PT_NOTEs; it exposes them via /proc/vmcore.

- The makedumpfile utility uses /proc/vmcore and reads the CPU
 PT_NOTEs to craft a nr_cpus variable, which is reported in a
 header but otherwise generally unused. Makedumpfile creates the
 vmcore.

- The 'crash' dump analyzer does not appear to reference the CPU
 PT_NOTEs. Instead it looks up the cpu_[possible|present|online]_mask
 symbols and directly examines those structure contents from vmcore
 memory. From that information it is able to determine which CPUs
 are present and online, and locate the corresponding crash_notes.
 Said differently, it appears that 'crash' analyzer does not rely
 on the ELF PT_NOTEs for CPUs; rather it obtains the information
 directly via kernel symbols and the memory within the vmcore.

(There may be other vmcore generating and analysis tools that do use
these PT_NOTEs, but 'makedumpfile' and 'crash' seem to be the most
common solution.)

This results in the benefit of having all CPUs described in the
elfcorehdr, and therefore reducing the need to re-generate the
elfcorehdr on CPU changes, at the small expense of an additional
56 bytes per PT_NOTE for not-present-but-possible CPUs.
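
The 56-byte figure is sizeof(Elf64_Phdr). As a rough userspace sketch,
assuming a Linux system that provides <elf.h>, the header sizing done by
crash_prepare_elf64_headers() can be reproduced as below. The function
names elfcorehdr_size() and possible_cpu_overhead() are hypothetical;
ELF_CORE_HEADER_ALIGN is the value from include/linux/kexec.h.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

#define ELF_CORE_HEADER_ALIGN 4096

/* Each program header, including a per-CPU PT_NOTE, costs 56 bytes. */
static_assert(sizeof(Elf64_Phdr) == 56, "unexpected Elf64_Phdr size");

/*
 * Size of the elfcorehdr buffer: one phdr per CPU, one for the
 * vmcoreinfo note, one for the kernel text mapping, and one per memory
 * range, rounded up to ELF_CORE_HEADER_ALIGN.
 */
size_t elfcorehdr_size(size_t nr_cpus, size_t nr_mem_ranges)
{
    size_t nr_phdr = nr_cpus + 1 + 1 + nr_mem_ranges;
    size_t sz = sizeof(Elf64_Ehdr) + nr_phdr * sizeof(Elf64_Phdr);

    return (sz + ELF_CORE_HEADER_ALIGN - 1) &
           ~(size_t)(ELF_CORE_HEADER_ALIGN - 1);
}

/* Extra phdr bytes from describing possible rather than present CPUs. */
size_t possible_cpu_overhead(size_t nr_possible, size_t nr_present)
{
    assert(nr_possible >= nr_present);
    return (nr_possible - nr_present) * sizeof(Elf64_Phdr);
}
```

Because the buffer is rounded up to a 4096-byte boundary, the extra
PT_NOTEs often fit in space that was already allocated.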

On systems where the kexec_file_load() syscall is utilized, all the
above is valid. On systems where the kexec_load() syscall is utilized,
the elfcorehdr may need to be regenerated once, because some archs only
populate the 'present' CPUs in the /sys/devices/system/cpu entries,
which the userspace 'kexec' utility uses to generate the
userspace-supplied elfcorehdr. In this situation, one memory or CPU
change will rewrite the elfcorehdr via the
crash_prepare_elf64_headers() function and now all possible CPUs will
be described, just as with the kexec_file_load() syscall.

Suggested-by: Sourabh Jain 
Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 kernel/crash_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index fa918176d46d..7378b501fada 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -364,8 +364,8 @@ int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
ehdr->e_ehsize = sizeof(Elf64_Ehdr);
ehdr->e_phentsize = sizeof(Elf64_Phdr);
 
-   /* Prepare one phdr of type PT_NOTE for each present CPU */
-   for_each_present_cpu(cpu) {
+   /* Prepare one phdr of type PT_NOTE for each possible CPU */
+   for_each_possible_cpu(cpu) {
phdr->p_type = PT_NOTE;
notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
phdr->p_offset = phdr->p_paddr = notes_addr;
-- 
2.31.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v26 8/8] x86/crash: optimize CPU changes

2023-08-04 Thread Eric DeVolder
crash_prepare_elf64_headers() writes into the elfcorehdr an ELF
PT_NOTE for all possible CPUs. As such, subsequent changes to CPUs
(ie. hot un/plug, online/offline) do not need to rewrite the elfcorehdr.

The kimage->file_mode term covers kdump images loaded via the
kexec_file_load() syscall. Since crash_prepare_elf64_headers()
wrote the initial elfcorehdr, no update to the elfcorehdr is
needed for CPU changes.

The kimage->elfcorehdr_updated term covers kdump images loaded via
the kexec_load() syscall. At least one memory or CPU change must occur
to cause crash_prepare_elf64_headers() to rewrite the elfcorehdr.
Afterwards, no update to the elfcorehdr is needed for CPU changes.

This code is intentionally *NOT* hoisted into
crash_handle_hotplug_event() as it would prevent the arch-specific
handler from running for CPU changes. This would break PPC, for
example, which needs to update other information besides the
elfcorehdr, on CPU changes.
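
To make the two terms concrete, here is a standalone model of the
early-return test. struct kimage_model and elfcorehdr_rewrite_needed()
are hypothetical stand-ins for struct kimage and the check added to
arch_crash_handle_hotplug_event(); the KEXEC_CRASH_HP_* values are the
ones defined earlier in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define KEXEC_CRASH_HP_NONE            0
#define KEXEC_CRASH_HP_ADD_CPU         1
#define KEXEC_CRASH_HP_REMOVE_CPU      2
#define KEXEC_CRASH_HP_ADD_MEMORY      3
#define KEXEC_CRASH_HP_REMOVE_MEMORY   4

/* Hypothetical subset of struct kimage, just the fields the test reads. */
struct kimage_model {
    bool file_mode;          /* loaded via kexec_file_load() */
    bool elfcorehdr_updated; /* kernel has already rewritten the elfcorehdr */
    unsigned int hp_action;  /* one of KEXEC_CRASH_HP_* */
};

/*
 * Returns false when the elfcorehdr already describes all possible
 * CPUs and the event is a CPU change, i.e. when the x86 handler can
 * return early. Memory changes always require a rewrite.
 */
bool elfcorehdr_rewrite_needed(const struct kimage_model *image)
{
    bool cpu_event, all_cpus_described;

    assert(image != NULL);

    cpu_event = image->hp_action == KEXEC_CRASH_HP_ADD_CPU ||
                image->hp_action == KEXEC_CRASH_HP_REMOVE_CPU;
    all_cpus_described = image->file_mode || image->elfcorehdr_updated;

    return !(all_cpus_described && cpu_event);
}
```

For a kexec_load()'d image the first CPU event still triggers a rewrite,
since neither term is true until the kernel has regenerated the
elfcorehdr once.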

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 arch/x86/kernel/crash.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index caf22bcb61af..18d2a18d1073 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -467,6 +467,16 @@ void arch_crash_handle_hotplug_event(struct kimage *image)
unsigned long mem, memsz;
unsigned long elfsz = 0;
 
+   /*
+* As crash_prepare_elf64_headers() has already described all
+* possible CPUs, there is no need to update the elfcorehdr
+* for additional CPU changes.
+*/
+   if ((image->file_mode || image->elfcorehdr_updated) &&
+   ((image->hp_action == KEXEC_CRASH_HP_ADD_CPU) ||
+   (image->hp_action == KEXEC_CRASH_HP_REMOVE_CPU)))
+   return;
+
/*
 * Create the new elfcorehdr reflecting the changes to CPU and/or
 * memory resources.
-- 
2.31.1



[PATCH v26 3/8] kexec: exclude elfcorehdr from the segment digest

2023-08-04 Thread Eric DeVolder
When a crash kernel is loaded via the kexec_file_load() syscall, the
kernel places the various segments (ie crash kernel, crash initrd,
boot_params, elfcorehdr, purgatory, etc) in memory. For those
architectures that utilize purgatory, a hash digest of the segments
is calculated for integrity checking. The digest is embedded into
the purgatory image prior to placing in memory.

Updates to the elfcorehdr in response to CPU and memory changes
would cause the purgatory integrity checking to fail (at crash time,
and no vmcore created). Therefore, the elfcorehdr segment is
explicitly excluded from the purgatory digest, enabling updates to
the elfcorehdr while also avoiding the need to recompute the hash
digest and reload purgatory.
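
The effect of the exclusion can be shown with a toy digest. This sketch
swaps in a 64-bit FNV-1a hash for the SHA-256 digest the kernel uses,
and struct segment and segments_digest() are hypothetical names, so it
only illustrates the idea of skipping one segment index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a loaded kexec segment. */
struct segment {
    const unsigned char *buf;
    size_t len;
};

/*
 * FNV-1a over every segment except the one at index 'excluded'.
 * Changing the excluded segment's contents leaves the result
 * unchanged, which is what allows later elfcorehdr updates.
 */
uint64_t segments_digest(const struct segment *segs, size_t nr,
                         size_t excluded)
{
    uint64_t h = 0xcbf29ce484222325ULL; /* FNV-1a offset basis */
    size_t i, k;

    assert(segs != NULL || nr == 0);

    for (i = 0; i < nr; i++) {
        if (i == excluded)
            continue;
        for (k = 0; k < segs[i].len; k++) {
            h ^= segs[i].buf[k];
            h *= 0x100000001b3ULL;      /* FNV-1a prime */
        }
    }
    return h;
}
```

Computing the digest, modifying the excluded buffer, and computing it
again yields the same value, whereas a change to any other segment
alters it.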

Suggested-by: Baoquan He 
Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 kernel/kexec_file.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 453b7a513540..e2ec9d7b9a1f 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -726,6 +726,12 @@ static int kexec_calculate_store_digests(struct kimage *image)
for (j = i = 0; i < image->nr_segments; i++) {
struct kexec_segment *ksegment;
 
+#ifdef CONFIG_CRASH_HOTPLUG
+   /* Exclude elfcorehdr segment to allow future changes via hotplug */
+   if (i == image->elfcorehdr_index)
+   continue;
+#endif
+
 ksegment = &image->segment[i];
/*
 * Skip purgatory as it will be modified once we put digest
-- 
2.31.1



[PATCH v26 1/8] crash: move a few code bits to setup support of crash hotplug

2023-08-04 Thread Eric DeVolder
The crash hotplug support leans on the work for the kexec_file_load()
syscall. To also support the kexec_load() syscall, a few bits of code
need to be moved outside of CONFIG_KEXEC_FILE. As such, these bits are
moved out of kexec_file.c and into a common location crash_core.c.

No functionality change intended.

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 include/linux/kexec.h |  30 +++
 kernel/crash_core.c   | 182 ++
 kernel/kexec_file.c   | 181 -
 3 files changed, 197 insertions(+), 196 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 22b5cd24f581..811a90e09698 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -105,6 +105,21 @@ struct compat_kexec_segment {
 };
 #endif
 
+/* Alignment required for elf header segment */
+#define ELF_CORE_HEADER_ALIGN   4096
+
+struct crash_mem {
+   unsigned int max_nr_ranges;
+   unsigned int nr_ranges;
+   struct range ranges[];
+};
+
+extern int crash_exclude_mem_range(struct crash_mem *mem,
+  unsigned long long mstart,
+  unsigned long long mend);
+extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
+  void **addr, unsigned long *sz);
+
 #ifdef CONFIG_KEXEC_FILE
 struct purgatory_info {
/*
@@ -230,21 +245,6 @@ static inline int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
 }
 #endif
 
-/* Alignment required for elf header segment */
-#define ELF_CORE_HEADER_ALIGN   4096
-
-struct crash_mem {
-   unsigned int max_nr_ranges;
-   unsigned int nr_ranges;
-   struct range ranges[];
-};
-
-extern int crash_exclude_mem_range(struct crash_mem *mem,
-  unsigned long long mstart,
-  unsigned long long mend);
-extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
-  void **addr, unsigned long *sz);
-
 #ifndef arch_kexec_apply_relocations_add
 /*
  * arch_kexec_apply_relocations_add - apply relocations of type RELA
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 90ce1dfd591c..b7c30b748a16 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -314,6 +315,187 @@ static int __init parse_crashkernel_dummy(char *arg)
 }
 early_param("crashkernel", parse_crashkernel_dummy);
 
+int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
+ void **addr, unsigned long *sz)
+{
+   Elf64_Ehdr *ehdr;
+   Elf64_Phdr *phdr;
+   unsigned long nr_cpus = num_possible_cpus(), nr_phdr, elf_sz;
+   unsigned char *buf;
+   unsigned int cpu, i;
+   unsigned long long notes_addr;
+   unsigned long mstart, mend;
+
+   /* extra phdr for vmcoreinfo ELF note */
+   nr_phdr = nr_cpus + 1;
+   nr_phdr += mem->nr_ranges;
+
+   /*
+* kexec-tools creates an extra PT_LOAD phdr for kernel text mapping
+* area (for example, ffffffff80000000 - ffffffffa0000000 on x86_64).
+* I think this is required by tools like gdb. So same physical
+* memory will be mapped in two ELF headers. One will contain kernel
+* text virtual addresses and other will have __va(physical) addresses.
+*/
+
+   nr_phdr++;
+   elf_sz = sizeof(Elf64_Ehdr) + nr_phdr * sizeof(Elf64_Phdr);
+   elf_sz = ALIGN(elf_sz, ELF_CORE_HEADER_ALIGN);
+
+   buf = vzalloc(elf_sz);
+   if (!buf)
+   return -ENOMEM;
+
+   ehdr = (Elf64_Ehdr *)buf;
+   phdr = (Elf64_Phdr *)(ehdr + 1);
+   memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
+   ehdr->e_ident[EI_CLASS] = ELFCLASS64;
+   ehdr->e_ident[EI_DATA] = ELFDATA2LSB;
+   ehdr->e_ident[EI_VERSION] = EV_CURRENT;
+   ehdr->e_ident[EI_OSABI] = ELF_OSABI;
+   memset(ehdr->e_ident + EI_PAD, 0, EI_NIDENT - EI_PAD);
+   ehdr->e_type = ET_CORE;
+   ehdr->e_machine = ELF_ARCH;
+   ehdr->e_version = EV_CURRENT;
+   ehdr->e_phoff = sizeof(Elf64_Ehdr);
+   ehdr->e_ehsize = sizeof(Elf64_Ehdr);
+   ehdr->e_phentsize = sizeof(Elf64_Phdr);
+
+   /* Prepare one phdr of type PT_NOTE for each present CPU */
+   for_each_present_cpu(cpu) {
+   phdr->p_type = PT_NOTE;
+   notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
+   phdr->p_offset = phdr->p_paddr = notes_addr;
+   phdr->p_filesz = phdr->p_memsz = sizeof(note_buf_t);
+   (ehdr->e_phnum)++;
+   phdr++;
+   }
+
+   /* Prepare one PT_NOTE header for vmcoreinfo */
+   phdr->p_type = PT_NOTE;
+   

[PATCH v26 0/8] crash: Kernel handling of CPU and memory hot un/plug

2023-08-04 Thread Eric DeVolder
kexec_load
   and kexec_file_load can benefit.
 - Updated userspace kexec-tools kexec utility to reflect change to
   using CRASH_MAX_MEMORY_RANGES and get_nr_cpus().
 - Updated the new proposed udev rules to reflect using the sysfs
   attributes crash_hotplug.

v8: 5may2022
 https://lkml.org/lkml/2022/5/5/1133
 https://lore.kernel.org/lkml/20220505184603.1548-1-eric.devol...@oracle.com/
 - Per Borislav Petkov, eliminated CONFIG_CRASH_HOTPLUG in favor
   of CONFIG_HOTPLUG_CPU || CONFIG_MEMORY_HOTPLUG, ie a new define
   is not needed. Also use of IS_ENABLED() rather than #ifdef's.
   Renamed crash_hotplug_handler() to handle_hotplug_event().
   And other corrections.
 - Per Baoquan, minimized the parameters to the arch_crash_
   handle_hotplug_event() to hp_action and cpu.
 - Introduce KEXEC_CRASH_HP_INVALID_CPU definition, per Baoquan.
 - Per Sourabh Jain, renamed and repurposed CRASH_HOTPLUG_ELFCOREHDR_SZ
   to CONFIG_CRASH_MAX_MEMORY_RANGES, mirroring kexec-tools change
   by David Hildenbrand. Folded this patch into the x86
   kexec_file_load support patch.

v7: 13apr2022
 https://lkml.org/lkml/2022/4/13/850
 https://lore.kernel.org/lkml/20220413164237.20845-1-eric.devol...@oracle.com/
 - Resolved parameter usage to crash_hotplug_handler(), per Baoquan.

v6: 1apr2022
 https://lkml.org/lkml/2022/4/1/1203
 https://lore.kernel.org/lkml/20220401183040.1624-1-eric.devol...@oracle.com/
 - Reword commit messages and some comment cleanup per Baoquan.
 - Changed elf_index to elfcorehdr_index for clarity.
 - Minor code changes per Baoquan.

v5: 3mar2022
 https://lkml.org/lkml/2022/3/3/674
 https://lore.kernel.org/lkml/20220303162725.49640-1-eric.devol...@oracle.com/
 - Reworded description of CRASH_HOTPLUG_ELFCOREHDR_SZ, per
   David Hildenbrand.
 - Refactored slightly a few patches per Baoquan recommendation.

v4: 9feb2022
 https://lkml.org/lkml/2022/2/9/1406
 https://lore.kernel.org/lkml/20220209195706.51522-1-eric.devol...@oracle.com/
 - Refactored patches per Baoquan suggestions.
 - A few corrections, per Baoquan.

v3: 10jan2022
 https://lkml.org/lkml/2022/1/10/1212
 https://lore.kernel.org/lkml/20220110195727.1682-1-eric.devol...@oracle.com/
 - Rebasing per Baoquan He request.
 - Changed memory notifier per David Hildenbrand.
 - Providing example kexec userspace change in cover letter.

RFC v2: 7dec2021
 https://lkml.org/lkml/2021/12/7/1088
 https://lore.kernel.org/lkml/20211207195204.1582-1-eric.devol...@oracle.com/
 - Acting upon Baoquan He suggestion of removing elfcorehdr from
   the purgatory list of segments, removed purgatory code from
   patchset, and it is significantly simpler now.

RFC v1: 18nov2021
 https://lkml.org/lkml/2021/11/18/845
 https://lore.kernel.org/lkml/2028174948.37435-1-eric.devol...@oracle.com/
 - working patchset demonstrating kernel handling of hotplug
   updates to x86 elfcorehdr for kexec_file_load

RFC: 14dec2020
 https://lkml.org/lkml/2020/12/14/532
 https://lore.kernel.org/lkml/b04ed259-dc5f-7f30-6661-c26f92d90...@oracle.com/
 NOTE: s/vmcoreinfo/elfcorehdr/g
 - proposed concept of allowing kernel to handle hotplug update
   of elfcorehdr
---

Eric DeVolder (8):
  crash: move a few code bits to setup support of crash hotplug
  crash: add generic infrastructure for crash hotplug support
  kexec: exclude elfcorehdr from the segment digest
  crash: memory and CPU hotplug sysfs attributes
  x86/crash: add x86 crash hotplug support
  crash: hotplug support for kexec_load()
  crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
  x86/crash: optimize CPU changes

 .../ABI/testing/sysfs-devices-memory  |   8 +
 .../ABI/testing/sysfs-devices-system-cpu  |   8 +
 .../admin-guide/mm/memory-hotplug.rst |   8 +
 Documentation/core-api/cpu_hotplug.rst|  18 +
 arch/x86/Kconfig  |   3 +
 arch/x86/include/asm/kexec.h  |  18 +
 arch/x86/kernel/crash.c   | 140 ++-
 drivers/base/cpu.c|  13 +
 drivers/base/memory.c |  13 +
 include/linux/crash_core.h|   9 +
 include/linux/kexec.h |  63 +++-
 include/uapi/linux/kexec.h|   1 +
 kernel/Kconfig.kexec  |  35 ++
 kernel/crash_core.c   | 355 ++
 kernel/kexec.c|   5 +
 kernel/kexec_core.c   |   6 +
 kernel/kexec_file.c   | 187 +
 kernel/ksysfs.c   |  15 +
 18 files changed, 700 insertions(+), 205 deletions(-)

-- 
2.31.1



[PATCH v26 6/8] crash: hotplug support for kexec_load()

2023-08-04 Thread Eric DeVolder
The hotplug support for kexec_load() requires changes to the
userspace kexec-tools and a little extra help from the kernel.

Given a kdump capture kernel loaded via kexec_load(), and a
subsequent hotplug event, the crash hotplug handler finds the
elfcorehdr and rewrites it to reflect the hotplug change.
That is the desired outcome; however, at kernel panic time,
the purgatory integrity check fails (because the elfcorehdr
changed), and the capture kernel does not boot and no vmcore
is generated.

Therefore, the userspace kexec-tools/kexec must indicate to the
kernel that the elfcorehdr can be modified (because the kexec
excluded the elfcorehdr from the digest, and sized the elfcorehdr
memory buffer appropriately).

To facilitate hotplug support with kexec_load():
 - a new kexec flag KEXEC_UPDATE_ELFCOREHDR indicates that it is
   safe for the kernel to modify the kexec_load()'d elfcorehdr
 - the /sys/kernel/crash_elfcorehdr_size node communicates the
   preferred size of the elfcorehdr memory buffer
 - The sysfs crash_hotplug nodes (ie.
   /sys/devices/system/[cpu|memory]/crash_hotplug) dynamically
   take into account kexec_file_load() vs kexec_load() and
   KEXEC_UPDATE_ELFCOREHDR.
   This is critical so that the udev rule processing of crash_hotplug
   is all that is needed to determine if the userspace unload-then-load
   of the kdump image is to be skipped, or not. The proposed udev
   rule change looks like:
   # The kernel updates the crash elfcorehdr for CPU and memory changes
   SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
   SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

The table below indicates the behavior of kexec_load()'d kdump image
updates (with the new udev crash_hotplug rule in place):

        |     Kexec
 Kernel |  Old  |  New
 -------+-------+-------
  Old   |   a   |   a
 -------+-------+-------
  New   |   a   |   b
 -------+-------+-------

where kexec 'old' and 'new' indicate whether kexec-tools has the needed
modifications for the crash hotplug feature, and kernel 'old' and
'new' indicate whether the kernel supports this crash hotplug feature.

Behavior 'a' indicates the unload-then-reload of the entire kdump
image. For the kexec 'old' column, the unload-then-reload occurs
due to the missing flag KEXEC_UPDATE_ELFCOREHDR. An 'old' kernel
(with 'new' kexec) does not present the crash_hotplug sysfs node,
which leads to the unload-then-reload of the kdump image.

Behavior 'b' indicates the desired optimized behavior of the kernel
directly modifying the elfcorehdr and avoiding the unload-then-reload
of the kdump image.

If the udev rule is not updated with the crash_hotplug node check, then
regardless of whether the kernel or kexec is new or old, the kdump
image continues to be unloaded and then reloaded on hotplug changes.

To fully support the crash hotplug feature, there needs to be a rollout
of kernel, kexec-tools and udev rule changes. However, the order of
the rollout of these pieces does not matter; kexec_load()'d kdump
images still function for hotplug as-is.

Suggested-by: Hari Bathini 
Signed-off-by: Eric DeVolder 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 arch/x86/include/asm/kexec.h | 11 +++
 arch/x86/kernel/crash.c  | 27 +++
 include/linux/kexec.h| 14 --
 include/uapi/linux/kexec.h   |  1 +
 kernel/Kconfig.kexec |  4 
 kernel/crash_core.c  | 31 +++
 kernel/kexec.c   |  5 +
 kernel/ksysfs.c  | 15 +++
 8 files changed, 102 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 9143100ea3ea..3be6a98751f0 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -214,14 +214,17 @@ void arch_crash_handle_hotplug_event(struct kimage *image);
 #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
 
 #ifdef CONFIG_HOTPLUG_CPU
-static inline int crash_hotplug_cpu_support(void) { return 1; }
-#define crash_hotplug_cpu_support crash_hotplug_cpu_support
+int arch_crash_hotplug_cpu_support(void);
+#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
 #endif
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-static inline int crash_hotplug_memory_support(void) { return 1; }
-#define crash_hotplug_memory_support crash_hotplug_memory_support
+int arch_crash_hotplug_memory_support(void);
+#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
 #endif
+
+unsigned int arch_crash_get_elfcorehdr_size(void);
+#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index c70a111c44fa..caf22bcb61af 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -427,6 +427,33 @@ int crash_load_segments(struct kimage *image)
 #undef pr_fmt
 #define pr_fmt(fmt) "cras

[PATCH v26 2/8] crash: add generic infrastructure for crash hotplug support

2023-08-04 Thread Eric DeVolder
To support crash hotplug, a mechanism is needed to update the crash
elfcorehdr upon CPU or memory changes (eg. hot un/plug or off/
onlining). The crash elfcorehdr describes the CPUs and memory to
be written into the vmcore.

To track CPU changes, callbacks are registered with the cpuhp
mechanism via cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN). The
crash hotplug elfcorehdr update has no explicit ordering requirement
(relative to other cpuhp states), so meets the criteria for
utilizing CPUHP_BP_PREPARE_DYN. CPUHP_BP_PREPARE_DYN is a dynamic
state and avoids the need to introduce a new state for crash
hotplug. Also, CPUHP_BP_PREPARE_DYN is the last state in the PREPARE
group, just prior to the STARTING group, which is very close to the
CPU starting up in a plug/online situation, or stopping in an unplug/
offline situation. This minimizes the window of time during an
actual plug/online or unplug/offline situation in which the
elfcorehdr would be inaccurate. Note that for a CPU being unplugged
or offlined, the CPU will still be present in the list of CPUs
generated by crash_prepare_elf64_headers(). However, there is no
need to explicitly omit the CPU, see justification in
'crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()'.

To track memory changes, a notifier is registered to capture the
memblock MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().

The CPU callbacks and memory notifiers invoke crash_handle_hotplug_event()
which performs needed tasks and then dispatches the event to the
architecture specific arch_crash_handle_hotplug_event() to update the
elfcorehdr with the current state of CPUs and memory. During the
process, the kexec_lock is held.
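
The dispatch path described above can be modelled in a few lines of plain C. This is only a userspace sketch under stated assumptions, not the code in kernel/crash_core.c: struct image_model, loaded_image, lock_held and the *_model functions are hypothetical stand-ins, a boolean replaces the real kexec_lock, and resetting hp_action after the arch handler returns is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Action codes mirror the KEXEC_CRASH_HP_* values in the patch. */
enum hp_action {
    HP_NONE = 0,
    HP_ADD_CPU = 1,
    HP_REMOVE_CPU = 2,
    HP_ADD_MEMORY = 3,
    HP_REMOVE_MEMORY = 4,
};

/* Hypothetical stand-in for struct kimage; only the hotplug fields. */
struct image_model {
    int hp_action;
    bool elfcorehdr_updated;
    unsigned int arch_calls;    /* counts dispatches, for illustration */
};

/* Hypothetical stand-ins for the loaded crash image and kexec_lock. */
static struct image_model *loaded_image;
static bool lock_held;

/* Stand-in for arch_crash_handle_hotplug_event(): rewrite the elfcorehdr. */
static void arch_handle_model(struct image_model *image)
{
    image->arch_calls++;
    image->elfcorehdr_updated = true;
}

/* Returns true if the event was dispatched to the arch handler. */
static bool handle_hotplug_event_model(int action)
{
    bool dispatched = false;

    if (lock_held)              /* lock busy: skip this event (assumption) */
        return false;
    lock_held = true;

    if (loaded_image) {         /* nothing to update without a crash image */
        loaded_image->hp_action = action;
        arch_handle_model(loaded_image);
        loaded_image->hp_action = HP_NONE;  /* reset once handled (assumption) */
        dispatched = true;
    }

    lock_held = false;
    return dispatched;
}
```

In this model both the CPU callbacks and the memory notifier would call handle_hotplug_event_model() with the matching action code; the event is a no-op until a crash image has been loaded.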

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 include/linux/crash_core.h |   9 +++
 include/linux/kexec.h  |  11 +++
 kernel/Kconfig.kexec   |  31 
 kernel/crash_core.c| 142 +
 kernel/kexec_core.c|   6 ++
 5 files changed, 199 insertions(+)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index de62a722431e..e14345cc7a22 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -84,4 +84,13 @@ int parse_crashkernel_high(char *cmdline, unsigned long long 
system_ram,
 int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
unsigned long long *crash_size, unsigned long long *crash_base);
 
+#define KEXEC_CRASH_HP_NONE            0
+#define KEXEC_CRASH_HP_ADD_CPU         1
+#define KEXEC_CRASH_HP_REMOVE_CPU      2
+#define KEXEC_CRASH_HP_ADD_MEMORY      3
+#define KEXEC_CRASH_HP_REMOVE_MEMORY   4
+#define KEXEC_CRASH_HP_INVALID_CPU     -1U
+
+struct kimage;
+
 #endif /* LINUX_CRASH_CORE_H */
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 811a90e09698..b9903dd48e24 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -33,6 +33,7 @@ extern note_buf_t __percpu *crash_notes;
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* Verify architecture specific macros are defined */
@@ -360,6 +361,12 @@ struct kimage {
struct purgatory_info purgatory_info;
 #endif
 
+#ifdef CONFIG_CRASH_HOTPLUG
+   int hp_action;
+   int elfcorehdr_index;
+   bool elfcorehdr_updated;
+#endif
+
 #ifdef CONFIG_IMA_KEXEC
/* Virtual address of IMA measurement buffer for kexec syscall */
void *ima_buffer;
@@ -490,6 +497,10 @@ static inline int arch_kexec_post_alloc_pages(void *vaddr, 
unsigned int pages, g
 static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) 
{ }
 #endif
 
+#ifndef arch_crash_handle_hotplug_event
+static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
+#endif
+
 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
diff --git a/kernel/Kconfig.kexec b/kernel/Kconfig.kexec
index ff72e45cfaef..d0a9a5392035 100644
--- a/kernel/Kconfig.kexec
+++ b/kernel/Kconfig.kexec
@@ -113,4 +113,35 @@ config CRASH_DUMP
  For s390, this option also enables zfcpdump.
  See also 
 
+config CRASH_HOTPLUG
+   bool "Update the crash elfcorehdr on system configuration changes"
+   default y
+   depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)
+   depends on ARCH_SUPPORTS_CRASH_HOTPLUG
+   help
+ Enable direct update to the crash elfcorehdr (which contains
+ the list of CPUs and memory regions to be dumped upon a crash)
+ in response to hot plug/unplug or online/offline of CPUs or
+ memory. This is a much more advanced approach than userspace
+ attempting that.
+
+ If unsure, say Y.
+
+config CRASH_MAX_MEMORY_RANGES
+   int "Specify the maximum number of memory regions for the elfcorehdr"
+   default 8192
+   depends on CRASH_HOTPLUG
+   help
+

[PATCH v26 5/8] x86/crash: add x86 crash hotplug support

2023-08-04 Thread Eric DeVolder
When CPU or memory is hot un/plugged, or off/onlined, the crash
elfcorehdr, which describes the CPUs and memory in the system,
must also be updated.

A new elfcorehdr is generated from the available CPUs and memory
and replaces the existing elfcorehdr. The segment containing the
elfcorehdr is identified at run-time in
crash_core:crash_handle_hotplug_event().

No modifications to purgatory (see 'kexec: exclude elfcorehdr
from the segment digest') or boot_params (as the elfcorehdr=
capture kernel command line parameter pointer remains unchanged
and correct) are needed, just elfcorehdr.

For kexec_file_load(), the elfcorehdr segment size is based on
NR_CPUS and CRASH_MAX_MEMORY_RANGES in order to accommodate a
growing number of CPU and memory resources.
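
The sizing rule above amounts to simple arithmetic: one ELF header plus one program header per possible CPU and per possible memory range. The helper below is a hedged sketch, not the kernel's implementation; elfcorehdr_reserve_size() and its extra_phdrs parameter are hypothetical, and the two size constants are the fixed ELF64 header sizes.

```c
#include <stddef.h>

/* Fixed sizes from the ELF64 format: file header and program header. */
#define ELF64_EHDR_SIZE 64u
#define ELF64_PHDR_SIZE 56u

/*
 * Hypothetical helper: bytes to reserve for the elfcorehdr segment.
 * extra_phdrs covers entries that are neither CPUs nor memory ranges
 * (an assumption here, for example a VMCOREINFO note).
 */
static size_t elfcorehdr_reserve_size(size_t nr_cpus,
                                      size_t max_mem_ranges,
                                      size_t extra_phdrs)
{
    size_t phdrs = nr_cpus + max_mem_ranges + extra_phdrs;

    return ELF64_EHDR_SIZE + phdrs * ELF64_PHDR_SIZE;
}
```

Because the result depends on the configured maxima rather than on the CPUs and memory currently present, the buffer does not need to grow when resources are hot-added later.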

For kexec_load(), the userspace kexec utility needs to size the
elfcorehdr segment in the same/similar manner.

To accommodate kexec_load() syscall in the absence of
kexec_file_load() syscall support, prepare_elf_headers() and
dependents are moved outside of CONFIG_KEXEC_FILE.

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 arch/x86/Kconfig |   3 +
 arch/x86/include/asm/kexec.h |  15 +
 arch/x86/kernel/crash.c  | 103 ---
 3 files changed, 114 insertions(+), 7 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index fedc6743..d9fc80b9ef84 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2069,6 +2069,9 @@ config ARCH_SUPPORTS_KEXEC_JUMP
 config ARCH_SUPPORTS_CRASH_DUMP
def_bool X86_64 || (X86_32 && HIGHMEM)
 
+config ARCH_SUPPORTS_CRASH_HOTPLUG
+   def_bool y
+
 config PHYSICAL_START
hex "Physical address where the kernel is loaded" if (EXPERT || 
CRASH_DUMP)
default "0x100"
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 5b77bbc28f96..9143100ea3ea 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -209,6 +209,21 @@ typedef void crash_vmclear_fn(void);
 extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
 extern void kdump_nmi_shootdown_cpus(void);
 
+#ifdef CONFIG_CRASH_HOTPLUG
+void arch_crash_handle_hotplug_event(struct kimage *image);
+#define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
+
+#ifdef CONFIG_HOTPLUG_CPU
+static inline int crash_hotplug_cpu_support(void) { return 1; }
+#define crash_hotplug_cpu_support crash_hotplug_cpu_support
+#endif
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+static inline int crash_hotplug_memory_support(void) { return 1; }
+#define crash_hotplug_memory_support crash_hotplug_memory_support
+#endif
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_X86_KEXEC_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index cdd92ab43cda..c70a111c44fa 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -158,8 +158,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
crash_save_cpu(regs, safe_smp_processor_id());
 }
 
-#ifdef CONFIG_KEXEC_FILE
-
 static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
 {
unsigned int *nr_ranges = arg;
@@ -231,7 +229,7 @@ static int prepare_elf64_ram_headers_callback(struct 
resource *res, void *arg)
 
 /* Prepare elf headers. Return addr and size */
 static int prepare_elf_headers(struct kimage *image, void **addr,
-   unsigned long *sz)
+   unsigned long *sz, unsigned long 
*nr_mem_ranges)
 {
struct crash_mem *cmem;
int ret;
@@ -249,6 +247,9 @@ static int prepare_elf_headers(struct kimage *image, void 
**addr,
if (ret)
goto out;
 
+   /* Return the computed number of memory ranges, for hotplug usage */
+   *nr_mem_ranges = cmem->nr_ranges;
+
/* By default prepare 64bit headers */
ret =  crash_prepare_elf64_headers(cmem, IS_ENABLED(CONFIG_X86_64), 
addr, sz);
 
@@ -257,6 +258,7 @@ static int prepare_elf_headers(struct kimage *image, void 
**addr,
return ret;
 }
 
+#ifdef CONFIG_KEXEC_FILE
 static int add_e820_entry(struct boot_params *params, struct e820_entry *entry)
 {
unsigned int nr_e820_entries;
@@ -371,18 +373,42 @@ int crash_setup_memmap_entries(struct kimage *image, 
struct boot_params *params)
 int crash_load_segments(struct kimage *image)
 {
int ret;
+   unsigned long pnum = 0;
struct kexec_buf kbuf = { .image = image, .buf_min = 0,
  .buf_max = ULONG_MAX, .top_down = false };
 
/* Prepare elf headers and add a segment */
-   ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz);
+   ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz, &pnum);
if (ret)
return ret;
 
-   image->elf_headers = kbuf.buffer;
-   image->elf_headers_sz = kbuf.bufsz;
+   image->elf_headers  = kbuf.buffer;
+   image->elf_he

[PATCH v26 4/8] crash: memory and CPU hotplug sysfs attributes

2023-08-04 Thread Eric DeVolder
Introduce the crash_hotplug attribute for memory and CPUs for
use by userspace.  These attributes directly facilitate the udev
rule for managing userspace re-loading of the crash kernel upon
hot un/plug changes.

For memory, expose the crash_hotplug attribute to the
/sys/devices/system/memory directory. For example:

 # udevadm info --attribute-walk /sys/devices/system/memory/memory81
  looking at device '/devices/system/memory/memory81':
KERNEL=="memory81"
SUBSYSTEM=="memory"
DRIVER==""
ATTR{online}=="1"
ATTR{phys_device}=="0"
ATTR{phys_index}=="0051"
ATTR{removable}=="1"
ATTR{state}=="online"
ATTR{valid_zones}=="Movable"

  looking at parent device '/devices/system/memory':
KERNELS=="memory"
SUBSYSTEMS==""
DRIVERS==""
ATTRS{auto_online_blocks}=="offline"
ATTRS{block_size_bytes}=="800"
ATTRS{crash_hotplug}=="1"

For CPUs, expose the crash_hotplug attribute to the
/sys/devices/system/cpu directory. For example:

 # udevadm info --attribute-walk /sys/devices/system/cpu/cpu0
  looking at device '/devices/system/cpu/cpu0':
KERNEL=="cpu0"
SUBSYSTEM=="cpu"
DRIVER=="processor"
ATTR{crash_notes}=="277c38600"
ATTR{crash_notes_size}=="368"
ATTR{online}=="1"

  looking at parent device '/devices/system/cpu':
KERNELS=="cpu"
SUBSYSTEMS==""
DRIVERS==""
ATTRS{crash_hotplug}=="1"
ATTRS{isolated}==""
ATTRS{kernel_max}=="8191"
ATTRS{nohz_full}=="  (null)"
ATTRS{offline}=="4-7"
ATTRS{online}=="0-3"
ATTRS{possible}=="0-7"
ATTRS{present}=="0-3"

With these sysfs attributes in place, it is possible to efficiently
instruct the udev rule to skip crash kernel reloading for kernels
configured with crash hotplug support.

For example, the following is the proposed udev rule change for RHEL
system 98-kexec.rules (as the first lines of the rule file):

 # The kernel updates the crash elfcorehdr for CPU and memory changes
 SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
 SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

When examined in the context of 98-kexec.rules, the above rules
test if crash_hotplug is set, and if so, the userspace initiated
unload-then-reload of the crash kernel is skipped.

CPU and memory checks are separated in accordance with
CONFIG_HOTPLUG_CPU and CONFIG_MEMORY_HOTPLUG kernel config options.
If an architecture supports, for example, memory hotplug but not
CPU hotplug, then the /sys/devices/system/memory/crash_hotplug
attribute file is present, but the /sys/devices/system/cpu/crash_hotplug
attribute file will NOT be present. Thus the udev rule skips
userspace processing of memory hot un/plug events, but the udev
rule will evaluate false for CPU events, thus allowing userspace to
process CPU hot un/plug events (i.e. the unload-then-reload of the kdump
capture kernel).
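
The same decision the udev rule makes can be sketched from a userspace program. The function below is illustrative only: kernel_updates_elfcorehdr() is a hypothetical name, and treating a missing attribute file the same as a '0' value is this sketch's reading of the paragraph above.

```c
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 when the crash_hotplug attribute at
 * attr_path exists and starts with '1', meaning the kernel updates the
 * elfcorehdr itself. Returns 0 when the file is absent or holds any
 * other value, meaning userspace should unload-then-reload.
 */
static int kernel_updates_elfcorehdr(const char *attr_path)
{
    FILE *f = fopen(attr_path, "r");
    int c;

    if (!f)
        return 0;   /* attribute absent: fall back to userspace reload */
    c = fgetc(f);
    fclose(f);
    return c == '1';
}
```

A caller would pass /sys/devices/system/cpu/crash_hotplug for CPU events and /sys/devices/system/memory/crash_hotplug for memory events, so the two resource types are evaluated independently.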

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 Documentation/ABI/testing/sysfs-devices-memory |  8 
 .../ABI/testing/sysfs-devices-system-cpu   |  8 
 .../admin-guide/mm/memory-hotplug.rst  |  8 
 Documentation/core-api/cpu_hotplug.rst | 18 ++
 drivers/base/cpu.c | 13 +
 drivers/base/memory.c  | 13 +
 include/linux/kexec.h  |  8 
 7 files changed, 76 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-memory 
b/Documentation/ABI/testing/sysfs-devices-memory
index d8b0f80b9e33..a95e0f17c35a 100644
--- a/Documentation/ABI/testing/sysfs-devices-memory
+++ b/Documentation/ABI/testing/sysfs-devices-memory
@@ -110,3 +110,11 @@ Description:
link is created for memory section 9 on node0.
 
/sys/devices/system/node/node0/memory9 -> ../../memory/memory9
+
+What:  /sys/devices/system/memory/crash_hotplug
+Date:  Aug 2023
+Contact:   Linux kernel mailing list 
+Description:
+   (RO) indicates whether or not the kernel directly supports
+   modifying the crash elfcorehdr for memory hot un/plug and/or
+   on/offline changes.
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu 
b/Documentation/ABI/testing/sysfs-devices-system-cpu
index ecd585ca2d50..31189da7ef57 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -686,3 +686,11 @@ Description:
(RO) the list of CPUs that are isolated

Re: [PATCH v25 01/10] drivers/base: refactor cpu.c to use .is_visible()

2023-08-03 Thread Eric DeVolder



On 7/21/23 11:32, Eric DeVolder wrote:



On 7/3/23 11:53, Eric DeVolder wrote:



On 7/3/23 08:05, Greg KH wrote:

On Thu, Jun 29, 2023 at 03:21:10PM -0400, Eric DeVolder wrote:

  - the function body of the callback functions are now wrapped with
    IS_ENABLED(); as the callback function must exist now that the
    attribute is always compiled-in (though not necessarily visible).


Why do you need to do this last thing?  Is it a code savings goal?  Or
something else?  The file will not be present in the system if the
option is not enabled, so it should be safe to not do this unless you
feel it's necessary for some reason?


To accommodate the request, all DEVICE_ATTR() must be unconditionally present in this file. The 
DEVICE_ATTR() requires the .show() callback. As the callback is referenced from a data structure, 
the callback has to be present for link. All the callbacks for these attributes are in this file.


I have two basic choices for gutting the function body if the config feature is not enabled. I can 
either use #ifdef or IS_ENABLED(). Thomas has made it clear I need to use IS_ENABLED(). I can 
certainly use #ifdef (which is what I did in v24).




Not doing this would make the diff easier to read :)


I agree this is messy. I'm not really sure what this request/effort achieves as these attributes 
are not strongly related (unlike cacheinfo) and the way the file was before results in less code.


At any rate, please indicate if you'd rather I use #ifdef.
Thanks for your time!
eric



thanks,

greg k-h


Hi Greg,
I was wondering if you might weigh in so that I can proceed.

I think there are three options on the table:
- use #ifdef to comment out these function bodies, which keeps the diff much 
more readable
- use IS_ENABLED() as Thomas has requested I do, but makes the diff more 
difficult to read
- remove this refactor altogether, perhaps postponing until after this crash hotplug series merges, 
as this refactor is largely unrelated to crash hotplug.


Thank you for your time on this topic!
eric


Hi Greg,
If you have an opinion on how to proceed, please provide.
Thanks,
eric

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v25 01/10] drivers/base: refactor cpu.c to use .is_visible()

2023-07-21 Thread Eric DeVolder



On 7/3/23 11:53, Eric DeVolder wrote:



On 7/3/23 08:05, Greg KH wrote:

On Thu, Jun 29, 2023 at 03:21:10PM -0400, Eric DeVolder wrote:

  - the function body of the callback functions are now wrapped with
    IS_ENABLED(); as the callback function must exist now that the
    attribute is always compiled-in (though not necessarily visible).


Why do you need to do this last thing?  Is it a code savings goal?  Or
something else?  The file will not be present in the system if the
option is not enabled, so it should be safe to not do this unless you
feel it's necessary for some reason?


To accommodate the request, all DEVICE_ATTR() must be unconditionally present in this file. The 
DEVICE_ATTR() requires the .show() callback. As the callback is referenced from a data structure, 
the callback has to be present for link. All the callbacks for these attributes are in this file.


I have two basic choices for gutting the function body if the config feature is not enabled. I can 
either use #ifdef or IS_ENABLED(). Thomas has made it clear I need to use IS_ENABLED(). I can 
certainly use #ifdef (which is what I did in v24).




Not doing this would make the diff easier to read :)


I agree this is messy. I'm not really sure what this request/effort achieves as these attributes are 
not strongly related (unlike cacheinfo) and the way the file was before results in less code.


At any rate, please indicate if you'd rather I use #ifdef.
Thanks for your time!
eric



thanks,

greg k-h


Hi Greg,
I was wondering if you might weigh in so that I can proceed.

I think there are three options on the table:
- use #ifdef to comment out these function bodies, which keeps the diff much 
more readable
- use IS_ENABLED() as Thomas has requested I do, but makes the diff more 
difficult to read
- remove this refactor altogether, perhaps postponing until after this crash hotplug series merges, 
as this refactor is largely unrelated to crash hotplug.


Thank you for your time on this topic!
eric



Re: [PATCH v25 06/10] crash: memory and CPU hotplug sysfs attributes

2023-07-03 Thread Eric DeVolder




On 7/3/23 08:07, Greg KH wrote:

On Thu, Jun 29, 2023 at 03:21:15PM -0400, Eric DeVolder wrote:

+What:  /sys/devices/system/cpu/crash_hotplug
+Date:  Jun 2023


It's not "Jun" anymore :(


+Contact:   Linux kernel mailing list 


Why are you not going to maintain this?  Why is this up to me?

thanks,

greg k-h

My apologies, I'll correct both in the next posting.
Thanks!
eric



Re: [PATCH v25 01/10] drivers/base: refactor cpu.c to use .is_visible()

2023-07-03 Thread Eric DeVolder




On 7/3/23 08:05, Greg KH wrote:

On Thu, Jun 29, 2023 at 03:21:10PM -0400, Eric DeVolder wrote:

  - the function body of the callback functions are now wrapped with
IS_ENABLED(); as the callback function must exist now that the
attribute is always compiled-in (though not necessarily visible).


Why do you need to do this last thing?  Is it a code savings goal?  Or
something else?  The file will not be present in the system if the
option is not enabled, so it should be safe to not do this unless you
feel it's necessary for some reason?


To accommodate the request, all DEVICE_ATTR() must be unconditionally present in this file. The 
DEVICE_ATTR() requires the .show() callback. As the callback is referenced from a data structure, 
the callback has to be present for link. All the callbacks for these attributes are in this file.


I have two basic choices for gutting the function body if the config feature is not enabled. I can 
either use #ifdef or IS_ENABLED(). Thomas has made it clear I need to use IS_ENABLED(). I can 
certainly use #ifdef (which is what I did in v24).




Not doing this would make the diff easier to read :)


I agree this is messy. I'm not really sure what this request/effort achieves as these attributes are 
not strongly related (unlike cacheinfo) and the way the file was before results in less code.


At any rate, please indicate if you'd rather I use #ifdef.
Thanks for your time!
eric



thanks,

greg k-h




Re: [PATCH v25 06/10] crash: memory and CPU hotplug sysfs attributes

2023-06-29 Thread Eric DeVolder

Randy,
Thanks for looking at this! Inline comments below.
eric

On 6/29/23 15:59, Randy Dunlap wrote:

Hi--

On 6/29/23 12:21, Eric DeVolder wrote:


Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
  Documentation/ABI/testing/sysfs-devices-memory |  8 
  .../ABI/testing/sysfs-devices-system-cpu   |  8 
  .../admin-guide/mm/memory-hotplug.rst  |  8 
  Documentation/core-api/cpu_hotplug.rst | 18 ++
  drivers/base/cpu.c | 16 ++--
  drivers/base/memory.c  | 13 +
  include/linux/kexec.h  |  8 
  7 files changed, 77 insertions(+), 2 deletions(-)




diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst 
b/Documentation/admin-guide/mm/memory-hotplug.rst
index 1b02fe5807cc..eb99d79223a3 100644
--- a/Documentation/admin-guide/mm/memory-hotplug.rst
+++ b/Documentation/admin-guide/mm/memory-hotplug.rst
@@ -291,6 +291,14 @@ The following files are currently defined:
   Availability depends on the CONFIG_ARCH_MEMORY_PROBE
   kernel configuration option.
  ``uevent``   read-write: generic udev file for device subsystems.
+``crash_hotplug``  read-only: when changes to the system memory map
+  occur due to hot un/plug of memory, this file contains
+  '1' if the kernel updates the kdump capture kernel memory
+  map itself (via elfcorehdr), or '0' if userspace must 
update
+  the kdump capture kernel memory map.
+
+  Availability depends on the CONFIG_MEMORY_HOTPLUG kernel
+  configuration option.
  == 
=


Did you test build the documentation?
It looks to me like the end-of-table '=' signs line needs 3 more === to be long
enough for the text above it.


Hmm, the 'make htmldocs' renders and views ok. Is there perhaps another method 
I should use?



  
  .. note::

diff --git a/Documentation/core-api/cpu_hotplug.rst 
b/Documentation/core-api/cpu_hotplug.rst
index e6f5bc39cf5c..54581c501562 100644
--- a/Documentation/core-api/cpu_hotplug.rst
+++ b/Documentation/core-api/cpu_hotplug.rst
@@ -741,6 +741,24 @@ will receive all events. A script like::
  
  can process the event further.
  
+When changes to the CPUs in the system occur, the sysfs file

+/sys/devices/system/cpu/crash_hotplug contains '1' if the kernel
+updates the kdump capture kernel list of CPUs itself (via elfcorehdr),
+or '0' if userspace must update the kdump capture kernel list of CPUs.
+
+The availability depends on the CONFIG_HOTPLUG_CPU kernel configuration
+option.
+
+To skip userspace processing of CPU hot un/plug events for kdump
+(ie the unload-then-reload to obtain a current list of CPUs), this sysfs


 i.e.


got it, thanks.


+file can be used in a udev rule as follows:
+
+ SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
+
+For a cpu hot un/plug event, if the architecture supports kernel updates


  CPU
for consistency


got it, thanks.


+of the elfcorehdr (which contains the list of CPUs), then the rule skips
+the unload-then-reload of the kdump capture kernel.
+
  Kernel Inline Documentations Reference
  ==
  


Thanks.




[PATCH v25 08/10] crash: hotplug support for kexec_load()

2023-06-29 Thread Eric DeVolder
The hotplug support for kexec_load() requires changes to the
userspace kexec-tools and a little extra help from the kernel.

Given a kdump capture kernel loaded via kexec_load(), and a
subsequent hotplug event, the crash hotplug handler finds the
elfcorehdr and rewrites it to reflect the hotplug change.
That is the desired outcome, however, at kernel panic time,
the purgatory integrity check fails (because the elfcorehdr
changed), and the capture kernel does not boot and no vmcore
is generated.

Therefore, the userspace kexec-tools/kexec must indicate to the
kernel that the elfcorehdr can be modified (because the kexec
excluded the elfcorehdr from the digest, and sized the elfcorehdr
memory buffer appropriately).

To facilitate hotplug support with kexec_load():
 - a new kexec flag KEXEC_UPDATE_ELFCOREHDR indicates that it is
   safe for the kernel to modify the kexec_load()'d elfcorehdr
 - the /sys/kernel/crash_elfcorehdr_size node communicates the
   preferred size of the elfcorehdr memory buffer
 - The sysfs crash_hotplug nodes (i.e.
   /sys/devices/system/[cpu|memory]/crash_hotplug) dynamically
   take into account kexec_file_load() vs kexec_load() and
   KEXEC_UPDATE_ELFCOREHDR.
   This is critical so that the udev rule processing of crash_hotplug
   is all that is needed to determine if the userspace unload-then-load
   of the kdump image is to be skipped, or not. The proposed udev
   rule change looks like:
   # The kernel updates the crash elfcorehdr for CPU and memory changes
   SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
   SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
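
How the crash_hotplug value could depend on the load method and the new flag can be sketched as follows. This is a simplified model, not the kernel's code: struct crash_image_model and crash_hotplug_value() are hypothetical, and reporting 0 when no crash image is loaded is an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical model of the state the sysfs node has to consult. */
struct crash_image_model {
    bool loaded;              /* a kdump image is currently loaded */
    bool file_mode;           /* it was loaded via kexec_file_load() */
    bool update_elfcorehdr;   /* KEXEC_UPDATE_ELFCOREHDR was passed */
};

/* Value the crash_hotplug node would report in this model. */
static int crash_hotplug_value(const struct crash_image_model *img)
{
    if (!img->loaded)
        return 0;             /* assumption: nothing for the kernel to update */
    if (img->file_mode)
        return 1;             /* the kernel built the elfcorehdr itself */
    return img->update_elfcorehdr ? 1 : 0;
}
```

With the value computed this way, the udev rule needs no knowledge of which syscall loaded the image; it only tests the attribute.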

The table below indicates the behavior of kexec_load()'d kdump image
updates (with the new udev crash_hotplug rule in place):

 Kernel | Kexec
 -------+-----+-----
 Old    | Old | New
        |  a  |  a
 -------+-----+-----
 New    |  a  |  b
 -------+-----+-----

where kexec 'old' and 'new' indicate whether kexec-tools has the needed
modifications for the crash hotplug feature, and kernel 'old' and
'new' indicate whether the kernel supports this crash hotplug feature.

Behavior 'a' indicates the unload-then-reload of the entire kdump
image. For the kexec 'old' column, the unload-then-reload occurs
due to the missing flag KEXEC_UPDATE_ELFCOREHDR. An 'old' kernel
(with 'new' kexec) does not present the crash_hotplug sysfs node,
which leads to the unload-then-reload of the kdump image.

Behavior 'b' indicates the desired optimized behavior of the kernel
directly modifying the elfcorehdr and avoiding the unload-then-reload
of the kdump image.

If the udev rule is not updated with the crash_hotplug node check, then
regardless of whether the kernel or kexec is new or old, the kdump
image continues to be unloaded and then reloaded on hotplug changes.

To fully support crash hotplug feature, there needs to be a rollout
of kernel, kexec-tools and udev rule changes. However, the order of
the rollout of these pieces does not matter; kexec_load()'d kdump
images still function for hotplug as-is.
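
The rollout matrix above reduces to a single condition: the optimized path is taken only when all three pieces are new. The function below is a sketch of that table; kdump_update_behavior() and its parameter names are hypothetical.

```c
#include <stdbool.h>

/*
 * Returns 'b' (kernel updates the elfcorehdr in place) only when the
 * kernel, kexec-tools and the udev rule all carry the crash hotplug
 * changes; otherwise 'a' (userspace unload-then-reload).
 */
static char kdump_update_behavior(bool kernel_new, bool kexec_new,
                                  bool udev_rule_new)
{
    if (kernel_new && kexec_new && udev_rule_new)
        return 'b';
    return 'a';
}
```

Every other combination falls back to behavior 'a', which is why the order in which the three pieces are rolled out does not matter.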

Suggested-by: Hari Bathini 
Signed-off-by: Eric DeVolder 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 arch/x86/include/asm/kexec.h | 11 +++
 arch/x86/kernel/crash.c  | 27 +++
 include/linux/kexec.h| 14 --
 include/uapi/linux/kexec.h   |  1 +
 kernel/Kconfig.kexec |  4 
 kernel/crash_core.c  | 31 +++
 kernel/kexec.c   |  5 +
 kernel/ksysfs.c  | 15 +++
 8 files changed, 102 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 9143100ea3ea..3be6a98751f0 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -214,14 +214,17 @@ void arch_crash_handle_hotplug_event(struct kimage 
*image);
 #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
 
 #ifdef CONFIG_HOTPLUG_CPU
-static inline int crash_hotplug_cpu_support(void) { return 1; }
-#define crash_hotplug_cpu_support crash_hotplug_cpu_support
+int arch_crash_hotplug_cpu_support(void);
+#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
 #endif
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-static inline int crash_hotplug_memory_support(void) { return 1; }
-#define crash_hotplug_memory_support crash_hotplug_memory_support
+int arch_crash_hotplug_memory_support(void);
+#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
 #endif
+
+unsigned int arch_crash_get_elfcorehdr_size(void);
+#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index c70a111c44fa..caf22bcb61af 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -427,6 +427,33 @@ int crash_load_segments(struct kimage *image)
 #undef pr_fmt
 #define pr_fmt(fmt) "cras

[PATCH v25 06/10] crash: memory and CPU hotplug sysfs attributes

2023-06-29 Thread Eric DeVolder
Introduce the crash_hotplug attribute for memory and CPUs for
use by userspace.  These attributes directly facilitate the udev
rule for managing userspace re-loading of the crash kernel upon
hot un/plug changes.

For memory, expose the crash_hotplug attribute to the
/sys/devices/system/memory directory. For example:

 # udevadm info --attribute-walk /sys/devices/system/memory/memory81
  looking at device '/devices/system/memory/memory81':
KERNEL=="memory81"
SUBSYSTEM=="memory"
DRIVER==""
ATTR{online}=="1"
ATTR{phys_device}=="0"
ATTR{phys_index}=="0051"
ATTR{removable}=="1"
ATTR{state}=="online"
ATTR{valid_zones}=="Movable"

  looking at parent device '/devices/system/memory':
KERNELS=="memory"
SUBSYSTEMS==""
DRIVERS==""
ATTRS{auto_online_blocks}=="offline"
ATTRS{block_size_bytes}=="800"
ATTRS{crash_hotplug}=="1"

For CPUs, expose the crash_hotplug attribute to the
/sys/devices/system/cpu directory. For example:

 # udevadm info --attribute-walk /sys/devices/system/cpu/cpu0
  looking at device '/devices/system/cpu/cpu0':
KERNEL=="cpu0"
SUBSYSTEM=="cpu"
DRIVER=="processor"
ATTR{crash_notes}=="277c38600"
ATTR{crash_notes_size}=="368"
ATTR{online}=="1"

  looking at parent device '/devices/system/cpu':
KERNELS=="cpu"
SUBSYSTEMS==""
DRIVERS==""
ATTRS{crash_hotplug}=="1"
ATTRS{isolated}==""
ATTRS{kernel_max}=="8191"
ATTRS{nohz_full}=="  (null)"
ATTRS{offline}=="4-7"
ATTRS{online}=="0-3"
ATTRS{possible}=="0-7"
ATTRS{present}=="0-3"

With these sysfs attributes in place, it is possible to efficiently
instruct the udev rule to skip crash kernel reloading for kernels
configured with crash hotplug support.

For example, the following is the proposed udev rule change for RHEL
system 98-kexec.rules (as the first lines of the rule file):

 # The kernel updates the crash elfcorehdr for CPU and memory changes
 SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
 SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

When examined in the context of 98-kexec.rules, the above rules
test if crash_hotplug is set, and if so, the userspace initiated
unload-then-reload of the crash kernel is skipped.

CPU and memory checks are separated in accordance with
CONFIG_HOTPLUG_CPU and CONFIG_MEMORY_HOTPLUG kernel config options.
If an architecture supports, for example, memory hotplug but not
CPU hotplug, then the /sys/devices/system/memory/crash_hotplug
attribute file is present, but the /sys/devices/system/cpu/crash_hotplug
attribute file will NOT be present. Thus the udev rule skips
userspace processing of memory hot un/plug events, but the udev
rule will evaluate false for CPU events, thus allowing userspace to
process CPU hot un/plug events (i.e. the unload-then-reload of the kdump
capture kernel).

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 Documentation/ABI/testing/sysfs-devices-memory |  8 
 .../ABI/testing/sysfs-devices-system-cpu   |  8 
 .../admin-guide/mm/memory-hotplug.rst  |  8 
 Documentation/core-api/cpu_hotplug.rst | 18 ++
 drivers/base/cpu.c | 16 ++--
 drivers/base/memory.c  | 13 +
 include/linux/kexec.h  |  8 
 7 files changed, 77 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-devices-memory 
b/Documentation/ABI/testing/sysfs-devices-memory
index d8b0f80b9e33..c50725ebebb7 100644
--- a/Documentation/ABI/testing/sysfs-devices-memory
+++ b/Documentation/ABI/testing/sysfs-devices-memory
@@ -110,3 +110,11 @@ Description:
link is created for memory section 9 on node0.
 
/sys/devices/system/node/node0/memory9 -> ../../memory/memory9
+
+What:  /sys/devices/system/memory/crash_hotplug
+Date:  Jun 2023
+Contact:   Linux kernel mailing list 
+Description:
+   (RO) indicates whether or not the kernel directly supports
+   modifying the crash elfcorehdr for memory hot un/plug and/or
+   on/offline changes.
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu 
b/Documentation/ABI/testing/sysfs-devices-system-cpu
index ecd585ca2d50..598b0fa67481 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -686,3 +686,11 @@ Description:
(RO) the list of C

[PATCH v25 01/10] drivers/base: refactor cpu.c to use .is_visible()

2023-06-29 Thread Eric DeVolder
Greg Kroah-Hartman requested that this file use the .is_visible()
method instead of #ifdefs for the attributes in cpu.c.

 static struct attribute *cpu_root_attrs[] = {
 #ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
        &dev_attr_probe.attr,
        &dev_attr_release.attr,
 #endif
        &cpu_attrs[0].attr.attr,
        &cpu_attrs[1].attr.attr,
        &cpu_attrs[2].attr.attr,
        &dev_attr_kernel_max.attr,
        &dev_attr_offline.attr,
        &dev_attr_isolated.attr,
 #ifdef CONFIG_NO_HZ_FULL
        &dev_attr_nohz_full.attr,
 #endif
 #ifdef CONFIG_GENERIC_CPU_AUTOPROBE
        &dev_attr_modalias.attr,
 #endif
        NULL
 };

To that end:
 - the .is_visible() method is implemented, and IS_ENABLED(), rather
   than #ifdef, is used to determine the visibility of the attribute.
 - the DEVICE_ATTR() attributes are moved outside of #ifdefs, so that
   those structs are always present for the cpu_root_attrs[].
 - the function bodies of the callback functions are now wrapped with
   IS_ENABLED(), as each callback function must exist now that the
   attribute is always compiled in (though not necessarily visible).
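
The visibility pattern described above can be sketched in plain C. This is a
simplified model, not the kernel code: struct demo_attr, the DEMO_* switches
and demo_is_visible() are hypothetical stand-ins for struct attribute, the
CONFIG_* options tested with IS_ENABLED(), and the .is_visible() method.

```c
#include <stddef.h>

/* Hypothetical stand-ins for kernel config options: 1 = enabled, 0 = disabled. */
#define DEMO_NO_HZ_FULL             0
#define DEMO_GENERIC_CPU_AUTOPROBE  1

/* Simplified model of a sysfs attribute; not the kernel's struct attribute. */
struct demo_attr {
    const char *name;
    unsigned short mode;
};

struct demo_attr demo_attr_kernel_max = { "kernel_max", 0444 };
struct demo_attr demo_attr_nohz_full  = { "nohz_full",  0444 };
struct demo_attr demo_attr_modalias   = { "modalias",   0444 };

/* The array lists every attribute unconditionally, with no #ifdef. */
struct demo_attr *demo_root_attrs[] = {
    &demo_attr_kernel_max,
    &demo_attr_nohz_full,
    &demo_attr_modalias,
    NULL
};

/* Returns the attribute mode when visible, or 0 to hide the attribute. */
unsigned short demo_is_visible(const struct demo_attr *a)
{
    if (a == &demo_attr_nohz_full && !DEMO_NO_HZ_FULL)
        return 0;
    if (a == &demo_attr_modalias && !DEMO_GENERIC_CPU_AUTOPROBE)
        return 0;
    return a->mode;
}

/* Counts the attributes that would be exposed to userspace. */
int demo_count_visible(void)
{
    int i, n = 0;

    for (i = 0; demo_root_attrs[i]; i++)
        if (demo_is_visible(demo_root_attrs[i]))
            n++;
    return n;
}
```

With the switches as set here, demo_count_visible() reports two visible
attributes: nohz_full stays in the array but is hidden.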

No functionality change intended.

Signed-off-by: Eric DeVolder 
---
 drivers/base/cpu.c   | 125 +++
 include/linux/tick.h |   2 +-
 2 files changed, 81 insertions(+), 46 deletions(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index c1815b9dae68..2455cbcebc87 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -82,24 +82,27 @@ void unregister_cpu(struct cpu *cpu)
per_cpu(cpu_sys_devices, logical_cpu) = NULL;
return;
 }
+#endif /* CONFIG_HOTPLUG_CPU */
 
-#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
 static ssize_t cpu_probe_store(struct device *dev,
   struct device_attribute *attr,
   const char *buf,
   size_t count)
 {
-   ssize_t cnt;
-   int ret;
+   if (IS_ENABLED(CONFIG_ARCH_CPU_PROBE_RELEASE)) {
+   ssize_t cnt;
+   int ret;
 
-   ret = lock_device_hotplug_sysfs();
-   if (ret)
-   return ret;
+   ret = lock_device_hotplug_sysfs();
+   if (ret)
+   return ret;
 
-   cnt = arch_cpu_probe(buf, count);
+   cnt = arch_cpu_probe(buf, count);
 
-   unlock_device_hotplug();
-   return cnt;
+   unlock_device_hotplug();
+   return cnt;
+   }
+   return 0;
 }
 
 static ssize_t cpu_release_store(struct device *dev,
@@ -107,23 +110,24 @@ static ssize_t cpu_release_store(struct device *dev,
 const char *buf,
 size_t count)
 {
-   ssize_t cnt;
-   int ret;
+   if (IS_ENABLED(CONFIG_ARCH_CPU_PROBE_RELEASE)) {
+   ssize_t cnt;
+   int ret;
 
-   ret = lock_device_hotplug_sysfs();
-   if (ret)
-   return ret;
+   ret = lock_device_hotplug_sysfs();
+   if (ret)
+   return ret;
 
-   cnt = arch_cpu_release(buf, count);
+   cnt = arch_cpu_release(buf, count);
 
-   unlock_device_hotplug();
-   return cnt;
+   unlock_device_hotplug();
+   return cnt;
+   }
+   return 0;
 }
 
 static DEVICE_ATTR(probe, S_IWUSR, NULL, cpu_probe_store);
 static DEVICE_ATTR(release, S_IWUSR, NULL, cpu_release_store);
-#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
-#endif /* CONFIG_HOTPLUG_CPU */
 
 #ifdef CONFIG_KEXEC
 #include 
@@ -273,14 +277,14 @@ static ssize_t print_cpus_isolated(struct device *dev,
 }
 static DEVICE_ATTR(isolated, 0444, print_cpus_isolated, NULL);
 
-#ifdef CONFIG_NO_HZ_FULL
 static ssize_t print_cpus_nohz_full(struct device *dev,
struct device_attribute *attr, char *buf)
 {
-   return sysfs_emit(buf, "%*pbl\n", cpumask_pr_args(tick_nohz_full_mask));
+   if (IS_ENABLED(CONFIG_NO_HZ_FULL))
+   return sysfs_emit(buf, "%*pbl\n", 
cpumask_pr_args(tick_nohz_full_mask));
+   return 0;
 }
 static DEVICE_ATTR(nohz_full, 0444, print_cpus_nohz_full, NULL);
-#endif
 
 static void cpu_device_release(struct device *dev)
 {
@@ -301,30 +305,32 @@ static void cpu_device_release(struct device *dev)
 */
 }
 
-#ifdef CONFIG_GENERIC_CPU_AUTOPROBE
 static ssize_t print_cpu_modalias(struct device *dev,
  struct device_attribute *attr,
  char *buf)
 {
int len = 0;
-   u32 i;
-
-   len += sysfs_emit_at(buf, len,
-"cpu:type:" CPU_FEATURE_TYPEFMT ":feature:",
-CPU_FEATURE_TYPEVAL);
-
-   for (i = 0; i < MAX_CPU_FEATURES; i++)
-   if (cpu_have_feature(i)) {
-   if (len + sizeof(",\n") >= PAGE_SIZE) {
-   WARN(1, "CPU features overflow page\n");
-   

[PATCH v25 07/10] x86/crash: add x86 crash hotplug support

2023-06-29 Thread Eric DeVolder
When CPU or memory is hot un/plugged, or off/onlined, the crash
elfcorehdr, which describes the CPUs and memory in the system,
must also be updated.

A new elfcorehdr is generated from the available CPUs and memory
and replaces the existing elfcorehdr. The segment containing the
elfcorehdr is identified at run-time in
crash_core:crash_handle_hotplug_event().

No modifications to purgatory (see 'kexec: exclude elfcorehdr
from the segment digest') or boot_params (as the elfcorehdr=
capture kernel command line parameter pointer remains unchanged
and correct) are needed, just elfcorehdr.

For kexec_file_load(), the elfcorehdr segment size is based on
NR_CPUS and CRASH_MAX_MEMORY_RANGES in order to accommodate a
growing number of CPU and memory resources.

For kexec_load(), the userspace kexec utility needs to size the
elfcorehdr segment in the same/similar manner.
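
As a rough sketch of that sizing, assuming one program header per CPU, one
for vmcoreinfo, one for the kernel text mapping and one per memory range
(the layout used by crash_prepare_elf64_headers()). The function names are
hypothetical, and the 4096-byte rounding is an assumption borrowed from
ELF_CORE_HEADER_ALIGN; the actual sizing code in this patch may differ.

```c
#define DEMO_ELF64_EHDR_SIZE  64UL    /* sizeof(Elf64_Ehdr) */
#define DEMO_ELF64_PHDR_SIZE  56UL    /* sizeof(Elf64_Phdr) */
#define DEMO_ELF_HDR_ALIGN    4096UL  /* assumed alignment of the segment */

/* Unaligned size of an elfcorehdr describing the given resources. */
unsigned long demo_elfcorehdr_raw_size(unsigned long nr_cpus,
                                       unsigned long max_mem_ranges)
{
    /* one PT_NOTE per CPU, one for vmcoreinfo, one PT_LOAD for kernel text */
    unsigned long nr_phdr = nr_cpus + 2 + max_mem_ranges;

    return DEMO_ELF64_EHDR_SIZE + nr_phdr * DEMO_ELF64_PHDR_SIZE;
}

/* Same size, rounded up to the assumed segment alignment. */
unsigned long demo_elfcorehdr_size(unsigned long nr_cpus,
                                   unsigned long max_mem_ranges)
{
    unsigned long raw = demo_elfcorehdr_raw_size(nr_cpus, max_mem_ranges);

    return (raw + DEMO_ELF_HDR_ALIGN - 1) & ~(DEMO_ELF_HDR_ALIGN - 1);
}
```

For example, 8 CPUs and 8192 memory ranges need 64 + 8202 * 56 = 459376
bytes, which rounds up to 462848 bytes.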

To accommodate kexec_load() syscall in the absence of
kexec_file_load() syscall support, prepare_elf_headers() and
dependents are moved outside of CONFIG_KEXEC_FILE.

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 arch/x86/Kconfig |   3 +
 arch/x86/include/asm/kexec.h |  15 +
 arch/x86/kernel/crash.c  | 103 ---
 3 files changed, 114 insertions(+), 7 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 06a4472d0fc0..42c083da7ce4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2058,6 +2058,9 @@ config ARCH_SUPPORTS_KEXEC_JUMP
 config ARCH_SUPPORTS_CRASH_DUMP
def_bool X86_64 || (X86_32 && HIGHMEM)
 
+config ARCH_SUPPORTS_CRASH_HOTPLUG
+   def_bool y
+
 config PHYSICAL_START
hex "Physical address where the kernel is loaded" if (EXPERT || 
CRASH_DUMP)
default "0x1000000"
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 5b77bbc28f96..9143100ea3ea 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -209,6 +209,21 @@ typedef void crash_vmclear_fn(void);
 extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
 extern void kdump_nmi_shootdown_cpus(void);
 
+#ifdef CONFIG_CRASH_HOTPLUG
+void arch_crash_handle_hotplug_event(struct kimage *image);
+#define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
+
+#ifdef CONFIG_HOTPLUG_CPU
+static inline int crash_hotplug_cpu_support(void) { return 1; }
+#define crash_hotplug_cpu_support crash_hotplug_cpu_support
+#endif
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+static inline int crash_hotplug_memory_support(void) { return 1; }
+#define crash_hotplug_memory_support crash_hotplug_memory_support
+#endif
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_X86_KEXEC_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index cdd92ab43cda..c70a111c44fa 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -158,8 +158,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
crash_save_cpu(regs, safe_smp_processor_id());
 }
 
-#ifdef CONFIG_KEXEC_FILE
-
 static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
 {
unsigned int *nr_ranges = arg;
@@ -231,7 +229,7 @@ static int prepare_elf64_ram_headers_callback(struct 
resource *res, void *arg)
 
 /* Prepare elf headers. Return addr and size */
 static int prepare_elf_headers(struct kimage *image, void **addr,
-   unsigned long *sz)
+   unsigned long *sz, unsigned long 
*nr_mem_ranges)
 {
struct crash_mem *cmem;
int ret;
@@ -249,6 +247,9 @@ static int prepare_elf_headers(struct kimage *image, void 
**addr,
if (ret)
goto out;
 
+   /* Return the computed number of memory ranges, for hotplug usage */
+   *nr_mem_ranges = cmem->nr_ranges;
+
/* By default prepare 64bit headers */
ret =  crash_prepare_elf64_headers(cmem, IS_ENABLED(CONFIG_X86_64), 
addr, sz);
 
@@ -257,6 +258,7 @@ static int prepare_elf_headers(struct kimage *image, void 
**addr,
return ret;
 }
 
+#ifdef CONFIG_KEXEC_FILE
 static int add_e820_entry(struct boot_params *params, struct e820_entry *entry)
 {
unsigned int nr_e820_entries;
@@ -371,18 +373,42 @@ int crash_setup_memmap_entries(struct kimage *image, 
struct boot_params *params)
 int crash_load_segments(struct kimage *image)
 {
int ret;
+   unsigned long pnum = 0;
struct kexec_buf kbuf = { .image = image, .buf_min = 0,
  .buf_max = ULONG_MAX, .top_down = false };
 
/* Prepare elf headers and add a segment */
-   ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz);
+   ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz, &pnum);
if (ret)
return ret;
 
-   image->elf_headers = kbuf.buffer;
-   image->elf_headers_sz = kbuf.bufsz;
+   image->elf_headers  = kbuf.buffer;
+   image->elf_he

[PATCH v25 02/10] drivers/base: refactor memory.c to use .is_visible()

2023-06-29 Thread Eric DeVolder
Greg Kroah-Hartman requested that this file use the .is_visible()
method instead of #ifdefs for the attributes in memory.c.

 static struct attribute *memory_memblk_attrs[] = {
        &dev_attr_phys_index.attr,
        &dev_attr_state.attr,
        &dev_attr_phys_device.attr,
        &dev_attr_removable.attr,
 #ifdef CONFIG_MEMORY_HOTREMOVE
        &dev_attr_valid_zones.attr,
 #endif
        NULL
 };

and

 static struct attribute *memory_root_attrs[] = {
 #ifdef CONFIG_ARCH_MEMORY_PROBE
        &dev_attr_probe.attr,
 #endif

 #ifdef CONFIG_MEMORY_FAILURE
        &dev_attr_soft_offline_page.attr,
        &dev_attr_hard_offline_page.attr,
 #endif

        &dev_attr_block_size_bytes.attr,
        &dev_attr_auto_online_blocks.attr,
        NULL
 };

To that end:
 - the .is_visible() method is implemented, and IS_ENABLED(), rather
   than #ifdef, is used to determine the visibility of the attribute.
 - the DEVICE_ATTR_xx() attributes are moved outside of #ifdefs, so that
   those structs are always present for the memory_memblk_attrs[] and
   memory_root_attrs[].
 - the function bodies of the callback functions are now wrapped with
   IS_ENABLED(), as each callback function must exist now that the
   attribute is always compiled in (though not necessarily visible).
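
The last point, a callback that always exists but only does work when its
option is enabled, can be sketched as follows. This is a simplified
userspace model: DEMO_MEMORY_HOTREMOVE and demo_valid_zones_show() are
hypothetical stand-ins for IS_ENABLED(CONFIG_MEMORY_HOTREMOVE) and the real
show() callback.

```c
#include <stdio.h>

/* Hypothetical stand-in for CONFIG_MEMORY_HOTREMOVE: 1 = enabled, 0 = disabled. */
#define DEMO_MEMORY_HOTREMOVE 0

/*
 * Simplified show() callback: writes the zone name into buf and returns the
 * number of characters written. The function is always compiled, but its
 * body only does work when the option is enabled; otherwise it returns 0
 * and leaves buf untouched.
 */
int demo_valid_zones_show(char *buf, size_t len, const char *zone_name)
{
    if (DEMO_MEMORY_HOTREMOVE)
        return snprintf(buf, len, "%s\n", zone_name);
    return 0;
}
```

Because the condition is a compile-time constant, the compiler can discard
the disabled branch while still type-checking it, which is the usual
argument for this style over #ifdef.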

No functionality change intended.

Signed-off-by: Eric DeVolder 
---
 drivers/base/memory.c | 229 ++
 1 file changed, 140 insertions(+), 89 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index b456ac213610..7294112fe646 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -391,62 +391,66 @@ static ssize_t phys_device_show(struct device *dev,
  arch_get_memory_phys_device(start_pfn));
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 static int print_allowed_zone(char *buf, int len, int nid,
  struct memory_group *group,
  unsigned long start_pfn, unsigned long nr_pages,
  int online_type, struct zone *default_zone)
 {
-   struct zone *zone;
+   if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE)) {
+   struct zone *zone;
 
-   zone = zone_for_pfn_range(online_type, nid, group, start_pfn, nr_pages);
-   if (zone == default_zone)
-   return 0;
+   zone = zone_for_pfn_range(online_type, nid, group, start_pfn, 
nr_pages);
+   if (zone == default_zone)
+   return 0;
 
-   return sysfs_emit_at(buf, len, " %s", zone->name);
+   return sysfs_emit_at(buf, len, " %s", zone->name);
+   }
+   return 0;
 }
 
 static ssize_t valid_zones_show(struct device *dev,
struct device_attribute *attr, char *buf)
 {
-   struct memory_block *mem = to_memory_block(dev);
-   unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
-   unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
-   struct memory_group *group = mem->group;
-   struct zone *default_zone;
-   int nid = mem->nid;
-   int len = 0;
+   if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE)) {
+   struct memory_block *mem = to_memory_block(dev);
+   unsigned long start_pfn = 
section_nr_to_pfn(mem->start_section_nr);
+   unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
+   struct memory_group *group = mem->group;
+   struct zone *default_zone;
+   int nid = mem->nid;
+   int len = 0;
 
-   /*
-* Check the existing zone. Make sure that we do that only on the
-* online nodes otherwise the page_zone is not reliable
-*/
-   if (mem->state == MEM_ONLINE) {
/*
-* If !mem->zone, the memory block spans multiple zones and
-* cannot get offlined.
-*/
-   default_zone = mem->zone;
-   if (!default_zone)
-   return sysfs_emit(buf, "%s\n", "none");
-   len += sysfs_emit_at(buf, len, "%s", default_zone->name);
-   goto out;
-   }
+   * Check the existing zone. Make sure that we do that only on the
+   * online nodes otherwise the page_zone is not reliable
+   */
+   if (mem->state == MEM_ONLINE) {
+   /*
+* If !mem->zone, the memory block spans multiple zones 
and
+* cannot get offlined.
+*/
+   default_zone = mem->zone;
+   if (!default_zone)
+   return sysfs_emit(buf, "%s\n", "none");
+   len += sysfs_emit_at(buf, len, "%s", 
default_zone->name);
+   goto out;
+   }
 
-   default_zone = zone_for

[PATCH v25 00/10] crash: Kernel handling of CPU and memory hot un/plug

2023-06-29 Thread Eric DeVolder
://lkml.org/lkml/2022/5/5/1133
 https://lore.kernel.org/lkml/20220505184603.1548-1-eric.devol...@oracle.com/
 - Per Borislav Petkov, eliminated CONFIG_CRASH_HOTPLUG in favor
   of CONFIG_HOTPLUG_CPU || CONFIG_MEMORY_HOTPLUG, ie a new define
   is not needed. Also use of IS_ENABLED() rather than #ifdef's.
   Renamed crash_hotplug_handler() to handle_hotplug_event().
   And other corrections.
 - Per Baoquan, minimized the parameters to the arch_crash_
   handle_hotplug_event() to hp_action and cpu.
 - Introduce KEXEC_CRASH_HP_INVALID_CPU definition, per Baoquan.
 - Per Sourabh Jain, renamed and repurposed CRASH_HOTPLUG_ELFCOREHDR_SZ
   to CONFIG_CRASH_MAX_MEMORY_RANGES, mirroring kexec-tools change
   by David Hildenbrand. Folded this patch into the x86
   kexec_file_load support patch.

v7: 13apr2022
 https://lkml.org/lkml/2022/4/13/850
 https://lore.kernel.org/lkml/20220413164237.20845-1-eric.devol...@oracle.com/
 - Resolved parameter usage to crash_hotplug_handler(), per Baoquan.

v6: 1apr2022
 https://lkml.org/lkml/2022/4/1/1203
 https://lore.kernel.org/lkml/20220401183040.1624-1-eric.devol...@oracle.com/
 - Reword commit messages and some comment cleanup per Baoquan.
 - Changed elf_index to elfcorehdr_index for clarity.
 - Minor code changes per Baoquan.

v5: 3mar2022
 https://lkml.org/lkml/2022/3/3/674
 https://lore.kernel.org/lkml/20220303162725.49640-1-eric.devol...@oracle.com/
 - Reworded description of CRASH_HOTPLUG_ELFCOREHDR_SZ, per
   David Hildenbrand.
 - Refactored slightly a few patches per Baoquan recommendation.

v4: 9feb2022
 https://lkml.org/lkml/2022/2/9/1406
 https://lore.kernel.org/lkml/20220209195706.51522-1-eric.devol...@oracle.com/
 - Refactored patches per Baoquan suggestions.
 - A few corrections, per Baoquan.

v3: 10jan2022
 https://lkml.org/lkml/2022/1/10/1212
 https://lore.kernel.org/lkml/20220110195727.1682-1-eric.devol...@oracle.com/
 - Rebasing per Baoquan He request.
 - Changed memory notifier per David Hildenbrand.
 - Providing example kexec userspace change in cover letter.

RFC v2: 7dec2021
 https://lkml.org/lkml/2021/12/7/1088
 https://lore.kernel.org/lkml/20211207195204.1582-1-eric.devol...@oracle.com/
 - Acting upon Baoquan He's suggestion of removing elfcorehdr from
   the purgatory list of segments, removed purgatory code from
   patchset, and it is significantly simpler now.

RFC v1: 18nov2021
 https://lkml.org/lkml/2021/11/18/845
 https://lore.kernel.org/lkml/2028174948.37435-1-eric.devol...@oracle.com/
 - working patchset demonstrating kernel handling of hotplug
   updates to x86 elfcorehdr for kexec_file_load

RFC: 14dec2020
 https://lkml.org/lkml/2020/12/14/532
 https://lore.kernel.org/lkml/b04ed259-dc5f-7f30-6661-c26f92d90...@oracle.com/
 - proposed concept of allowing kernel to handle hotplug update
   of elfcorehdr
---


Eric DeVolder (10):
  drivers/base: refactor cpu.c to use .is_visible()
  drivers/base: refactor memory.c to use .is_visible()
  crash: move a few code bits to setup support of crash hotplug
  crash: add generic infrastructure for crash hotplug support
  kexec: exclude elfcorehdr from the segment digest
  crash: memory and CPU hotplug sysfs attributes
  x86/crash: add x86 crash hotplug support
  crash: hotplug support for kexec_load()
  crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
  x86/crash: optimize CPU changes

 .../ABI/testing/sysfs-devices-memory  |   8 +
 .../ABI/testing/sysfs-devices-system-cpu  |   8 +
 .../admin-guide/mm/memory-hotplug.rst |   8 +
 Documentation/core-api/cpu_hotplug.rst|  18 +
 arch/x86/Kconfig  |   3 +
 arch/x86/include/asm/kexec.h  |  18 +
 arch/x86/kernel/crash.c   | 140 ++-
 drivers/base/cpu.c| 141 ---
 drivers/base/memory.c | 242 +++-
 include/linux/crash_core.h|   9 +
 include/linux/kexec.h |  63 +++-
 include/linux/tick.h  |   2 +-
 include/uapi/linux/kexec.h|   1 +
 kernel/Kconfig.kexec  |  35 ++
 kernel/crash_core.c   | 355 ++
 kernel/kexec.c|   5 +
 kernel/kexec_core.c   |   6 +
 kernel/kexec_file.c   | 187 +
 kernel/ksysfs.c   |  15 +
 19 files changed, 922 insertions(+), 342 deletions(-)

-- 
2.31.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v25 04/10] crash: add generic infrastructure for crash hotplug support

2023-06-29 Thread Eric DeVolder
To support crash hotplug, a mechanism is needed to update the crash
elfcorehdr upon CPU or memory changes (eg. hot un/plug or off/
onlining). The crash elfcorehdr describes the CPUs and memory to
be written into the vmcore.

To track CPU changes, callbacks are registered with the cpuhp
mechanism via cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN). The
crash hotplug elfcorehdr update has no explicit ordering requirement
(relative to other cpuhp states), so meets the criteria for
utilizing CPUHP_BP_PREPARE_DYN. CPUHP_BP_PREPARE_DYN is a dynamic
state and avoids the need to introduce a new state for crash
hotplug. Also, CPUHP_BP_PREPARE_DYN is the last state in the PREPARE
group, just prior to the STARTING group, which is very close to the
CPU starting up in a plug/online situation, or stopping in an unplug/
offline situation. This minimizes the window of time during an
actual plug/online or unplug/offline situation in which the
elfcorehdr would be inaccurate. Note that for a CPU being unplugged
or offlined, the CPU will still be present in the list of CPUs
generated by crash_prepare_elf64_headers(). However, there is no
need to explicitly omit the CPU, see justification in
'crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()'.

To track memory changes, a notifier is registered to capture the
memblock MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().

The CPU callbacks and memory notifiers invoke crash_handle_hotplug_event()
which performs needed tasks and then dispatches the event to the
architecture specific arch_crash_handle_hotplug_event() to update the
elfcorehdr with the current state of CPUs and memory. During the
process, the kexec_lock is held.
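
The dispatch flow described above can be sketched as a small userspace
model. The KEXEC_CRASH_HP_* values are the ones this patch adds to
crash_core.h; everything prefixed demo_ is hypothetical, and the try-lock
behaviour and the reset of hp_action after the arch hook are assumptions
made for illustration.

```c
#include <stddef.h>

/* Action codes as defined in the crash_core.h hunk of this patch. */
#define KEXEC_CRASH_HP_NONE           0
#define KEXEC_CRASH_HP_ADD_CPU        1
#define KEXEC_CRASH_HP_REMOVE_CPU     2
#define KEXEC_CRASH_HP_ADD_MEMORY     3
#define KEXEC_CRASH_HP_REMOVE_MEMORY  4

/* Simplified model of the crash image; only the field used here. */
struct demo_kimage {
    int hp_action;
};

int demo_lock_held;     /* stands in for kexec_lock */
int demo_arch_calls;    /* how many times the arch hook ran */
int demo_last_action;   /* action seen by the arch hook */

/* Stand-in for arch_crash_handle_hotplug_event(). */
void demo_arch_handle_hotplug_event(struct demo_kimage *image)
{
    demo_arch_calls++;
    demo_last_action = image->hp_action;
}

/*
 * Returns 0 when the event was dispatched, -1 when the lock was busy or
 * no crash image is loaded.
 */
int demo_handle_hotplug_event(struct demo_kimage *image, int hp_action)
{
    if (demo_lock_held)
        return -1;
    demo_lock_held = 1;

    if (image) {
        image->hp_action = hp_action;           /* tell the arch code why */
        demo_arch_handle_hotplug_event(image);
        image->hp_action = KEXEC_CRASH_HP_NONE; /* event fully handled */
    }

    demo_lock_held = 0;
    return image ? 0 : -1;
}
```

Both the CPU callbacks and the memory notifier would funnel into the same
entry point, differing only in the action code they pass.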

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 include/linux/crash_core.h |   9 +++
 include/linux/kexec.h  |  11 +++
 kernel/Kconfig.kexec   |  31 
 kernel/crash_core.c| 142 +
 kernel/kexec_core.c|   6 ++
 5 files changed, 199 insertions(+)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index de62a722431e..e14345cc7a22 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -84,4 +84,13 @@ int parse_crashkernel_high(char *cmdline, unsigned long long 
system_ram,
 int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
unsigned long long *crash_size, unsigned long long *crash_base);
 
+#define KEXEC_CRASH_HP_NONE0
+#define KEXEC_CRASH_HP_ADD_CPU 1
+#define KEXEC_CRASH_HP_REMOVE_CPU  2
+#define KEXEC_CRASH_HP_ADD_MEMORY  3
+#define KEXEC_CRASH_HP_REMOVE_MEMORY   4
+#define KEXEC_CRASH_HP_INVALID_CPU -1U
+
+struct kimage;
+
 #endif /* LINUX_CRASH_CORE_H */
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 811a90e09698..b9903dd48e24 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -33,6 +33,7 @@ extern note_buf_t __percpu *crash_notes;
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* Verify architecture specific macros are defined */
@@ -360,6 +361,12 @@ struct kimage {
struct purgatory_info purgatory_info;
 #endif
 
+#ifdef CONFIG_CRASH_HOTPLUG
+   int hp_action;
+   int elfcorehdr_index;
+   bool elfcorehdr_updated;
+#endif
+
 #ifdef CONFIG_IMA_KEXEC
/* Virtual address of IMA measurement buffer for kexec syscall */
void *ima_buffer;
@@ -490,6 +497,10 @@ static inline int arch_kexec_post_alloc_pages(void *vaddr, 
unsigned int pages, g
 static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) 
{ }
 #endif
 
+#ifndef arch_crash_handle_hotplug_event
+static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
+#endif
+
 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
diff --git a/kernel/Kconfig.kexec b/kernel/Kconfig.kexec
index 5d576ddfd999..7eb42a795176 100644
--- a/kernel/Kconfig.kexec
+++ b/kernel/Kconfig.kexec
@@ -107,4 +107,35 @@ config CRASH_DUMP
  For s390, this option also enables zfcpdump.
  See also 
 
+config CRASH_HOTPLUG
+   bool "Update the crash elfcorehdr on system configuration changes"
+   default y
+   depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)
+   depends on ARCH_SUPPORTS_CRASH_HOTPLUG
+   help
+ Enable direct update to the crash elfcorehdr (which contains
+ the list of CPUs and memory regions to be dumped upon a crash)
+ in response to hot plug/unplug or online/offline of CPUs or
+ memory. This is a much more advanced approach than userspace
+ attempting that.
+
+ If unsure, say Y.
+
+config CRASH_MAX_MEMORY_RANGES
+   int "Specify the maximum number of memory regions for the elfcorehdr"
+   default 8192
+   depends on CRASH_HOTPLUG
+   help
+

[PATCH v25 09/10] crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()

2023-06-29 Thread Eric DeVolder
The function crash_prepare_elf64_headers() generates the elfcorehdr
which describes the CPUs and memory in the system for the crash kernel.
In particular, it writes out ELF PT_LOADs for memory regions and
PT_NOTEs for the CPUs in the system.

With respect to the CPUs, the current implementation utilizes
for_each_present_cpu() which means that as CPUs are added and removed,
the elfcorehdr must again be updated to reflect the new set of CPUs.

The reasoning behind the move to for_each_possible_cpu() is:

- At kernel boot time, all percpu crash_notes are allocated for all
  possible CPUs; that is, crash_notes are not allocated dynamically
  when CPUs are plugged/unplugged. Thus the crash_notes for each
  possible CPU are always available.

- The crash_prepare_elf64_headers() creates an ELF PT_NOTE per CPU.
  Changing to for_each_possible_cpu() is valid as the crash_notes
  pointed to by each CPU PT_NOTE are present and always valid.

Furthermore, examining a common crash processing path of:

 kernel panic -> crash kernel -> makedumpfile -> 'crash' analyzer
   elfcorehdr  /proc/vmcore vmcore

reveals how the ELF CPU PT_NOTEs are utilized:

- Upon panic, each CPU is sent an IPI and shuts itself down, recording
 its state in its crash_notes. When all CPUs are shut down, the
 crash kernel is launched with a pointer to the elfcorehdr.

- The crash kernel via linux/fs/proc/vmcore.c does not examine or
 use the contents of the PT_NOTEs, it exposes them via /proc/vmcore.

- The makedumpfile utility uses /proc/vmcore and reads the CPU
 PT_NOTEs to craft a nr_cpus variable, which is reported in a
 header but otherwise generally unused. Makedumpfile creates the
 vmcore.

- The 'crash' dump analyzer does not appear to reference the CPU
 PT_NOTEs. Instead it looks up the cpu_[possible|present|online]_mask
 symbols and directly examines those structure contents from vmcore
 memory. From that information it is able to determine which CPUs
 are present and online, and locate the corresponding crash_notes.
 Said differently, it appears that 'crash' analyzer does not rely
 on the ELF PT_NOTEs for CPUs; rather it obtains the information
 directly via kernel symbols and the memory within the vmcore.

(There may be other vmcore generating and analysis tools that do use
these PT_NOTEs, but 'makedumpfile' and 'crash' seem to be the most
common solution.)

This results in the benefit of having all CPUs described in the
elfcorehdr, and therefore reducing the need to re-generate the
elfcorehdr on CPU changes, at the small expense of an additional
56 bytes per PT_NOTE for not-present-but-possible CPUs.
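
The size trade-off can be made concrete with a small sketch. It models the
CPU masks as 64-bit bitmaps, which is a simplification, and the function
names are hypothetical; the 56-byte figure is sizeof(Elf64_Phdr).

```c
#include <stdint.h>

#define DEMO_ELF64_PHDR_SIZE 56U   /* sizeof(Elf64_Phdr) */

/* Number of CPUs set in a (simplified, 64-bit) CPU mask. */
unsigned int demo_count_cpus(uint64_t mask)
{
    unsigned int n = 0;

    while (mask) {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Extra elfcorehdr bytes spent when describing every possible CPU rather
 * than only the CPUs that are currently present.
 */
unsigned int demo_extra_note_bytes(uint64_t possible, uint64_t present)
{
    unsigned int nr_possible = demo_count_cpus(possible);
    unsigned int nr_present = demo_count_cpus(possible & present);

    return (nr_possible - nr_present) * DEMO_ELF64_PHDR_SIZE;
}
```

With 8 possible CPUs of which 4 are present, the additional cost is
4 * 56 = 224 bytes.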

On systems where kexec_file_load() syscall is utilized, all the above
is valid. On systems where kexec_load() syscall is utilized, there
may be the need for the elfcorehdr to be regenerated once. The reason
being that some archs only populate the 'present' CPUs from the
/sys/devices/system/cpu entries, which the userspace 'kexec' utility
uses to generate the userspace-supplied elfcorehdr. In this situation,
one memory or CPU change will rewrite the elfcorehdr via the
crash_prepare_elf64_headers() function and now all possible CPUs will
be described, just as with kexec_file_load() syscall.

Suggested-by: Sourabh Jain 
Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 kernel/crash_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index fa918176d46d..7378b501fada 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -364,8 +364,8 @@ int crash_prepare_elf64_headers(struct crash_mem *mem, int 
need_kernel_map,
ehdr->e_ehsize = sizeof(Elf64_Ehdr);
ehdr->e_phentsize = sizeof(Elf64_Phdr);
 
-   /* Prepare one phdr of type PT_NOTE for each present CPU */
-   for_each_present_cpu(cpu) {
+   /* Prepare one phdr of type PT_NOTE for each possible CPU */
+   for_each_possible_cpu(cpu) {
phdr->p_type = PT_NOTE;
notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
phdr->p_offset = phdr->p_paddr = notes_addr;
-- 
2.31.1



[PATCH v25 03/10] crash: move a few code bits to setup support of crash hotplug

2023-06-29 Thread Eric DeVolder
The crash hotplug support leans on the work for the kexec_file_load()
syscall. To also support the kexec_load() syscall, a few bits of code
need to be moved outside of CONFIG_KEXEC_FILE. As such, these bits are
moved out of kexec_file.c and into a common location crash_core.c.

No functionality change intended.

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 include/linux/kexec.h |  30 +++
 kernel/crash_core.c   | 182 ++
 kernel/kexec_file.c   | 181 -
 3 files changed, 197 insertions(+), 196 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 22b5cd24f581..811a90e09698 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -105,6 +105,21 @@ struct compat_kexec_segment {
 };
 #endif
 
+/* Alignment required for elf header segment */
+#define ELF_CORE_HEADER_ALIGN   4096
+
+struct crash_mem {
+   unsigned int max_nr_ranges;
+   unsigned int nr_ranges;
+   struct range ranges[];
+};
+
+extern int crash_exclude_mem_range(struct crash_mem *mem,
+  unsigned long long mstart,
+  unsigned long long mend);
+extern int crash_prepare_elf64_headers(struct crash_mem *mem, int 
need_kernel_map,
+  void **addr, unsigned long *sz);
+
 #ifdef CONFIG_KEXEC_FILE
 struct purgatory_info {
/*
@@ -230,21 +245,6 @@ static inline int arch_kexec_locate_mem_hole(struct 
kexec_buf *kbuf)
 }
 #endif
 
-/* Alignment required for elf header segment */
-#define ELF_CORE_HEADER_ALIGN   4096
-
-struct crash_mem {
-   unsigned int max_nr_ranges;
-   unsigned int nr_ranges;
-   struct range ranges[];
-};
-
-extern int crash_exclude_mem_range(struct crash_mem *mem,
-  unsigned long long mstart,
-  unsigned long long mend);
-extern int crash_prepare_elf64_headers(struct crash_mem *mem, int 
need_kernel_map,
-  void **addr, unsigned long *sz);
-
 #ifndef arch_kexec_apply_relocations_add
 /*
  * arch_kexec_apply_relocations_add - apply relocations of type RELA
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 90ce1dfd591c..b7c30b748a16 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -314,6 +315,187 @@ static int __init parse_crashkernel_dummy(char *arg)
 }
 early_param("crashkernel", parse_crashkernel_dummy);
 
+int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
+ void **addr, unsigned long *sz)
+{
+   Elf64_Ehdr *ehdr;
+   Elf64_Phdr *phdr;
+   unsigned long nr_cpus = num_possible_cpus(), nr_phdr, elf_sz;
+   unsigned char *buf;
+   unsigned int cpu, i;
+   unsigned long long notes_addr;
+   unsigned long mstart, mend;
+
+   /* extra phdr for vmcoreinfo ELF note */
+   nr_phdr = nr_cpus + 1;
+   nr_phdr += mem->nr_ranges;
+
+   /*
+* kexec-tools creates an extra PT_LOAD phdr for kernel text mapping
+* area (for example, ffffffff80000000 - ffffffffa0000000 on x86_64).
+* I think this is required by tools like gdb. So same physical
+* memory will be mapped in two ELF headers. One will contain kernel
+* text virtual addresses and other will have __va(physical) addresses.
+*/
+
+   nr_phdr++;
+   elf_sz = sizeof(Elf64_Ehdr) + nr_phdr * sizeof(Elf64_Phdr);
+   elf_sz = ALIGN(elf_sz, ELF_CORE_HEADER_ALIGN);
+
+   buf = vzalloc(elf_sz);
+   if (!buf)
+   return -ENOMEM;
+
+   ehdr = (Elf64_Ehdr *)buf;
+   phdr = (Elf64_Phdr *)(ehdr + 1);
+   memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
+   ehdr->e_ident[EI_CLASS] = ELFCLASS64;
+   ehdr->e_ident[EI_DATA] = ELFDATA2LSB;
+   ehdr->e_ident[EI_VERSION] = EV_CURRENT;
+   ehdr->e_ident[EI_OSABI] = ELF_OSABI;
+   memset(ehdr->e_ident + EI_PAD, 0, EI_NIDENT - EI_PAD);
+   ehdr->e_type = ET_CORE;
+   ehdr->e_machine = ELF_ARCH;
+   ehdr->e_version = EV_CURRENT;
+   ehdr->e_phoff = sizeof(Elf64_Ehdr);
+   ehdr->e_ehsize = sizeof(Elf64_Ehdr);
+   ehdr->e_phentsize = sizeof(Elf64_Phdr);
+
+   /* Prepare one phdr of type PT_NOTE for each present CPU */
+   for_each_present_cpu(cpu) {
+   phdr->p_type = PT_NOTE;
+   notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
+   phdr->p_offset = phdr->p_paddr = notes_addr;
+   phdr->p_filesz = phdr->p_memsz = sizeof(note_buf_t);
+   (ehdr->e_phnum)++;
+   phdr++;
+   }
+
+   /* Prepare one PT_NOTE header for vmcoreinfo */
+   phdr->p_type = PT_NOTE;
+   

[PATCH v25 05/10] kexec: exclude elfcorehdr from the segment digest

2023-06-29 Thread Eric DeVolder
When a crash kernel is loaded via the kexec_file_load() syscall, the
kernel places the various segments (ie crash kernel, crash initrd,
boot_params, elfcorehdr, purgatory, etc) in memory. For those
architectures that utilize purgatory, a hash digest of the segments
is calculated for integrity checking. The digest is embedded into
the purgatory image prior to placing in memory.

Updates to the elfcorehdr in response to CPU and memory changes
would cause the purgatory integrity checking to fail (at crash time,
and no vmcore created). Therefore, the elfcorehdr segment is
explicitly excluded from the purgatory digest, enabling updates to
the elfcorehdr while also avoiding the need to recompute the hash
digest and reload purgatory.
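
The effect of the exclusion can be illustrated with a toy digest. This is a
sketch under assumptions: FNV-1a stands in for the real SHA-256 digest only
to keep the example short, and struct demo_segment and demo_digest() are
hypothetical names, not kernel interfaces.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model of a loaded segment. */
struct demo_segment {
    const unsigned char *buf;
    size_t len;
};

/*
 * Digest over all segments except the one at elfcorehdr_index.
 * FNV-1a is used here only as a small stand-in for SHA-256.
 */
uint32_t demo_digest(const struct demo_segment *segs, int nr_segments,
                     int elfcorehdr_index)
{
    uint32_t h = 2166136261u;
    size_t k;
    int i;

    for (i = 0; i < nr_segments; i++) {
        if (i == elfcorehdr_index)
            continue;   /* excluded: may be rewritten on hotplug */
        for (k = 0; k < segs[i].len; k++) {
            h ^= segs[i].buf[k];
            h *= 16777619u;
        }
    }
    return h;
}
```

Rewriting the excluded segment leaves the digest unchanged, whereas a change
to any other segment still alters it, so integrity checking of the remaining
segments is preserved.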

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 kernel/kexec_file.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index e9cf9e8d8f01..824ffc5282f4 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -726,6 +726,12 @@ static int kexec_calculate_store_digests(struct kimage 
*image)
for (j = i = 0; i < image->nr_segments; i++) {
struct kexec_segment *ksegment;
 
+#ifdef CONFIG_CRASH_HOTPLUG
+   /* Exclude elfcorehdr segment to allow future changes via hotplug */
+   if (j == image->elfcorehdr_index)
+   continue;
+#endif
+
ksegment = &image->segment[i];
/*
 * Skip purgatory as it will be modified once we put digest
-- 
2.31.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v25 10/10] x86/crash: optimize CPU changes

2023-06-29 Thread Eric DeVolder
crash_prepare_elf64_headers() writes into the elfcorehdr an ELF
PT_NOTE for all possible CPUs. As such, subsequent changes to CPUs
(i.e. hot un/plug, online/offline) do not need to rewrite the elfcorehdr.

The kimage->file_mode term covers kdump images loaded via the
kexec_file_load() syscall. Since crash_prepare_elf64_headers()
wrote the initial elfcorehdr, no update to the elfcorehdr is
needed for CPU changes.

The kimage->elfcorehdr_updated term covers kdump images loaded via
the kexec_load() syscall. At least one memory or CPU change must occur
to cause crash_prepare_elf64_headers() to rewrite the elfcorehdr.
Afterwards, no update to the elfcorehdr is needed for CPU changes.

This code is intentionally *NOT* hoisted into
crash_handle_hotplug_event() as it would prevent the arch-specific
handler from running for CPU changes. This would break PPC, for
example, which needs to update other information besides the
elfcorehdr, on CPU changes.
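
As a rough sketch of the two terms described above, the early-return
condition can be modeled in plain C. The names struct toy_kimage,
enum toy_hp_action and toy_skip_elfcorehdr_update() are hypothetical
stand-ins chosen for this example; only the shape of the predicate
follows the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the hotplug actions; not the kernel's enum. */
enum toy_hp_action {
	TOY_HP_NONE,
	TOY_HP_ADD_CPU,
	TOY_HP_REMOVE_CPU,
	TOY_HP_ADD_MEMORY,
	TOY_HP_REMOVE_MEMORY,
};

/* Hypothetical subset of the image state used by the check. */
struct toy_kimage {
	bool file_mode;          /* loaded via kexec_file_load() */
	bool elfcorehdr_updated; /* kernel has already rewritten the elfcorehdr */
	enum toy_hp_action hp_action;
};

/* Returns true when rewriting the elfcorehdr can be skipped. */
static bool toy_skip_elfcorehdr_update(const struct toy_kimage *image)
{
	bool all_cpus_described = image->file_mode || image->elfcorehdr_updated;
	bool cpu_event = image->hp_action == TOY_HP_ADD_CPU ||
			 image->hp_action == TOY_HP_REMOVE_CPU;

	return all_cpus_described && cpu_event;
}
```

Memory events never take the shortcut, and a kexec_load()'d image only
takes it after the kernel has rewritten the elfcorehdr at least once.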

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 arch/x86/kernel/crash.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index caf22bcb61af..18d2a18d1073 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -467,6 +467,16 @@ void arch_crash_handle_hotplug_event(struct kimage *image)
unsigned long mem, memsz;
unsigned long elfsz = 0;
 
+   /*
+* As crash_prepare_elf64_headers() has already described all
+* possible CPUs, there is no need to update the elfcorehdr
+* for additional CPU changes.
+*/
+   if ((image->file_mode || image->elfcorehdr_updated) &&
+   ((image->hp_action == KEXEC_CRASH_HP_ADD_CPU) ||
+   (image->hp_action == KEXEC_CRASH_HP_REMOVE_CPU)))
+   return;
+
/*
 * Create the new elfcorehdr reflecting the changes to CPU and/or
 * memory resources.
-- 
2.31.1


Re: [PATCH v24 01/10] drivers/base: refactor cpu.c to use .is_visible()

2023-06-29 Thread Eric DeVolder

I still need to convert the ifdefs within the functions to IS_ENABLED(), my 
apologies.
eric

On 6/28/23 13:52, Eric DeVolder wrote:

Greg Kroah-Hartman requested that this file use the .is_visible()
method instead of #ifdefs for the attributes in cpu.c.

  static struct attribute *cpu_root_attrs[] = {
  #ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
     &dev_attr_probe.attr,
     &dev_attr_release.attr,
  #endif
     &cpu_attrs[0].attr.attr,
     &cpu_attrs[1].attr.attr,
     &cpu_attrs[2].attr.attr,
     &dev_attr_kernel_max.attr,
     &dev_attr_offline.attr,
     &dev_attr_isolated.attr,
  #ifdef CONFIG_NO_HZ_FULL
     &dev_attr_nohz_full.attr,
  #endif
  #ifdef CONFIG_GENERIC_CPU_AUTOPROBE
     &dev_attr_modalias.attr,
  #endif
     NULL
  };

To that end:
  - the .is_visible() method is implemented, and IS_ENABLED(), rather
than #ifdef, is used to determine the visibility of the attribute.
  - the DEVICE_ATTR() attributes are moved outside of #ifdefs, so that
those structs are always present for the cpu_root_attrs[].
  - the #ifdefs guarding the attributes in the cpu_root_attrs[] are moved
to the corresponding callback function; as the callback function must
exist now that the attribute is always compiled-in (though not
necessarily visible).

No functionality change intended.

Signed-off-by: Eric DeVolder 
---
  drivers/base/cpu.c | 67 --
  1 file changed, 53 insertions(+), 14 deletions(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index c1815b9dae68..75fa46a567a1 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -82,13 +82,14 @@ void unregister_cpu(struct cpu *cpu)
per_cpu(cpu_sys_devices, logical_cpu) = NULL;
return;
  }
+#endif /* CONFIG_HOTPLUG_CPU */
  
-#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE

  static ssize_t cpu_probe_store(struct device *dev,
   struct device_attribute *attr,
   const char *buf,
   size_t count)
  {
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
ssize_t cnt;
int ret;
  
@@ -100,6 +101,9 @@ static ssize_t cpu_probe_store(struct device *dev,
  
  	unlock_device_hotplug();

return cnt;
+#else
+   return 0;
+#endif
  }
  
  static ssize_t cpu_release_store(struct device *dev,

@@ -107,6 +111,7 @@ static ssize_t cpu_release_store(struct device *dev,
 const char *buf,
 size_t count)
  {
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
ssize_t cnt;
int ret;
  
@@ -118,12 +123,13 @@ static ssize_t cpu_release_store(struct device *dev,
  
  	unlock_device_hotplug();

return cnt;
+#else
+   return 0;
+#endif
  }
  
  static DEVICE_ATTR(probe, S_IWUSR, NULL, cpu_probe_store);

  static DEVICE_ATTR(release, S_IWUSR, NULL, cpu_release_store);
-#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
-#endif /* CONFIG_HOTPLUG_CPU */
  
  #ifdef CONFIG_KEXEC

  #include <linux/kexec.h>
@@ -273,14 +279,16 @@ static ssize_t print_cpus_isolated(struct device *dev,
  }
  static DEVICE_ATTR(isolated, 0444, print_cpus_isolated, NULL);
  
-#ifdef CONFIG_NO_HZ_FULL

  static ssize_t print_cpus_nohz_full(struct device *dev,
struct device_attribute *attr, char *buf)
  {
+#ifdef CONFIG_NO_HZ_FULL
return sysfs_emit(buf, "%*pbl\n", cpumask_pr_args(tick_nohz_full_mask));
+#else
+   return 0;
+#endif
  }
  static DEVICE_ATTR(nohz_full, 0444, print_cpus_nohz_full, NULL);
-#endif
  
  static void cpu_device_release(struct device *dev)

  {
@@ -301,12 +309,12 @@ static void cpu_device_release(struct device *dev)
 */
  }
  
-#ifdef CONFIG_GENERIC_CPU_AUTOPROBE

  static ssize_t print_cpu_modalias(struct device *dev,
  struct device_attribute *attr,
  char *buf)
  {
int len = 0;
+#ifdef CONFIG_GENERIC_CPU_AUTOPROBE
u32 i;
  
  	len += sysfs_emit_at(buf, len,

@@ -322,9 +330,11 @@ static ssize_t print_cpu_modalias(struct device *dev,
len += sysfs_emit_at(buf, len, ",%04X", i);
}
len += sysfs_emit_at(buf, len, "\n");
+#endif
return len;
  }
  
+#ifdef CONFIG_GENERIC_CPU_AUTOPROBE

  static int cpu_uevent(const struct device *dev, struct kobj_uevent_env *env)
  {
char *buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
@@ -451,32 +461,61 @@ struct device *cpu_device_create(struct device *parent, void *drvdata,
  }
  EXPORT_SYMBOL_GPL(cpu_device_create);
  
-#ifdef CONFIG_GENERIC_CPU_AUTOPROBE

  static DEVICE_ATTR(modalias, 0444, print_cpu_modalias, NULL);
-#endif
  
  static struct attribute *cpu_root_attrs[] = {

-#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
&dev_attr_probe.attr,
&dev_attr_release.attr,
-#endif
&cpu_attrs[0].attr.attr,
&cpu_attrs[1].attr.attr,
&cpu_attrs[2].attr.attr,
&dev_attr_kernel_max.attr,
&dev_attr_offline.attr,
&dev_attr_isolated.a

Re: [PATCH v24 02/10] drivers/base: refactor memory.c to use .is_visible()

2023-06-29 Thread Eric DeVolder

I still need to convert the ifdefs within the functions to IS_ENABLED(), my 
apologies.
eric

On 6/28/23 13:52, Eric DeVolder wrote:

Greg Kroah-Hartman requested that this file use the .is_visible()
method instead of #ifdefs for the attributes in memory.c.

  static struct attribute *memory_memblk_attrs[] = {
     &dev_attr_phys_index.attr,
     &dev_attr_state.attr,
     &dev_attr_phys_device.attr,
     &dev_attr_removable.attr,
  #ifdef CONFIG_MEMORY_HOTREMOVE
     &dev_attr_valid_zones.attr,
  #endif
     NULL
  };

and

  static struct attribute *memory_root_attrs[] = {
  #ifdef CONFIG_ARCH_MEMORY_PROBE
     &dev_attr_probe.attr,
  #endif

  #ifdef CONFIG_MEMORY_FAILURE
     &dev_attr_soft_offline_page.attr,
     &dev_attr_hard_offline_page.attr,
  #endif

     &dev_attr_block_size_bytes.attr,
     &dev_attr_auto_online_blocks.attr,
     NULL
  };

To that end:
  - the .is_visible() method is implemented, and IS_ENABLED(), rather
than #ifdef, is used to determine the visibility of the attribute.
  - the DEVICE_ATTR_xx() attributes are moved outside of #ifdefs, so that
those structs are always present for the memory_memblk_attrs[] and
memory_root_attrs[].
  - the #ifdefs guarding the attributes in the memory_memblk_attrs[] and
memory_root_attrs[] are moved to the corresponding callback function;
as the callback function must exist now that the attribute is always
compiled-in (though not necessarily visible).

No functionality change intended.

Signed-off-by: Eric DeVolder 
---
  drivers/base/memory.c | 78 +++
  1 file changed, 65 insertions(+), 13 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index b456ac213610..f03eda7e1c9c 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -405,10 +405,12 @@ static int print_allowed_zone(char *buf, int len, int nid,
  
  	return sysfs_emit_at(buf, len, " %s", zone->name);

  }
+#endif
  
  static ssize_t valid_zones_show(struct device *dev,

struct device_attribute *attr, char *buf)
  {
+#ifdef CONFIG_MEMORY_HOTREMOVE
struct memory_block *mem = to_memory_block(dev);
unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
@@ -444,9 +446,11 @@ static ssize_t valid_zones_show(struct device *dev,
  out:
len += sysfs_emit_at(buf, len, "\n");
return len;
+#else
+   return 0;
+#endif
  }
  static DEVICE_ATTR_RO(valid_zones);
-#endif
  
  static DEVICE_ATTR_RO(phys_index);

  static DEVICE_ATTR_RW(state);
@@ -496,10 +500,10 @@ static DEVICE_ATTR_RW(auto_online_blocks);
   * as well as ppc64 will do all of their discovery in userspace
   * and will require this interface.
   */
-#ifdef CONFIG_ARCH_MEMORY_PROBE
  static ssize_t probe_store(struct device *dev, struct device_attribute *attr,
   const char *buf, size_t count)
  {
+#ifdef CONFIG_ARCH_MEMORY_PROBE
u64 phys_addr;
int nid, ret;
unsigned long pages_per_block = PAGES_PER_SECTION * sections_per_block;
@@ -527,12 +531,13 @@ static ssize_t probe_store(struct device *dev, struct device_attribute *attr,
  out:
unlock_device_hotplug();
return ret;
+#else
+   return 0;
+#endif
  }
  
  static DEVICE_ATTR_WO(probe);

-#endif
  
-#ifdef CONFIG_MEMORY_FAILURE

  /*
   * Support for offlining pages of memory
   */
@@ -542,6 +547,7 @@ static ssize_t soft_offline_page_store(struct device *dev,
   struct device_attribute *attr,
   const char *buf, size_t count)
  {
+#ifdef CONFIG_MEMORY_FAILURE
int ret;
u64 pfn;
if (!capable(CAP_SYS_ADMIN))
@@ -551,6 +557,9 @@ static ssize_t soft_offline_page_store(struct device *dev,
pfn >>= PAGE_SHIFT;
ret = soft_offline_page(pfn, 0);
return ret == 0 ? count : ret;
+#else
+   return 0;
+#endif
  }
  
  /* Forcibly offline a page, including killing processes. */

@@ -558,6 +567,7 @@ static ssize_t hard_offline_page_store(struct device *dev,
   struct device_attribute *attr,
   const char *buf, size_t count)
  {
+#ifdef CONFIG_MEMORY_FAILURE
int ret;
u64 pfn;
if (!capable(CAP_SYS_ADMIN))
@@ -569,11 +579,13 @@ static ssize_t hard_offline_page_store(struct device *dev,
if (ret == -EOPNOTSUPP)
ret = 0;
return ret ? ret : count;
+#else
+   return 0;
+#endif
  }
  
  static DEVICE_ATTR_WO(soft_offline_page);

  static DEVICE_ATTR_WO(hard_offline_page);
-#endif
  
  /* See phys_device_show(). */

  int __weak arch_get_memory_phys_device(unsigned long start_pfn)
@@ -611,14 +623,35 @@ static struct attribute *memory_memblk_attrs[] = {
&dev_attr_state.attr,
&dev_attr_phys_device.attr,
&dev_attr_removable.attr,

[PATCH v24 00/10] crash: Kernel handling of CPU and memory hot un/plug

2023-06-28 Thread Eric DeVolder
d crash_hotplug_handler() to handle_hotplug_event().
   And other corrections.
 - Per Baoquan, minimized the parameters to the arch_crash_
   handle_hotplug_event() to hp_action and cpu.
 - Introduce KEXEC_CRASH_HP_INVALID_CPU definition, per Baoquan.
 - Per Sourabh Jain, renamed and repurposed CRASH_HOTPLUG_ELFCOREHDR_SZ
   to CONFIG_CRASH_MAX_MEMORY_RANGES, mirroring kexec-tools change
   by David Hildenbrand. Folded this patch into the x86
   kexec_file_load support patch.

v7: 13apr2022
 https://lkml.org/lkml/2022/4/13/850
 https://lore.kernel.org/lkml/20220413164237.20845-1-eric.devol...@oracle.com/
 - Resolved parameter usage to crash_hotplug_handler(), per Baoquan.

v6: 1apr2022
 https://lkml.org/lkml/2022/4/1/1203
 https://lore.kernel.org/lkml/20220401183040.1624-1-eric.devol...@oracle.com/
 - Reword commit messages and some comment cleanup per Baoquan.
 - Changed elf_index to elfcorehdr_index for clarity.
 - Minor code changes per Baoquan.

v5: 3mar2022
 https://lkml.org/lkml/2022/3/3/674
 https://lore.kernel.org/lkml/20220303162725.49640-1-eric.devol...@oracle.com/
 - Reworded description of CRASH_HOTPLUG_ELFCOREHDR_SZ, per
   David Hildenbrand.
 - Refactored slightly a few patches per Baoquan recommendation.

v4: 9feb2022
 https://lkml.org/lkml/2022/2/9/1406
 https://lore.kernel.org/lkml/20220209195706.51522-1-eric.devol...@oracle.com/
 - Refactored patches per Baoquan suggestions.
 - A few corrections, per Baoquan.

v3: 10jan2022
 https://lkml.org/lkml/2022/1/10/1212
 https://lore.kernel.org/lkml/20220110195727.1682-1-eric.devol...@oracle.com/
 - Rebasing per Baoquan He request.
 - Changed memory notifier per David Hildenbrand.
 - Providing example kexec userspace change in cover letter.

RFC v2: 7dec2021
 https://lkml.org/lkml/2021/12/7/1088
 https://lore.kernel.org/lkml/20211207195204.1582-1-eric.devol...@oracle.com/
 - Acting upon Baoquan He suggestion of removing elfcorehdr from
   the purgatory list of segments, removed purgatory code from
   patchset, and it is significantly simpler now.

RFC v1: 18nov2021
 https://lkml.org/lkml/2021/11/18/845
 https://lore.kernel.org/lkml/2028174948.37435-1-eric.devol...@oracle.com/
 - working patchset demonstrating kernel handling of hotplug
   updates to x86 elfcorehdr for kexec_file_load

RFC: 14dec2020
 https://lkml.org/lkml/2020/12/14/532
 https://lore.kernel.org/lkml/b04ed259-dc5f-7f30-6661-c26f92d90...@oracle.com/
 - proposed concept of allowing kernel to handle hotplug update
   of elfcorehdr
---

Eric DeVolder (10):
  drivers/base: refactor cpu.c to use .is_visible()
  drivers/base: refactor memory.c to use .is_visible()
  crash: move a few code bits to setup support of crash hotplug
  crash: add generic infrastructure for crash hotplug support
  kexec: exclude elfcorehdr from the segment digest
  crash: memory and CPU hotplug sysfs attributes
  x86/crash: add x86 crash hotplug support
  crash: hotplug support for kexec_load()
  crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
  x86/crash: optimize CPU changes

 .../ABI/testing/sysfs-devices-memory  |   8 +
 .../ABI/testing/sysfs-devices-system-cpu  |   8 +
 .../admin-guide/mm/memory-hotplug.rst |   8 +
 Documentation/core-api/cpu_hotplug.rst|  18 +
 arch/x86/Kconfig  |   3 +
 arch/x86/include/asm/kexec.h  |  18 +
 arch/x86/kernel/crash.c   | 140 ++-
 drivers/base/cpu.c|  83 +++-
 drivers/base/memory.c |  91 -
 include/linux/crash_core.h|   9 +
 include/linux/kexec.h |  63 +++-
 include/uapi/linux/kexec.h|   1 +
 kernel/Kconfig.kexec  |  35 ++
 kernel/crash_core.c   | 355 ++
 kernel/kexec.c|   5 +
 kernel/kexec_core.c   |   6 +
 kernel/kexec_file.c   | 187 +
 kernel/ksysfs.c   |  15 +
 18 files changed, 819 insertions(+), 234 deletions(-)

-- 
2.31.1


[PATCH v24 01/10] drivers/base: refactor cpu.c to use .is_visible()

2023-06-28 Thread Eric DeVolder
Greg Kroah-Hartman requested that this file use the .is_visible()
method instead of #ifdefs for the attributes in cpu.c.

 static struct attribute *cpu_root_attrs[] = {
 #ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
    &dev_attr_probe.attr,
    &dev_attr_release.attr,
 #endif
    &cpu_attrs[0].attr.attr,
    &cpu_attrs[1].attr.attr,
    &cpu_attrs[2].attr.attr,
    &dev_attr_kernel_max.attr,
    &dev_attr_offline.attr,
    &dev_attr_isolated.attr,
 #ifdef CONFIG_NO_HZ_FULL
    &dev_attr_nohz_full.attr,
 #endif
 #ifdef CONFIG_GENERIC_CPU_AUTOPROBE
    &dev_attr_modalias.attr,
 #endif
    NULL
 };

To that end:
 - the .is_visible() method is implemented, and IS_ENABLED(), rather
   than #ifdef, is used to determine the visibility of the attribute.
 - the DEVICE_ATTR() attributes are moved outside of #ifdefs, so that
   those structs are always present for the cpu_root_attrs[].
 - the #ifdefs guarding the attributes in the cpu_root_attrs[] are moved
   to the corresponding callback function; as the callback function must
   exist now that the attribute is always compiled-in (though not
   necessarily visible).
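
To make the three points above concrete, here is a small userspace
model of the pattern. It is only a sketch: struct toy_attribute,
toy_root_attr_is_visible() and the TOY_CONFIG_NO_HZ_FULL macro are
hypothetical stand-ins (the macro plays the role that IS_ENABLED()
plays in the kernel), and none of it is the driver-core API itself.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a Kconfig option; set to 1 to expose "nohz_full". */
#define TOY_CONFIG_NO_HZ_FULL 0

/* Hypothetical, userspace-only stand-in for an attribute. */
struct toy_attribute {
	const char *name;
	unsigned short mode;
};

/* The attribute structs always exist, whatever the configuration. */
static struct toy_attribute toy_attr_isolated = { "isolated", 0444 };
static struct toy_attribute toy_attr_nohz_full = { "nohz_full", 0444 };

/* The array is unconditional: no #ifdef around its entries. */
static struct toy_attribute *toy_root_attrs[] = {
	&toy_attr_isolated,
	&toy_attr_nohz_full,
	NULL
};

/* Returns the mode to expose the attribute with, or 0 to hide it. */
static unsigned short toy_root_attr_is_visible(const struct toy_attribute *attr)
{
	if (attr == &toy_attr_nohz_full && !TOY_CONFIG_NO_HZ_FULL)
		return 0;
	return attr->mode;
}

/* Counts the attributes that would actually be shown. */
static size_t toy_count_visible(void)
{
	size_t i, n = 0;

	for (i = 0; toy_root_attrs[i]; i++)
		if (toy_root_attr_is_visible(toy_root_attrs[i]))
			n++;
	return n;
}
```

The configuration decision moves from the preprocessor into a single
callback, so the attribute array reads the same in every build.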

No functionality change intended.

Signed-off-by: Eric DeVolder 
---
 drivers/base/cpu.c | 67 --
 1 file changed, 53 insertions(+), 14 deletions(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index c1815b9dae68..75fa46a567a1 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -82,13 +82,14 @@ void unregister_cpu(struct cpu *cpu)
per_cpu(cpu_sys_devices, logical_cpu) = NULL;
return;
 }
+#endif /* CONFIG_HOTPLUG_CPU */
 
-#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
 static ssize_t cpu_probe_store(struct device *dev,
   struct device_attribute *attr,
   const char *buf,
   size_t count)
 {
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
ssize_t cnt;
int ret;
 
@@ -100,6 +101,9 @@ static ssize_t cpu_probe_store(struct device *dev,
 
unlock_device_hotplug();
return cnt;
+#else
+   return 0;
+#endif
 }
 
 static ssize_t cpu_release_store(struct device *dev,
@@ -107,6 +111,7 @@ static ssize_t cpu_release_store(struct device *dev,
 const char *buf,
 size_t count)
 {
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
ssize_t cnt;
int ret;
 
@@ -118,12 +123,13 @@ static ssize_t cpu_release_store(struct device *dev,
 
unlock_device_hotplug();
return cnt;
+#else
+   return 0;
+#endif
 }
 
 static DEVICE_ATTR(probe, S_IWUSR, NULL, cpu_probe_store);
 static DEVICE_ATTR(release, S_IWUSR, NULL, cpu_release_store);
-#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
-#endif /* CONFIG_HOTPLUG_CPU */
 
 #ifdef CONFIG_KEXEC
 #include <linux/kexec.h>
@@ -273,14 +279,16 @@ static ssize_t print_cpus_isolated(struct device *dev,
 }
 static DEVICE_ATTR(isolated, 0444, print_cpus_isolated, NULL);
 
-#ifdef CONFIG_NO_HZ_FULL
 static ssize_t print_cpus_nohz_full(struct device *dev,
struct device_attribute *attr, char *buf)
 {
+#ifdef CONFIG_NO_HZ_FULL
return sysfs_emit(buf, "%*pbl\n", cpumask_pr_args(tick_nohz_full_mask));
+#else
+   return 0;
+#endif
 }
 static DEVICE_ATTR(nohz_full, 0444, print_cpus_nohz_full, NULL);
-#endif
 
 static void cpu_device_release(struct device *dev)
 {
@@ -301,12 +309,12 @@ static void cpu_device_release(struct device *dev)
 */
 }
 
-#ifdef CONFIG_GENERIC_CPU_AUTOPROBE
 static ssize_t print_cpu_modalias(struct device *dev,
  struct device_attribute *attr,
  char *buf)
 {
int len = 0;
+#ifdef CONFIG_GENERIC_CPU_AUTOPROBE
u32 i;
 
len += sysfs_emit_at(buf, len,
@@ -322,9 +330,11 @@ static ssize_t print_cpu_modalias(struct device *dev,
len += sysfs_emit_at(buf, len, ",%04X", i);
}
len += sysfs_emit_at(buf, len, "\n");
+#endif
return len;
 }
 
+#ifdef CONFIG_GENERIC_CPU_AUTOPROBE
 static int cpu_uevent(const struct device *dev, struct kobj_uevent_env *env)
 {
char *buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
@@ -451,32 +461,61 @@ struct device *cpu_device_create(struct device *parent, void *drvdata,
 }
 EXPORT_SYMBOL_GPL(cpu_device_create);
 
-#ifdef CONFIG_GENERIC_CPU_AUTOPROBE
 static DEVICE_ATTR(modalias, 0444, print_cpu_modalias, NULL);
-#endif
 
 static struct attribute *cpu_root_attrs[] = {
-#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
&dev_attr_probe.attr,
&dev_attr_release.attr,
-#endif
&cpu_attrs[0].attr.attr,
&cpu_attrs[1].attr.attr,
&cpu_attrs[2].attr.attr,
&dev_attr_kernel_max.attr,
&dev_attr_offline.attr,
&dev_attr_isolated.attr,
-#ifdef CONFIG_NO_HZ_FULL
&dev_attr_nohz_full.attr,
-#endif
-#ifdef CONFIG_GENERIC_CPU_AUTOPROBE
&dev_attr_modalias.attr,
-#endif
NULL
 };
 
+static umode_t
+cpu_root_attr_i

[PATCH v24 09/10] crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()

2023-06-28 Thread Eric DeVolder
The function crash_prepare_elf64_headers() generates the elfcorehdr
which describes the CPUs and memory in the system for the crash kernel.
In particular, it writes out ELF PT_LOADs for memory regions and ELF
PT_NOTEs for the CPUs in the system.

With respect to the CPUs, the current implementation utilizes
for_each_present_cpu() which means that as CPUs are added and removed,
the elfcorehdr must again be updated to reflect the new set of CPUs.

The reasoning behind the move to use for_each_possible_cpu(), is:

- At kernel boot time, all percpu crash_notes are allocated for all
  possible CPUs; that is, crash_notes are not allocated dynamically
  when CPUs are plugged/unplugged. Thus the crash_notes for each
  possible CPU are always available.

- The crash_prepare_elf64_headers() creates an ELF PT_NOTE per CPU.
  Changing to for_each_possible_cpu() is valid as the crash_notes
  pointed to by each CPU PT_NOTE are present and always valid.

Furthermore, examining a common crash processing path of:

 kernel panic -> crash kernel -> makedumpfile -> 'crash' analyzer
           elfcorehdr      /proc/vmcore     vmcore

reveals how the ELF CPU PT_NOTEs are utilized:

- Upon panic, each CPU is sent an IPI and shuts itself down, recording
 its state in its crash_notes. When all CPUs are shut down, the
 crash kernel is launched with a pointer to the elfcorehdr.

- The crash kernel via linux/fs/proc/vmcore.c does not examine or
 use the contents of the PT_NOTEs, it exposes them via /proc/vmcore.

- The makedumpfile utility uses /proc/vmcore and reads the CPU
 PT_NOTEs to craft a nr_cpus variable, which is reported in a
 header but otherwise generally unused. Makedumpfile creates the
 vmcore.

- The 'crash' dump analyzer does not appear to reference the CPU
 PT_NOTEs. Instead it looks up the cpu_[possible|present|online]_mask
 symbols and directly examines those structure contents from vmcore
 memory. From that information it is able to determine which CPUs
 are present and online, and locate the corresponding crash_notes.
 Said differently, it appears that 'crash' analyzer does not rely
 on the ELF PT_NOTEs for CPUs; rather it obtains the information
 directly via kernel symbols and the memory within the vmcore.

(There may be other vmcore generating and analysis tools that do use
these PT_NOTEs, but 'makedumpfile' and 'crash' seem to be the most
common solution.)

This results in the benefit of having all CPUs described in the
elfcorehdr, and therefore reducing the need to re-generate the
elfcorehdr on CPU changes, at the small expense of an additional
56 bytes per PT_NOTE for not-present-but-possible CPUs.
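
The 56-byte figure is the size of one 64-bit ELF program header. A
quick userspace check, assuming a Linux toolchain that provides
<elf.h>; toy_extra_note_bytes() is a hypothetical helper written for
this example, not a kernel function.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/*
 * Extra elfcorehdr bytes spent on CPUs that are possible but not
 * present: one Elf64_Phdr (PT_NOTE) per such CPU.
 */
static size_t toy_extra_note_bytes(unsigned int possible, unsigned int present)
{
	if (possible < present)
		return 0;
	return (size_t)(possible - present) * sizeof(Elf64_Phdr);
}
```

For example, a system with 64 possible and 16 present CPUs carries 48
additional program headers in the elfcorehdr.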

On systems where kexec_file_load() syscall is utilized, all the above
is valid. On systems where kexec_load() syscall is utilized, there
may be the need for the elfcorehdr to be regenerated once. The reason
being that some archs only populate the 'present' CPUs from the
/sys/devices/system/cpu entries, which the userspace 'kexec' utility
uses to generate the userspace-supplied elfcorehdr. In this situation,
one memory or CPU change will rewrite the elfcorehdr via the
crash_prepare_elf64_headers() function and now all possible CPUs will
be described, just as with kexec_file_load() syscall.

Suggested-by: Sourabh Jain 
Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 kernel/crash_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index fa918176d46d..7378b501fada 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -364,8 +364,8 @@ int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
ehdr->e_ehsize = sizeof(Elf64_Ehdr);
ehdr->e_phentsize = sizeof(Elf64_Phdr);
 
-   /* Prepare one phdr of type PT_NOTE for each present CPU */
-   for_each_present_cpu(cpu) {
+   /* Prepare one phdr of type PT_NOTE for each possible CPU */
+   for_each_possible_cpu(cpu) {
phdr->p_type = PT_NOTE;
notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
phdr->p_offset = phdr->p_paddr = notes_addr;
-- 
2.31.1


[PATCH v24 07/10] x86/crash: add x86 crash hotplug support

2023-06-28 Thread Eric DeVolder
When CPU or memory is hot un/plugged, or off/onlined, the crash
elfcorehdr, which describes the CPUs and memory in the system,
must also be updated.

A new elfcorehdr is generated from the available CPUs and memory
and replaces the existing elfcorehdr. The segment containing the
elfcorehdr is identified at run-time in
crash_core:crash_handle_hotplug_event().

No modifications to purgatory (see 'kexec: exclude elfcorehdr
from the segment digest') or boot_params (as the elfcorehdr=
capture kernel command line parameter pointer remains unchanged
and correct) are needed, just elfcorehdr.

For kexec_file_load(), the elfcorehdr segment size is based on
NR_CPUS and CRASH_MAX_MEMORY_RANGES in order to accommodate a
growing number of CPU and memory resources.
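
A back-of-the-envelope sketch of such a worst-case sizing, under
stated assumptions: one ELF header, one program header per CPU, one
per memory range, and two more for vmcoreinfo and the kernel map. The
exact formula used by the patch is not shown in this message, so the
helper toy_elfcorehdr_bytes() and the TOY_* limits below are
hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Hypothetical limits standing in for NR_CPUS and CRASH_MAX_MEMORY_RANGES. */
#define TOY_NR_CPUS 64
#define TOY_MAX_MEMORY_RANGES 8192

/* Upper bound on the elfcorehdr size for the given resource limits. */
static size_t toy_elfcorehdr_bytes(size_t nr_cpus, size_t nr_mem_ranges)
{
	/* Assumed: one extra header each for vmcoreinfo and the kernel map. */
	size_t pnum = 2 + nr_cpus + nr_mem_ranges;

	return sizeof(Elf64_Ehdr) + pnum * sizeof(Elf64_Phdr);
}
```

Sizing the segment for the limits rather than for the current
resources means later hotplug additions fit without reallocating the
segment.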

For kexec_load(), the userspace kexec utility needs to size the
elfcorehdr segment in the same/similar manner.

To accommodate kexec_load() syscall in the absence of
kexec_file_load() syscall support, prepare_elf_headers() and
dependents are moved outside of CONFIG_KEXEC_FILE.

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 arch/x86/Kconfig |   3 +
 arch/x86/include/asm/kexec.h |  15 +
 arch/x86/kernel/crash.c  | 103 ---
 3 files changed, 114 insertions(+), 7 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 06a4472d0fc0..42c083da7ce4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2058,6 +2058,9 @@ config ARCH_SUPPORTS_KEXEC_JUMP
 config ARCH_SUPPORTS_CRASH_DUMP
def_bool X86_64 || (X86_32 && HIGHMEM)
 
+config ARCH_SUPPORTS_CRASH_HOTPLUG
+   def_bool y
+
 config PHYSICAL_START
hex "Physical address where the kernel is loaded" if (EXPERT || CRASH_DUMP)
default "0x100"
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 5b77bbc28f96..9143100ea3ea 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -209,6 +209,21 @@ typedef void crash_vmclear_fn(void);
 extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
 extern void kdump_nmi_shootdown_cpus(void);
 
+#ifdef CONFIG_CRASH_HOTPLUG
+void arch_crash_handle_hotplug_event(struct kimage *image);
+#define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
+
+#ifdef CONFIG_HOTPLUG_CPU
+static inline int crash_hotplug_cpu_support(void) { return 1; }
+#define crash_hotplug_cpu_support crash_hotplug_cpu_support
+#endif
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+static inline int crash_hotplug_memory_support(void) { return 1; }
+#define crash_hotplug_memory_support crash_hotplug_memory_support
+#endif
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_X86_KEXEC_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index cdd92ab43cda..c70a111c44fa 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -158,8 +158,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
crash_save_cpu(regs, safe_smp_processor_id());
 }
 
-#ifdef CONFIG_KEXEC_FILE
-
 static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
 {
unsigned int *nr_ranges = arg;
@@ -231,7 +229,7 @@ static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg)
 
 /* Prepare elf headers. Return addr and size */
 static int prepare_elf_headers(struct kimage *image, void **addr,
-   unsigned long *sz)
+   unsigned long *sz, unsigned long *nr_mem_ranges)
 {
struct crash_mem *cmem;
int ret;
@@ -249,6 +247,9 @@ static int prepare_elf_headers(struct kimage *image, void **addr,
if (ret)
goto out;
 
+   /* Return the computed number of memory ranges, for hotplug usage */
+   *nr_mem_ranges = cmem->nr_ranges;
+
/* By default prepare 64bit headers */
ret =  crash_prepare_elf64_headers(cmem, IS_ENABLED(CONFIG_X86_64), addr, sz);
 
@@ -257,6 +258,7 @@ static int prepare_elf_headers(struct kimage *image, void **addr,
return ret;
 }
 
+#ifdef CONFIG_KEXEC_FILE
 static int add_e820_entry(struct boot_params *params, struct e820_entry *entry)
 {
unsigned int nr_e820_entries;
@@ -371,18 +373,42 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
 int crash_load_segments(struct kimage *image)
 {
int ret;
+   unsigned long pnum = 0;
struct kexec_buf kbuf = { .image = image, .buf_min = 0,
  .buf_max = ULONG_MAX, .top_down = false };
 
/* Prepare elf headers and add a segment */
-   ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz);
+   ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz, &pnum);
if (ret)
return ret;
 
-   image->elf_headers = kbuf.buffer;
-   image->elf_headers_sz = kbuf.bufsz;
+   image->elf_headers  = kbuf.buffer;
+   image->elf_he

[PATCH v24 10/10] x86/crash: optimize CPU changes

2023-06-28 Thread Eric DeVolder
crash_prepare_elf64_headers() writes into the elfcorehdr an ELF
PT_NOTE for all possible CPUs. As such, subsequent changes to CPUs
(i.e. hot un/plug, online/offline) do not need to rewrite the elfcorehdr.

The kimage->file_mode term covers kdump images loaded via the
kexec_file_load() syscall. Since crash_prepare_elf64_headers()
wrote the initial elfcorehdr, no update to the elfcorehdr is
needed for CPU changes.

The kimage->elfcorehdr_updated term covers kdump images loaded via
the kexec_load() syscall. At least one memory or CPU change must occur
to cause crash_prepare_elf64_headers() to rewrite the elfcorehdr.
Afterwards, no update to the elfcorehdr is needed for CPU changes.

This code is intentionally *NOT* hoisted into
crash_handle_hotplug_event() as it would prevent the arch-specific
handler from running for CPU changes. This would break PPC, for
example, which needs to update other information besides the
elfcorehdr, on CPU changes.

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 arch/x86/kernel/crash.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index caf22bcb61af..18d2a18d1073 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -467,6 +467,16 @@ void arch_crash_handle_hotplug_event(struct kimage *image)
unsigned long mem, memsz;
unsigned long elfsz = 0;
 
+   /*
+* As crash_prepare_elf64_headers() has already described all
+* possible CPUs, there is no need to update the elfcorehdr
+* for additional CPU changes.
+*/
+   if ((image->file_mode || image->elfcorehdr_updated) &&
+   ((image->hp_action == KEXEC_CRASH_HP_ADD_CPU) ||
+   (image->hp_action == KEXEC_CRASH_HP_REMOVE_CPU)))
+   return;
+
/*
 * Create the new elfcorehdr reflecting the changes to CPU and/or
 * memory resources.
-- 
2.31.1


[PATCH v24 08/10] crash: hotplug support for kexec_load()

2023-06-28 Thread Eric DeVolder
The hotplug support for kexec_load() requires changes to the
userspace kexec-tools and a little extra help from the kernel.

Given a kdump capture kernel loaded via kexec_load(), and a
subsequent hotplug event, the crash hotplug handler finds the
elfcorehdr and rewrites it to reflect the hotplug change.
That is the desired outcome, however, at kernel panic time,
the purgatory integrity check fails (because the elfcorehdr
changed), and the capture kernel does not boot and no vmcore
is generated.

Therefore, the userspace kexec-tools/kexec must indicate to the
kernel that the elfcorehdr can be modified (because the kexec
excluded the elfcorehdr from the digest, and sized the elfcorehdr
memory buffer appropriately).

To facilitate hotplug support with kexec_load():
 - a new kexec flag KEXEC_UPDATE_ELFCOREHDR indicates that it is
   safe for the kernel to modify the kexec_load()'d elfcorehdr
 - the /sys/kernel/crash_elfcorehdr_size node communicates the
   preferred size of the elfcorehdr memory buffer
 - the sysfs crash_hotplug nodes (i.e.
   /sys/devices/system/[cpu|memory]/crash_hotplug) dynamically
   take into account kexec_file_load() vs kexec_load() and
   KEXEC_UPDATE_ELFCOREHDR.
   This is critical so that the udev rule processing of crash_hotplug
   is all that is needed to determine if the userspace unload-then-load
   of the kdump image is to be skipped, or not. The proposed udev
   rule change looks like:
   # The kernel updates the crash elfcorehdr for CPU and memory changes
   SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
   SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

The table below indicates the behavior of kexec_load()'d kdump image
updates (with the new udev crash_hotplug rule in place):

 Kernel | Kexec
        | Old | New
 -------+-----+-----
 Old    |  a  |  a
 -------+-----+-----
 New    |  a  |  b
 -------+-----+-----

where kexec 'old' and 'new' indicate whether kexec-tools has the needed
modifications for the crash hotplug feature, and kernel 'old' and
'new' indicate whether the kernel supports this crash hotplug feature.

Behavior 'a' indicates the unload-then-reload of the entire kdump
image. For the kexec 'old' column, the unload-then-reload occurs
due to the missing flag KEXEC_UPDATE_ELFCOREHDR. An 'old' kernel
(with 'new' kexec) does not present the crash_hotplug sysfs node,
which leads to the unload-then-reload of the kdump image.

Behavior 'b' indicates the desired optimized behavior of the kernel
directly modifying the elfcorehdr and avoiding the unload-then-reload
of the kdump image.

If the udev rule is not updated with the crash_hotplug node check, then
regardless of whether the kernel or kexec is new or old, the kdump
image continues to be unloaded and then reloaded on hotplug changes.
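
As a rough illustration (not part of the patch), the table and the udev
caveat above can be modelled as a small decision function. The function
name, the enum, and the three boolean parameters are hypothetical; they
only stand in for "kernel has the feature", "kexec-tools sets
KEXEC_UPDATE_ELFCOREHDR", and "the udev rule checks crash_hotplug".

```c
#include <stdbool.h>

/* Outcomes from the table: 'a' = userspace unload-then-reload,
 * 'b' = the kernel updates the elfcorehdr in place. */
enum kdump_hotplug_behavior {
	KDUMP_RELOAD_IMAGE = 'a',
	KDUMP_KERNEL_UPDATES_ELFCOREHDR = 'b',
};

/* Hypothetical helper: the parameters are illustrative booleans, not
 * real kernel or kexec-tools state. */
static enum kdump_hotplug_behavior
kdump_hotplug_behavior(bool kernel_new, bool kexec_new, bool udev_rule_updated)
{
	/* An old kernel has no crash_hotplug node; with an old kexec the
	 * flag is never set, so the node does not report 1. */
	bool crash_hotplug_reads_1 = kernel_new && kexec_new;

	if (udev_rule_updated && crash_hotplug_reads_1)
		return KDUMP_KERNEL_UPDATES_ELFCOREHDR;

	return KDUMP_RELOAD_IMAGE;
}
```

Only the combination of new kernel, new kexec and updated udev rule
yields behavior 'b'; every other combination falls back to 'a'.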

To fully support the crash hotplug feature, there needs to be a rollout
of kernel, kexec-tools and udev rule changes. However, the order of
the rollout of these pieces does not matter; kexec_load()'d kdump
images still function for hotplug as-is.

Suggested-by: Hari Bathini 
Signed-off-by: Eric DeVolder 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 arch/x86/include/asm/kexec.h | 11 +++
 arch/x86/kernel/crash.c  | 27 +++
 include/linux/kexec.h| 14 --
 include/uapi/linux/kexec.h   |  1 +
 kernel/Kconfig.kexec |  4 
 kernel/crash_core.c  | 31 +++
 kernel/kexec.c   |  5 +
 kernel/ksysfs.c  | 15 +++
 8 files changed, 102 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 9143100ea3ea..3be6a98751f0 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -214,14 +214,17 @@ void arch_crash_handle_hotplug_event(struct kimage *image);
 #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
 
 #ifdef CONFIG_HOTPLUG_CPU
-static inline int crash_hotplug_cpu_support(void) { return 1; }
-#define crash_hotplug_cpu_support crash_hotplug_cpu_support
+int arch_crash_hotplug_cpu_support(void);
+#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
 #endif
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-static inline int crash_hotplug_memory_support(void) { return 1; }
-#define crash_hotplug_memory_support crash_hotplug_memory_support
+int arch_crash_hotplug_memory_support(void);
+#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
 #endif
+
+unsigned int arch_crash_get_elfcorehdr_size(void);
+#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index c70a111c44fa..caf22bcb61af 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -427,6 +427,33 @@ int crash_load_segments(struct kimage *image)
 #undef pr_fmt
 #define pr_fmt(fmt) "cras

[PATCH v24 06/10] crash: memory and CPU hotplug sysfs attributes

2023-06-28 Thread Eric DeVolder
Introduce the crash_hotplug attribute for memory and CPUs for
use by userspace.  These attributes directly facilitate the udev
rule for managing userspace re-loading of the crash kernel upon
hot un/plug changes.

For memory, expose the crash_hotplug attribute to the
/sys/devices/system/memory directory. For example:

 # udevadm info --attribute-walk /sys/devices/system/memory/memory81
  looking at device '/devices/system/memory/memory81':
KERNEL=="memory81"
SUBSYSTEM=="memory"
DRIVER==""
ATTR{online}=="1"
ATTR{phys_device}=="0"
ATTR{phys_index}=="0051"
ATTR{removable}=="1"
ATTR{state}=="online"
ATTR{valid_zones}=="Movable"

  looking at parent device '/devices/system/memory':
KERNELS=="memory"
SUBSYSTEMS==""
DRIVERS==""
ATTRS{auto_online_blocks}=="offline"
ATTRS{block_size_bytes}=="800"
ATTRS{crash_hotplug}=="1"

For CPUs, expose the crash_hotplug attribute to the
/sys/devices/system/cpu directory. For example:

 # udevadm info --attribute-walk /sys/devices/system/cpu/cpu0
  looking at device '/devices/system/cpu/cpu0':
KERNEL=="cpu0"
SUBSYSTEM=="cpu"
DRIVER=="processor"
ATTR{crash_notes}=="277c38600"
ATTR{crash_notes_size}=="368"
ATTR{online}=="1"

  looking at parent device '/devices/system/cpu':
KERNELS=="cpu"
SUBSYSTEMS==""
DRIVERS==""
ATTRS{crash_hotplug}=="1"
ATTRS{isolated}==""
ATTRS{kernel_max}=="8191"
ATTRS{nohz_full}=="  (null)"
ATTRS{offline}=="4-7"
ATTRS{online}=="0-3"
ATTRS{possible}=="0-7"
ATTRS{present}=="0-3"

With these sysfs attributes in place, it is possible to efficiently
instruct the udev rule to skip crash kernel reloading for kernels
configured with crash hotplug support.

For example, the following is the proposed udev rule change for RHEL
system 98-kexec.rules (as the first lines of the rule file):

 # The kernel updates the crash elfcorehdr for CPU and memory changes
 SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
 SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

When examined in the context of 98-kexec.rules, the above rules
test if crash_hotplug is set, and if so, the userspace initiated
unload-then-reload of the crash kernel is skipped.

CPU and memory checks are separated in accordance with
CONFIG_HOTPLUG_CPU and CONFIG_MEMORY_HOTPLUG kernel config options.
If an architecture supports, for example, memory hotplug but not
CPU hotplug, then the /sys/devices/system/memory/crash_hotplug
attribute file is present, but the /sys/devices/system/cpu/crash_hotplug
attribute file will NOT be present. Thus the udev rule skips
userspace processing of memory hot un/plug events, but the udev
rule will evaluate false for CPU events, thus allowing userspace to
process CPU hot un/plug events (ie the unload-then-reload of the kdump
capture kernel).
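
For readers who want to perform the same check outside of udev, a
minimal userspace sketch follows. It assumes only what is described
above: the attribute file holds '1' when the kernel handles the update,
and the file may be absent. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Return true only when the attribute file exists and its first
 * character is '1'. A missing file (attribute not present) is treated
 * the same as '0', i.e. userspace must do the unload-then-reload.
 */
static bool crash_hotplug_handled_by_kernel(const char *attr_path)
{
	FILE *f = fopen(attr_path, "r");
	int c;

	if (!f)
		return false;

	c = fgetc(f);
	fclose(f);

	return c == '1';
}
```

A caller would pass /sys/devices/system/cpu/crash_hotplug for CPU events
and /sys/devices/system/memory/crash_hotplug for memory events, mirroring
the two separate udev rules.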

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 Documentation/ABI/testing/sysfs-devices-memory |  8 
 .../ABI/testing/sysfs-devices-system-cpu   |  8 
 .../admin-guide/mm/memory-hotplug.rst  |  8 
 Documentation/core-api/cpu_hotplug.rst | 18 ++
 drivers/base/cpu.c | 16 ++--
 drivers/base/memory.c  | 13 +
 include/linux/kexec.h  |  8 
 7 files changed, 77 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-devices-memory b/Documentation/ABI/testing/sysfs-devices-memory
index d8b0f80b9e33..c50725ebebb7 100644
--- a/Documentation/ABI/testing/sysfs-devices-memory
+++ b/Documentation/ABI/testing/sysfs-devices-memory
@@ -110,3 +110,11 @@ Description:
link is created for memory section 9 on node0.
 
/sys/devices/system/node/node0/memory9 -> ../../memory/memory9
+
+What:  /sys/devices/system/memory/crash_hotplug
+Date:  Jun 2023
+Contact:   Linux kernel mailing list 
+Description:
+   (RO) indicates whether or not the kernel directly supports
+   modifying the crash elfcorehdr for memory hot un/plug and/or
+   on/offline changes.
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index ecd585ca2d50..598b0fa67481 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -686,3 +686,11 @@ Description:
(RO) the list of C

[PATCH v24 04/10] crash: add generic infrastructure for crash hotplug support

2023-06-28 Thread Eric DeVolder
To support crash hotplug, a mechanism is needed to update the crash
elfcorehdr upon CPU or memory changes (eg. hot un/plug or off/
onlining). The crash elfcorehdr describes the CPUs and memory to
be written into the vmcore.

To track CPU changes, callbacks are registered with the cpuhp
mechanism via cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN). The
crash hotplug elfcorehdr update has no explicit ordering requirement
(relative to other cpuhp states), so it meets the criteria for
utilizing CPUHP_BP_PREPARE_DYN. CPUHP_BP_PREPARE_DYN is a dynamic
state and avoids the need to introduce a new state for crash
hotplug. Also, CPUHP_BP_PREPARE_DYN is the last state in the PREPARE
group, just prior to the STARTING group, which is very close to the
CPU starting up in a plug/online situation, or stopping in an unplug/
offline situation. This minimizes the window of time during an
actual plug/online or unplug/offline situation in which the
elfcorehdr would be inaccurate. Note that for a CPU being unplugged
or offlined, the CPU will still be present in the list of CPUs
generated by crash_prepare_elf64_headers(). However, there is no
need to explicitly omit the CPU, see justification in
'crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()'.

To track memory changes, a notifier is registered to capture the
memblock MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().

The CPU callbacks and memory notifiers invoke crash_handle_hotplug_event()
which performs needed tasks and then dispatches the event to the
architecture specific arch_crash_handle_hotplug_event() to update the
elfcorehdr with the current state of CPUs and memory. During the
process, the kexec_lock is held.
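
The dispatch flow can be sketched in userspace as below. This is a
simplified model, not the kernel code: struct model_image, both
functions and the rebuild_count field are hypothetical, locking is
omitted, and resetting hp_action and setting elfcorehdr_updated after
the arch handler returns are assumptions of the model. The
KEXEC_CRASH_HP_* values are the ones added by this patch, and the early
return for CPU events mirrors the x86 handler check shown elsewhere in
this series.

```c
#include <stdbool.h>

/* Values as defined in the include/linux/crash_core.h hunk. */
#define KEXEC_CRASH_HP_NONE          0
#define KEXEC_CRASH_HP_ADD_CPU       1
#define KEXEC_CRASH_HP_REMOVE_CPU    2
#define KEXEC_CRASH_HP_ADD_MEMORY    3
#define KEXEC_CRASH_HP_REMOVE_MEMORY 4

/* Simplified stand-in for struct kimage; only the fields used here. */
struct model_image {
	int hp_action;
	bool file_mode;
	bool elfcorehdr_updated;
	unsigned int rebuild_count;	/* hypothetical, for illustration */
};

/* Model of the arch handler: regenerate the elfcorehdr unless this is
 * a CPU event and all possible CPUs are already described. */
static void model_arch_handle_hotplug_event(struct model_image *image)
{
	bool cpu_event = image->hp_action == KEXEC_CRASH_HP_ADD_CPU ||
			 image->hp_action == KEXEC_CRASH_HP_REMOVE_CPU;

	if ((image->file_mode || image->elfcorehdr_updated) && cpu_event)
		return;

	image->rebuild_count++;
}

/* Model of the generic entry point called by the CPU callbacks and the
 * memory notifier. */
static void model_crash_handle_hotplug_event(struct model_image *image,
					     int hp_action)
{
	image->hp_action = hp_action;
	model_arch_handle_hotplug_event(image);
	image->hp_action = KEXEC_CRASH_HP_NONE;	/* assumption of this model */
	image->elfcorehdr_updated = true;	/* assumption of this model */
}
```

In this model, memory events always regenerate the header, while CPU
events regenerate it at most once for a kexec_load()'d image.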

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 include/linux/crash_core.h |   9 +++
 include/linux/kexec.h  |  11 +++
 kernel/Kconfig.kexec   |  31 
 kernel/crash_core.c| 142 +
 kernel/kexec_core.c|   6 ++
 5 files changed, 199 insertions(+)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index de62a722431e..e14345cc7a22 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -84,4 +84,13 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
 int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
unsigned long long *crash_size, unsigned long long *crash_base);
 
+#define KEXEC_CRASH_HP_NONE0
+#define KEXEC_CRASH_HP_ADD_CPU 1
+#define KEXEC_CRASH_HP_REMOVE_CPU  2
+#define KEXEC_CRASH_HP_ADD_MEMORY  3
+#define KEXEC_CRASH_HP_REMOVE_MEMORY   4
+#define KEXEC_CRASH_HP_INVALID_CPU -1U
+
+struct kimage;
+
 #endif /* LINUX_CRASH_CORE_H */
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 811a90e09698..b9903dd48e24 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -33,6 +33,7 @@ extern note_buf_t __percpu *crash_notes;
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* Verify architecture specific macros are defined */
@@ -360,6 +361,12 @@ struct kimage {
struct purgatory_info purgatory_info;
 #endif
 
+#ifdef CONFIG_CRASH_HOTPLUG
+   int hp_action;
+   int elfcorehdr_index;
+   bool elfcorehdr_updated;
+#endif
+
 #ifdef CONFIG_IMA_KEXEC
/* Virtual address of IMA measurement buffer for kexec syscall */
void *ima_buffer;
@@ -490,6 +497,10 @@ static inline int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, g
 static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) { }
 #endif
 
+#ifndef arch_crash_handle_hotplug_event
+static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
+#endif
+
 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
diff --git a/kernel/Kconfig.kexec b/kernel/Kconfig.kexec
index 5d576ddfd999..7eb42a795176 100644
--- a/kernel/Kconfig.kexec
+++ b/kernel/Kconfig.kexec
@@ -107,4 +107,35 @@ config CRASH_DUMP
  For s390, this option also enables zfcpdump.
  See also 
 
+config CRASH_HOTPLUG
+   bool "Update the crash elfcorehdr on system configuration changes"
+   default y
+   depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)
+   depends on ARCH_SUPPORTS_CRASH_HOTPLUG
+   help
+ Enable direct update to the crash elfcorehdr (which contains
+ the list of CPUs and memory regions to be dumped upon a crash)
+ in response to hot plug/unplug or online/offline of CPUs or
+ memory. This is a much more advanced approach than userspace
+ attempting that.
+
+ If unsure, say Y.
+
+config CRASH_MAX_MEMORY_RANGES
+   int "Specify the maximum number of memory regions for the elfcorehdr"
+   default 8192
+   depends on CRASH_HOTPLUG
+   help
+

[PATCH v24 03/10] crash: move a few code bits to setup support of crash hotplug

2023-06-28 Thread Eric DeVolder
The crash hotplug support leans on the work for the kexec_file_load()
syscall. To also support the kexec_load() syscall, a few bits of code
need to be moved outside of CONFIG_KEXEC_FILE. As such, these bits are
moved out of kexec_file.c and into a common location crash_core.c.

No functionality change intended.
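
As an aside, the buffer sizing done by crash_prepare_elf64_headers()
(one of the functions being moved) can be worked through in userspace.
The sketch below follows the arithmetic visible in the moved code; the
function name elfcorehdr_size() is hypothetical, and the ELF structure
sizes come from the userspace <elf.h> header rather than the kernel's.

```c
#include <elf.h>

/* Alignment required for the ELF header segment, as in kexec.h. */
#define ELF_CORE_HEADER_ALIGN 4096UL

/* Hypothetical helper mirroring the size computation in
 * crash_prepare_elf64_headers(). */
static unsigned long elfcorehdr_size(unsigned long nr_cpus,
				     unsigned long nr_ranges)
{
	unsigned long nr_phdr, elf_sz;

	nr_phdr = nr_cpus + 1;	/* one PT_NOTE per CPU, plus vmcoreinfo */
	nr_phdr += nr_ranges;	/* one program header per memory range */
	nr_phdr++;		/* extra PT_LOAD for the kernel text mapping */

	elf_sz = sizeof(Elf64_Ehdr) + nr_phdr * sizeof(Elf64_Phdr);

	/* Round up to the next multiple of the (power-of-two) alignment. */
	return (elf_sz + ELF_CORE_HEADER_ALIGN - 1) &
	       ~(ELF_CORE_HEADER_ALIGN - 1);
}
```

For example, 8 CPUs and 3 memory ranges need 13 program headers, which
is 64 + 13 * 56 = 792 bytes and rounds up to a single 4096-byte page.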

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 include/linux/kexec.h |  30 +++
 kernel/crash_core.c   | 182 ++
 kernel/kexec_file.c   | 181 -
 3 files changed, 197 insertions(+), 196 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 22b5cd24f581..811a90e09698 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -105,6 +105,21 @@ struct compat_kexec_segment {
 };
 #endif
 
+/* Alignment required for elf header segment */
+#define ELF_CORE_HEADER_ALIGN   4096
+
+struct crash_mem {
+   unsigned int max_nr_ranges;
+   unsigned int nr_ranges;
+   struct range ranges[];
+};
+
+extern int crash_exclude_mem_range(struct crash_mem *mem,
+  unsigned long long mstart,
+  unsigned long long mend);
+extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
+  void **addr, unsigned long *sz);
+
 #ifdef CONFIG_KEXEC_FILE
 struct purgatory_info {
/*
@@ -230,21 +245,6 @@ static inline int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
 }
 #endif
 
-/* Alignment required for elf header segment */
-#define ELF_CORE_HEADER_ALIGN   4096
-
-struct crash_mem {
-   unsigned int max_nr_ranges;
-   unsigned int nr_ranges;
-   struct range ranges[];
-};
-
-extern int crash_exclude_mem_range(struct crash_mem *mem,
-  unsigned long long mstart,
-  unsigned long long mend);
-extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
-  void **addr, unsigned long *sz);
-
 #ifndef arch_kexec_apply_relocations_add
 /*
  * arch_kexec_apply_relocations_add - apply relocations of type RELA
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 90ce1dfd591c..b7c30b748a16 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -314,6 +315,187 @@ static int __init parse_crashkernel_dummy(char *arg)
 }
 early_param("crashkernel", parse_crashkernel_dummy);
 
+int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
+ void **addr, unsigned long *sz)
+{
+   Elf64_Ehdr *ehdr;
+   Elf64_Phdr *phdr;
+   unsigned long nr_cpus = num_possible_cpus(), nr_phdr, elf_sz;
+   unsigned char *buf;
+   unsigned int cpu, i;
+   unsigned long long notes_addr;
+   unsigned long mstart, mend;
+
+   /* extra phdr for vmcoreinfo ELF note */
+   nr_phdr = nr_cpus + 1;
+   nr_phdr += mem->nr_ranges;
+
+   /*
+* kexec-tools creates an extra PT_LOAD phdr for kernel text mapping
+* area (for example, 8000 - a000 on x86_64).
+* I think this is required by tools like gdb. So same physical
+* memory will be mapped in two ELF headers. One will contain kernel
+* text virtual addresses and other will have __va(physical) addresses.
+*/
+
+   nr_phdr++;
+   elf_sz = sizeof(Elf64_Ehdr) + nr_phdr * sizeof(Elf64_Phdr);
+   elf_sz = ALIGN(elf_sz, ELF_CORE_HEADER_ALIGN);
+
+   buf = vzalloc(elf_sz);
+   if (!buf)
+   return -ENOMEM;
+
+   ehdr = (Elf64_Ehdr *)buf;
+   phdr = (Elf64_Phdr *)(ehdr + 1);
+   memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
+   ehdr->e_ident[EI_CLASS] = ELFCLASS64;
+   ehdr->e_ident[EI_DATA] = ELFDATA2LSB;
+   ehdr->e_ident[EI_VERSION] = EV_CURRENT;
+   ehdr->e_ident[EI_OSABI] = ELF_OSABI;
+   memset(ehdr->e_ident + EI_PAD, 0, EI_NIDENT - EI_PAD);
+   ehdr->e_type = ET_CORE;
+   ehdr->e_machine = ELF_ARCH;
+   ehdr->e_version = EV_CURRENT;
+   ehdr->e_phoff = sizeof(Elf64_Ehdr);
+   ehdr->e_ehsize = sizeof(Elf64_Ehdr);
+   ehdr->e_phentsize = sizeof(Elf64_Phdr);
+
+   /* Prepare one phdr of type PT_NOTE for each present CPU */
+   for_each_present_cpu(cpu) {
+   phdr->p_type = PT_NOTE;
+   notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
+   phdr->p_offset = phdr->p_paddr = notes_addr;
+   phdr->p_filesz = phdr->p_memsz = sizeof(note_buf_t);
+   (ehdr->e_phnum)++;
+   phdr++;
+   }
+
+   /* Prepare one PT_NOTE header for vmcoreinfo */
+   phdr->p_type = PT_NOTE;
+   

[PATCH v24 05/10] kexec: exclude elfcorehdr from the segment digest

2023-06-28 Thread Eric DeVolder
When a crash kernel is loaded via the kexec_file_load() syscall, the
kernel places the various segments (ie crash kernel, crash initrd,
boot_params, elfcorehdr, purgatory, etc) in memory. For those
architectures that utilize purgatory, a hash digest of the segments
is calculated for integrity checking. The digest is embedded into
the purgatory image prior to placing in memory.

Updates to the elfcorehdr in response to CPU and memory changes
would cause the purgatory integrity check to fail at crash time,
and no vmcore would be created. Therefore, the elfcorehdr segment is
explicitly excluded from the purgatory digest, enabling updates to
the elfcorehdr while also avoiding the need to recompute the hash
digest and reload purgatory.
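
The effect of the exclusion can be demonstrated with a toy userspace
model. Note the assumptions: FNV-1a is used purely as a small stand-in
for the SHA-256 digest that purgatory verifies, and struct model_segment
and model_digest() are hypothetical names, not kernel interfaces.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a loaded kexec segment. */
struct model_segment {
	const unsigned char *buf;
	size_t len;
};

/*
 * Digest all segments except the one at excluded_index. Passing an
 * index >= nr_segs excludes nothing. FNV-1a replaces SHA-256 here only
 * to keep the sketch short.
 */
static uint64_t model_digest(const struct model_segment *segs,
			     size_t nr_segs, size_t excluded_index)
{
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV-1a offset basis */
	size_t i, k;

	for (i = 0; i < nr_segs; i++) {
		if (i == excluded_index)
			continue;	/* e.g. the elfcorehdr segment */

		for (k = 0; k < segs[i].len; k++) {
			h ^= segs[i].buf[k];
			h *= 0x100000001b3ULL;	/* FNV-1a prime */
		}
	}

	return h;
}
```

With the elfcorehdr segment excluded, rewriting its contents leaves the
digest unchanged, whereas a digest over all segments would no longer
match.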

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 kernel/kexec_file.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index e9cf9e8d8f01..824ffc5282f4 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -726,6 +726,12 @@ static int kexec_calculate_store_digests(struct kimage *image)
for (j = i = 0; i < image->nr_segments; i++) {
struct kexec_segment *ksegment;
 
+#ifdef CONFIG_CRASH_HOTPLUG
+   /* Exclude elfcorehdr segment to allow future changes via hotplug */
+   if (i == image->elfcorehdr_index)
+   continue;
+#endif
+
ksegment = &image->segment[i];
/*
 * Skip purgatory as it will be modified once we put digest
-- 
2.31.1




[PATCH v24 02/10] drivers/base: refactor memory.c to use .is_visible()

2023-06-28 Thread Eric DeVolder
Greg Kroah-Hartman requested that this file use the .is_visible()
method instead of #ifdefs for the attributes in memory.c.

 static struct attribute *memory_memblk_attrs[] = {
     &dev_attr_phys_index.attr,
     &dev_attr_state.attr,
     &dev_attr_phys_device.attr,
     &dev_attr_removable.attr,
 #ifdef CONFIG_MEMORY_HOTREMOVE
     &dev_attr_valid_zones.attr,
 #endif
     NULL
 };

and

 static struct attribute *memory_root_attrs[] = {
 #ifdef CONFIG_ARCH_MEMORY_PROBE
     &dev_attr_probe.attr,
 #endif

 #ifdef CONFIG_MEMORY_FAILURE
     &dev_attr_soft_offline_page.attr,
     &dev_attr_hard_offline_page.attr,
 #endif

     &dev_attr_block_size_bytes.attr,
     &dev_attr_auto_online_blocks.attr,
     NULL
 };

To that end:
 - the .is_visible() method is implemented, and IS_ENABLED(), rather
   than #ifdef, is used to determine the visibility of the attribute.
 - the DEVICE_ATTR_xx() attributes are moved outside of #ifdefs, so that
   those structs are always present for the memory_memblk_attrs[] and
   memory_root_attrs[].
 - the #ifdefs guarding the attributes in the memory_memblk_attrs[] and
   memory_root_attrs[] are moved to the corresponding callback function;
   as the callback function must exist now that the attribute is always
   compiled-in (though not necessarily visible).
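
The pattern described in the list above can be sketched in userspace as
follows. This is a simplified model under stated assumptions: struct
model_config stands in for the IS_ENABLED(CONFIG_...) tests, the
per-block and root attributes are merged into one table for brevity,
and all names are hypothetical. What it keeps from the sysfs convention
is that the callback returns the attribute's mode to show it, or 0 to
hide it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the Kconfig options involved. */
struct model_config {
	bool memory_hotremove;
	bool arch_memory_probe;
};

struct model_attr {
	const char *name;
	unsigned int mode;
};

/* Every attribute is always present in the table, as in the patch. */
static const struct model_attr model_attrs[] = {
	{ "phys_index",  0444 },
	{ "state",       0644 },
	{ "valid_zones", 0444 },
	{ "probe",       0200 },
};

/* Return the mode if the attribute should be visible, 0 to hide it. */
static unsigned int model_attr_is_visible(const struct model_config *cfg,
					  const struct model_attr *attr)
{
	if (!strcmp(attr->name, "valid_zones"))
		return cfg->memory_hotremove ? attr->mode : 0;
	if (!strcmp(attr->name, "probe"))
		return cfg->arch_memory_probe ? attr->mode : 0;

	return attr->mode;
}

/* Count how many attributes would be exposed for a given config. */
static size_t model_count_visible(const struct model_config *cfg)
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(model_attrs) / sizeof(model_attrs[0]); i++)
		if (model_attr_is_visible(cfg, &model_attrs[i]))
			n++;

	return n;
}
```

The attribute table itself no longer varies with the configuration; only
the callback's answer does.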

No functionality change intended.

Signed-off-by: Eric DeVolder 
---
 drivers/base/memory.c | 78 +++
 1 file changed, 65 insertions(+), 13 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index b456ac213610..f03eda7e1c9c 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -405,10 +405,12 @@ static int print_allowed_zone(char *buf, int len, int nid,
 
return sysfs_emit_at(buf, len, " %s", zone->name);
 }
+#endif
 
 static ssize_t valid_zones_show(struct device *dev,
struct device_attribute *attr, char *buf)
 {
+#ifdef CONFIG_MEMORY_HOTREMOVE
struct memory_block *mem = to_memory_block(dev);
unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
@@ -444,9 +446,11 @@ static ssize_t valid_zones_show(struct device *dev,
 out:
len += sysfs_emit_at(buf, len, "\n");
return len;
+#else
+   return 0;
+#endif
 }
 static DEVICE_ATTR_RO(valid_zones);
-#endif
 
 static DEVICE_ATTR_RO(phys_index);
 static DEVICE_ATTR_RW(state);
@@ -496,10 +500,10 @@ static DEVICE_ATTR_RW(auto_online_blocks);
  * as well as ppc64 will do all of their discovery in userspace
  * and will require this interface.
  */
-#ifdef CONFIG_ARCH_MEMORY_PROBE
 static ssize_t probe_store(struct device *dev, struct device_attribute *attr,
   const char *buf, size_t count)
 {
+#ifdef CONFIG_ARCH_MEMORY_PROBE
u64 phys_addr;
int nid, ret;
unsigned long pages_per_block = PAGES_PER_SECTION * sections_per_block;
@@ -527,12 +531,13 @@ static ssize_t probe_store(struct device *dev, struct device_attribute *attr,
 out:
unlock_device_hotplug();
return ret;
+#else
+   return 0;
+#endif
 }
 
 static DEVICE_ATTR_WO(probe);
-#endif
 
-#ifdef CONFIG_MEMORY_FAILURE
 /*
  * Support for offlining pages of memory
  */
@@ -542,6 +547,7 @@ static ssize_t soft_offline_page_store(struct device *dev,
   struct device_attribute *attr,
   const char *buf, size_t count)
 {
+#ifdef CONFIG_MEMORY_FAILURE
int ret;
u64 pfn;
if (!capable(CAP_SYS_ADMIN))
@@ -551,6 +557,9 @@ static ssize_t soft_offline_page_store(struct device *dev,
pfn >>= PAGE_SHIFT;
ret = soft_offline_page(pfn, 0);
return ret == 0 ? count : ret;
+#else
+   return 0;
+#endif
 }
 
 /* Forcibly offline a page, including killing processes. */
@@ -558,6 +567,7 @@ static ssize_t hard_offline_page_store(struct device *dev,
   struct device_attribute *attr,
   const char *buf, size_t count)
 {
+#ifdef CONFIG_MEMORY_FAILURE
int ret;
u64 pfn;
if (!capable(CAP_SYS_ADMIN))
@@ -569,11 +579,13 @@ static ssize_t hard_offline_page_store(struct device *dev,
if (ret == -EOPNOTSUPP)
ret = 0;
return ret ? ret : count;
+#else
+   return 0;
+#endif
 }
 
 static DEVICE_ATTR_WO(soft_offline_page);
 static DEVICE_ATTR_WO(hard_offline_page);
-#endif
 
 /* See phys_device_show(). */
 int __weak arch_get_memory_phys_device(unsigned long start_pfn)
@@ -611,14 +623,35 @@ static struct attribute *memory_memblk_attrs[] = {
&dev_attr_state.attr,
&dev_attr_phys_device.attr,
&dev_attr_removable.attr,
-#ifdef CONFIG_MEMORY_HOTREMOVE
&dev_attr_valid_zones.attr,
-#endif
NULL
 };
 
+static umode_t
+memory_memblk_attr_is_visible(struct kobject *kobj,
+  struct attribute *attr

Re: [PATCH v23 4/8] crash: memory and CPU hotplug sysfs attributes

2023-06-16 Thread Eric DeVolder




On 6/13/23 03:03, Greg KH wrote:

On Mon, Jun 12, 2023 at 05:07:08PM -0400, Eric DeVolder wrote:

Introduce the crash_hotplug attribute for memory and CPUs for
use by userspace.  These attributes directly facilitate the udev
rule for managing userspace re-loading of the crash kernel upon
hot un/plug changes.

For memory, expose the crash_hotplug attribute to the
/sys/devices/system/memory directory. For example:

  # udevadm info --attribute-walk /sys/devices/system/memory/memory81
   looking at device '/devices/system/memory/memory81':
 KERNEL=="memory81"
 SUBSYSTEM=="memory"
 DRIVER==""
 ATTR{online}=="1"
 ATTR{phys_device}=="0"
 ATTR{phys_index}=="0051"
 ATTR{removable}=="1"
 ATTR{state}=="online"
 ATTR{valid_zones}=="Movable"

   looking at parent device '/devices/system/memory':
 KERNELS=="memory"
 SUBSYSTEMS==""
 DRIVERS==""
 ATTRS{auto_online_blocks}=="offline"
 ATTRS{block_size_bytes}=="800"
 ATTRS{crash_hotplug}=="1"

For CPUs, expose the crash_hotplug attribute to the
/sys/devices/system/cpu directory. For example:

  # udevadm info --attribute-walk /sys/devices/system/cpu/cpu0
   looking at device '/devices/system/cpu/cpu0':
 KERNEL=="cpu0"
 SUBSYSTEM=="cpu"
 DRIVER=="processor"
 ATTR{crash_notes}=="277c38600"
 ATTR{crash_notes_size}=="368"
 ATTR{online}=="1"

   looking at parent device '/devices/system/cpu':
 KERNELS=="cpu"
 SUBSYSTEMS==""
 DRIVERS==""
 ATTRS{crash_hotplug}=="1"
 ATTRS{isolated}==""
 ATTRS{kernel_max}=="8191"
 ATTRS{nohz_full}=="  (null)"
 ATTRS{offline}=="4-7"
 ATTRS{online}=="0-3"
 ATTRS{possible}=="0-7"
 ATTRS{present}=="0-3"

With these sysfs attributes in place, it is possible to efficiently
instruct the udev rule to skip crash kernel reloading for kernels
configured with crash hotplug support.

For example, the following is the proposed udev rule change for RHEL
system 98-kexec.rules (as the first lines of the rule file):

  # The kernel updates the crash elfcorehdr for CPU and memory changes
  SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
  SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

When examined in the context of 98-kexec.rules, the above rules
test if crash_hotplug is set, and if so, the userspace initiated
unload-then-reload of the crash kernel is skipped.

CPU and memory checks are separated in accordance with
CONFIG_HOTPLUG_CPU and CONFIG_MEMORY_HOTPLUG kernel config options.
If an architecture supports, for example, memory hotplug but not
CPU hotplug, then the /sys/devices/system/memory/crash_hotplug
attribute file is present, but the /sys/devices/system/cpu/crash_hotplug
attribute file will NOT be present. Thus the udev rule skips
userspace processing of memory hot un/plug events, but the udev
rule will evaluate false for CPU events, thus allowing userspace to
process CPU hot un/plug events (ie the unload-then-reload of the kdump
capture kernel).

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
  .../admin-guide/mm/memory-hotplug.rst  |  8 
  Documentation/core-api/cpu_hotplug.rst | 18 ++
  drivers/base/cpu.c | 14 ++
  drivers/base/memory.c  | 13 +
  include/linux/kexec.h  |  8 
  5 files changed, 61 insertions(+)

diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
index 1b02fe5807cc..eb99d79223a3 100644
--- a/Documentation/admin-guide/mm/memory-hotplug.rst
+++ b/Documentation/admin-guide/mm/memory-hotplug.rst
@@ -291,6 +291,14 @@ The following files are currently defined:
   Availability depends on the CONFIG_ARCH_MEMORY_PROBE
   kernel configuration option.
  ``uevent``   read-write: generic udev file for device subsystems.
+``crash_hotplug``  read-only: when changes to the system memory map
+  occur due to hot un/plug of memory, this file contains
+  '1' if the kernel updates the kdump capture kernel memory
+  map itself (via elfcorehdr), or '0' if userspace must 
update
+  the kdump capture kernel memory map.
+
+  Availability depends on the CONFIG_MEMORY_HOTPLUG kernel
+  configuration option.
  == 

Re: [PATCH v23 4/8] crash: memory and CPU hotplug sysfs attributes

2023-06-13 Thread Eric DeVolder



On 6/13/23 10:24, Eric DeVolder wrote:



On 6/13/23 03:03, Greg KH wrote:

On Mon, Jun 12, 2023 at 05:07:08PM -0400, Eric DeVolder wrote:

Introduce the crash_hotplug attribute for memory and CPUs for
use by userspace.  These attributes directly facilitate the udev
rule for managing userspace re-loading of the crash kernel upon
hot un/plug changes.

For memory, expose the crash_hotplug attribute to the
/sys/devices/system/memory directory. For example:

  # udevadm info --attribute-walk /sys/devices/system/memory/memory81
   looking at device '/devices/system/memory/memory81':
 KERNEL=="memory81"
 SUBSYSTEM=="memory"
 DRIVER==""
 ATTR{online}=="1"
 ATTR{phys_device}=="0"
 ATTR{phys_index}=="0051"
 ATTR{removable}=="1"
 ATTR{state}=="online"
 ATTR{valid_zones}=="Movable"

   looking at parent device '/devices/system/memory':
 KERNELS=="memory"
 SUBSYSTEMS==""
 DRIVERS==""
 ATTRS{auto_online_blocks}=="offline"
 ATTRS{block_size_bytes}=="800"
 ATTRS{crash_hotplug}=="1"

For CPUs, expose the crash_hotplug attribute to the
/sys/devices/system/cpu directory. For example:

  # udevadm info --attribute-walk /sys/devices/system/cpu/cpu0
   looking at device '/devices/system/cpu/cpu0':
 KERNEL=="cpu0"
 SUBSYSTEM=="cpu"
 DRIVER=="processor"
 ATTR{crash_notes}=="277c38600"
 ATTR{crash_notes_size}=="368"
 ATTR{online}=="1"

   looking at parent device '/devices/system/cpu':
 KERNELS=="cpu"
 SUBSYSTEMS==""
 DRIVERS==""
 ATTRS{crash_hotplug}=="1"
 ATTRS{isolated}==""
 ATTRS{kernel_max}=="8191"
 ATTRS{nohz_full}=="  (null)"
 ATTRS{offline}=="4-7"
 ATTRS{online}=="0-3"
 ATTRS{possible}=="0-7"
 ATTRS{present}=="0-3"

With these sysfs attributes in place, it is possible to efficiently
instruct the udev rule to skip crash kernel reloading for kernels
configured with crash hotplug support.

For example, the following is the proposed udev rule change for RHEL
system 98-kexec.rules (as the first lines of the rule file):

  # The kernel updates the crash elfcorehdr for CPU and memory changes
  SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
  SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

When examined in the context of 98-kexec.rules, the above rules
test if crash_hotplug is set, and if so, the userspace initiated
unload-then-reload of the crash kernel is skipped.

CPU and memory checks are separated in accordance with
CONFIG_HOTPLUG_CPU and CONFIG_MEMORY_HOTPLUG kernel config options.
If an architecture supports, for example, memory hotplug but not
CPU hotplug, then the /sys/devices/system/memory/crash_hotplug
attribute file is present, but the /sys/devices/system/cpu/crash_hotplug
attribute file will NOT be present. Thus the udev rule skips
userspace processing of memory hot un/plug events, but the udev
rule will evaluate false for CPU events, thus allowing userspace to
process CPU hot un/plug events (ie the unload-then-reload of the kdump
capture kernel).

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
  .../admin-guide/mm/memory-hotplug.rst  |  8 
  Documentation/core-api/cpu_hotplug.rst | 18 ++
  drivers/base/cpu.c | 14 ++
  drivers/base/memory.c  | 13 +
  include/linux/kexec.h  |  8 
  5 files changed, 61 insertions(+)

diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
index 1b02fe5807cc..eb99d79223a3 100644
--- a/Documentation/admin-guide/mm/memory-hotplug.rst
+++ b/Documentation/admin-guide/mm/memory-hotplug.rst
@@ -291,6 +291,14 @@ The following files are currently defined:
 Availability depends on the CONFIG_ARCH_MEMORY_PROBE
 kernel configuration option.
  ``uevent``   read-write: generic udev file for device subsystems.
+``crash_hotplug``  read-only: when changes to the system memory map
+   occur due to hot un/plug of memory, this file contains
+   '1' if the kernel updates the kdump capture kernel memory
+   map itself (via elfcorehdr), or '0' if userspace must update
+   the kdump capture kernel memory map.
+
+   Availability depends on the CONFIG_MEMORY_HOTPLUG kernel
+   configuration option.
  == 
==

Re: [PATCH v23 4/8] crash: memory and CPU hotplug sysfs attributes

2023-06-13 Thread Eric DeVolder




On 6/13/23 03:03, Greg KH wrote:

On Mon, Jun 12, 2023 at 05:07:08PM -0400, Eric DeVolder wrote:

Introduce the crash_hotplug attribute for memory and CPUs for
use by userspace.  These attributes directly facilitate the udev
rule for managing userspace re-loading of the crash kernel upon
hot un/plug changes.

For memory, expose the crash_hotplug attribute to the
/sys/devices/system/memory directory. For example:

  # udevadm info --attribute-walk /sys/devices/system/memory/memory81
   looking at device '/devices/system/memory/memory81':
 KERNEL=="memory81"
 SUBSYSTEM=="memory"
 DRIVER==""
 ATTR{online}=="1"
 ATTR{phys_device}=="0"
 ATTR{phys_index}=="0051"
 ATTR{removable}=="1"
 ATTR{state}=="online"
 ATTR{valid_zones}=="Movable"

   looking at parent device '/devices/system/memory':
 KERNELS=="memory"
 SUBSYSTEMS==""
 DRIVERS==""
 ATTRS{auto_online_blocks}=="offline"
 ATTRS{block_size_bytes}=="800"
 ATTRS{crash_hotplug}=="1"

For CPUs, expose the crash_hotplug attribute to the
/sys/devices/system/cpu directory. For example:

  # udevadm info --attribute-walk /sys/devices/system/cpu/cpu0
   looking at device '/devices/system/cpu/cpu0':
 KERNEL=="cpu0"
 SUBSYSTEM=="cpu"
 DRIVER=="processor"
 ATTR{crash_notes}=="277c38600"
 ATTR{crash_notes_size}=="368"
 ATTR{online}=="1"

   looking at parent device '/devices/system/cpu':
 KERNELS=="cpu"
 SUBSYSTEMS==""
 DRIVERS==""
 ATTRS{crash_hotplug}=="1"
 ATTRS{isolated}==""
 ATTRS{kernel_max}=="8191"
 ATTRS{nohz_full}=="  (null)"
 ATTRS{offline}=="4-7"
 ATTRS{online}=="0-3"
 ATTRS{possible}=="0-7"
 ATTRS{present}=="0-3"

With these sysfs attributes in place, it is possible to efficiently
instruct the udev rule to skip crash kernel reloading for kernels
configured with crash hotplug support.

For example, the following is the proposed udev rule change for RHEL
system 98-kexec.rules (as the first lines of the rule file):

  # The kernel updates the crash elfcorehdr for CPU and memory changes
  SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
  SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

When examined in the context of 98-kexec.rules, the above rules
test if crash_hotplug is set, and if so, the userspace initiated
unload-then-reload of the crash kernel is skipped.

CPU and memory checks are separated in accordance with
CONFIG_HOTPLUG_CPU and CONFIG_MEMORY_HOTPLUG kernel config options.
If an architecture supports, for example, memory hotplug but not
CPU hotplug, then the /sys/devices/system/memory/crash_hotplug
attribute file is present, but the /sys/devices/system/cpu/crash_hotplug
attribute file will NOT be present. Thus the udev rule skips
userspace processing of memory hot un/plug events, but the udev
rule will evaluate false for CPU events, thus allowing userspace to
process CPU hot un/plug events (ie the unload-then-reload of the kdump
capture kernel).
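
The decision that the udev rule encodes can be sketched in userspace C.
This is only an illustration, not part of the patch: the helper names
crash_hotplug_value_is_set() and kernel_handles_hotplug() are
hypothetical, and an absent attribute file is treated as "userspace must
reload", matching the behaviour described above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Interpret the contents of a crash_hotplug attribute.
 * NULL models an absent attribute file (for example, an architecture
 * without CPU hotplug support). Returns 1 if the kernel updates the
 * elfcorehdr itself, 0 otherwise.
 */
int crash_hotplug_value_is_set(const char *contents)
{
	return contents != NULL && contents[0] == '1';
}

/*
 * Hypothetical helper: read the attribute at `path` and report whether
 * userspace may skip the unload-then-reload of the kdump image.
 */
int kernel_handles_hotplug(const char *path)
{
	char buf[8] = "";
	FILE *f;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return 0;	/* attribute absent: userspace must reload */
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';	/* empty or unreadable: treat as not set */
	fclose(f);
	return crash_hotplug_value_is_set(buf);
}
```

A caller would pass /sys/devices/system/cpu/crash_hotplug or
/sys/devices/system/memory/crash_hotplug, mirroring the two udev rule
lines, and fall back to reloading the kdump image when 0 is returned.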

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
  .../admin-guide/mm/memory-hotplug.rst  |  8 
  Documentation/core-api/cpu_hotplug.rst | 18 ++
  drivers/base/cpu.c | 14 ++
  drivers/base/memory.c  | 13 +
  include/linux/kexec.h  |  8 
  5 files changed, 61 insertions(+)

diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
index 1b02fe5807cc..eb99d79223a3 100644
--- a/Documentation/admin-guide/mm/memory-hotplug.rst
+++ b/Documentation/admin-guide/mm/memory-hotplug.rst
@@ -291,6 +291,14 @@ The following files are currently defined:
   Availability depends on the CONFIG_ARCH_MEMORY_PROBE
   kernel configuration option.
  ``uevent``   read-write: generic udev file for device subsystems.
+``crash_hotplug``  read-only: when changes to the system memory map
+  occur due to hot un/plug of memory, this file contains
+  '1' if the kernel updates the kdump capture kernel memory
+  map itself (via elfcorehdr), or '0' if userspace must update
+  the kdump capture kernel memory map.
+
+  Availability depends on the CONFIG_MEMORY_HOTPLUG kernel
+  configuration option.
  == 

[PATCH v23 5/8] x86/crash: add x86 crash hotplug support

2023-06-12 Thread Eric DeVolder
When CPU or memory is hot un/plugged, or off/onlined, the crash
elfcorehdr, which describes the CPUs and memory in the system,
must also be updated.

A new elfcorehdr is generated from the available CPUs and memory
and replaces the existing elfcorehdr. The segment containing the
elfcorehdr is identified at run-time in
crash_core:crash_handle_hotplug_event().

No modifications to purgatory (see 'kexec: exclude elfcorehdr
from the segment digest') or boot_params (as the elfcorehdr=
capture kernel command line parameter pointer remains unchanged
and correct) are needed, just elfcorehdr.

For kexec_file_load(), the elfcorehdr segment size is based on
NR_CPUS and CRASH_MAX_MEMORY_RANGES in order to accommodate a
growing number of CPU and memory resources.

For kexec_load(), the userspace kexec utility needs to size the
elfcorehdr segment in the same/similar manner.
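
As a rough sketch of that sizing (an illustration, not the code from
this series), the arithmetic below follows the header count used by
crash_prepare_elf64_headers(): one program header per CPU, one for
vmcoreinfo, one per memory range, and one for the kernel text mapping.
The function name elfcorehdr_size_estimate() is hypothetical, and the
ELF64 structure sizes are hard-coded (64-byte file header, 56-byte
program header) so the example does not depend on <elf.h>.

```c
#include <assert.h>
#include <stddef.h>

#define ELF64_EHDR_SIZE		64UL	/* sizeof(Elf64_Ehdr) */
#define ELF64_PHDR_SIZE		56UL	/* sizeof(Elf64_Phdr) */
#define ELF_CORE_HEADER_ALIGN	4096UL

/*
 * Estimate the elfcorehdr buffer size for nr_cpus CPUs and
 * nr_mem_ranges memory ranges, rounded up to the segment alignment.
 */
size_t elfcorehdr_size_estimate(size_t nr_cpus, size_t nr_mem_ranges)
{
	size_t nr_phdr, sz;

	assert(nr_cpus > 0);

	nr_phdr = nr_cpus + 1;		/* CPU PT_NOTEs plus vmcoreinfo */
	nr_phdr += nr_mem_ranges;	/* one PT_LOAD per memory range */
	nr_phdr++;			/* kernel text mapping PT_LOAD */

	sz = ELF64_EHDR_SIZE + nr_phdr * ELF64_PHDR_SIZE;

	/* Round up to ELF_CORE_HEADER_ALIGN (a power of two) */
	return (sz + ELF_CORE_HEADER_ALIGN - 1) & ~(ELF_CORE_HEADER_ALIGN - 1);
}
```

Passing the configured maximums (NR_CPUS and CRASH_MAX_MEMORY_RANGES)
rather than the current counts is what leaves room for later hot-added
resources.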

To accommodate kexec_load() syscall in the absence of
kexec_file_load() syscall support, prepare_elf_headers() and
dependents are moved outside of CONFIG_KEXEC_FILE.

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 arch/x86/Kconfig |   3 +
 arch/x86/include/asm/kexec.h |  15 +
 arch/x86/kernel/crash.c  | 103 ---
 3 files changed, 114 insertions(+), 7 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7dff2481abe0..4b39f4059876 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2063,6 +2063,9 @@ config ARCH_HAS_KEXEC_JUMP
 config ARCH_HAS_CRASH_DUMP
def_bool X86_64 || (X86_32 && HIGHMEM)
 
+config ARCH_HAS_CRASH_HOTPLUG
+   def_bool y
+
 config PHYSICAL_START
hex "Physical address where the kernel is loaded" if (EXPERT || CRASH_DUMP)
default "0x1000000"
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 5b77bbc28f96..9143100ea3ea 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -209,6 +209,21 @@ typedef void crash_vmclear_fn(void);
 extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
 extern void kdump_nmi_shootdown_cpus(void);
 
+#ifdef CONFIG_CRASH_HOTPLUG
+void arch_crash_handle_hotplug_event(struct kimage *image);
+#define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
+
+#ifdef CONFIG_HOTPLUG_CPU
+static inline int crash_hotplug_cpu_support(void) { return 1; }
+#define crash_hotplug_cpu_support crash_hotplug_cpu_support
+#endif
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+static inline int crash_hotplug_memory_support(void) { return 1; }
+#define crash_hotplug_memory_support crash_hotplug_memory_support
+#endif
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_X86_KEXEC_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index cdd92ab43cda..c70a111c44fa 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -158,8 +158,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
crash_save_cpu(regs, safe_smp_processor_id());
 }
 
-#ifdef CONFIG_KEXEC_FILE
-
 static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
 {
unsigned int *nr_ranges = arg;
@@ -231,7 +229,7 @@ static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg)
 
 /* Prepare elf headers. Return addr and size */
 static int prepare_elf_headers(struct kimage *image, void **addr,
-   unsigned long *sz)
+   unsigned long *sz, unsigned long *nr_mem_ranges)
 {
struct crash_mem *cmem;
int ret;
@@ -249,6 +247,9 @@ static int prepare_elf_headers(struct kimage *image, void **addr,
if (ret)
goto out;
 
+   /* Return the computed number of memory ranges, for hotplug usage */
+   *nr_mem_ranges = cmem->nr_ranges;
+
/* By default prepare 64bit headers */
ret =  crash_prepare_elf64_headers(cmem, IS_ENABLED(CONFIG_X86_64), addr, sz);
 
@@ -257,6 +258,7 @@ static int prepare_elf_headers(struct kimage *image, void **addr,
return ret;
 }
 
+#ifdef CONFIG_KEXEC_FILE
 static int add_e820_entry(struct boot_params *params, struct e820_entry *entry)
 {
unsigned int nr_e820_entries;
@@ -371,18 +373,42 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
 int crash_load_segments(struct kimage *image)
 {
int ret;
+   unsigned long pnum = 0;
struct kexec_buf kbuf = { .image = image, .buf_min = 0,
  .buf_max = ULONG_MAX, .top_down = false };
 
/* Prepare elf headers and add a segment */
-   ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz);
+   ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz, &pnum);
if (ret)
return ret;
 
-   image->elf_headers = kbuf.buffer;
-   image->elf_headers_sz = kbuf.bufsz;
+   image->elf_headers  = kbuf.buffer;
+   image->elf_headers_sz   = kbuf.bufsz;
+

[PATCH v23 8/8] x86/crash: optimize CPU changes

2023-06-12 Thread Eric DeVolder
crash_prepare_elf64_headers() writes into the elfcorehdr an ELF
PT_NOTE for all possible CPUs. As such, subsequent changes to CPUs
(ie. hot un/plug, online/offline) do not need to rewrite the elfcorehdr.

The kimage->file_mode term covers kdump images loaded via the
kexec_file_load() syscall. Since crash_prepare_elf64_headers()
wrote the initial elfcorehdr, no update to the elfcorehdr is
needed for CPU changes.

The kimage->elfcorehdr_updated term covers kdump images loaded via
the kexec_load() syscall. At least one memory or CPU change must occur
to cause crash_prepare_elf64_headers() to rewrite the elfcorehdr.
Afterwards, no update to the elfcorehdr is needed for CPU changes.

This code is intentionally *NOT* hoisted into
crash_handle_hotplug_event() as it would prevent the arch-specific
handler from running for CPU changes. This would break PPC, for
example, which needs to update other information besides the
elfcorehdr, on CPU changes.
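
The early-return condition can be read as a small truth table. The
sketch below models it in plain userspace C, as an illustration only:
the function name elfcorehdr_rewrite_needed() is hypothetical, the
three parameters stand in for the kimage fields of the same name, and
the KEXEC_CRASH_HP_* values are copied from this series.

```c
#include <assert.h>
#include <stdbool.h>

#define KEXEC_CRASH_HP_NONE		0
#define KEXEC_CRASH_HP_ADD_CPU		1
#define KEXEC_CRASH_HP_REMOVE_CPU	2
#define KEXEC_CRASH_HP_ADD_MEMORY	3
#define KEXEC_CRASH_HP_REMOVE_MEMORY	4

/*
 * Return true if the x86 handler would go on to regenerate the
 * elfcorehdr, false if it would return early.
 */
bool elfcorehdr_rewrite_needed(bool file_mode, bool elfcorehdr_updated,
			       int hp_action)
{
	bool cpu_event, all_cpus_described;

	assert(hp_action >= KEXEC_CRASH_HP_NONE &&
	       hp_action <= KEXEC_CRASH_HP_REMOVE_MEMORY);

	cpu_event = hp_action == KEXEC_CRASH_HP_ADD_CPU ||
		    hp_action == KEXEC_CRASH_HP_REMOVE_CPU;

	/*
	 * The kernel wrote the elfcorehdr at least once, so all possible
	 * CPUs already have a PT_NOTE.
	 */
	all_cpus_described = file_mode || elfcorehdr_updated;

	return !(cpu_event && all_cpus_described);
}
```

Memory events always reach the rewrite, and so does the first CPU event
on a kexec_load()'d image, since the userspace-supplied elfcorehdr may
describe only the present CPUs.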

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 arch/x86/kernel/crash.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index caf22bcb61af..18d2a18d1073 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -467,6 +467,16 @@ void arch_crash_handle_hotplug_event(struct kimage *image)
unsigned long mem, memsz;
unsigned long elfsz = 0;
 
+   /*
+* As crash_prepare_elf64_headers() has already described all
+* possible CPUs, there is no need to update the elfcorehdr
+* for additional CPU changes.
+*/
+   if ((image->file_mode || image->elfcorehdr_updated) &&
+   ((image->hp_action == KEXEC_CRASH_HP_ADD_CPU) ||
+   (image->hp_action == KEXEC_CRASH_HP_REMOVE_CPU)))
+   return;
+
/*
 * Create the new elfcorehdr reflecting the changes to CPU and/or
 * memory resources.
-- 
2.31.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v23 1/8] crash: move a few code bits to setup support of crash hotplug

2023-06-12 Thread Eric DeVolder
The crash hotplug support leans on the work for the kexec_file_load()
syscall. To also support the kexec_load() syscall, a few bits of code
need to be moved outside of CONFIG_KEXEC_FILE. As such, these bits are
moved out of kexec_file.c and into a common location crash_core.c.

No functionality change intended.

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 include/linux/kexec.h |  30 +++
 kernel/crash_core.c   | 182 ++
 kernel/kexec_file.c   | 181 -
 3 files changed, 197 insertions(+), 196 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 22b5cd24f581..811a90e09698 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -105,6 +105,21 @@ struct compat_kexec_segment {
 };
 #endif
 
+/* Alignment required for elf header segment */
+#define ELF_CORE_HEADER_ALIGN   4096
+
+struct crash_mem {
+   unsigned int max_nr_ranges;
+   unsigned int nr_ranges;
+   struct range ranges[];
+};
+
+extern int crash_exclude_mem_range(struct crash_mem *mem,
+  unsigned long long mstart,
+  unsigned long long mend);
+extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
+  void **addr, unsigned long *sz);
+
 #ifdef CONFIG_KEXEC_FILE
 struct purgatory_info {
/*
@@ -230,21 +245,6 @@ static inline int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
 }
 #endif
 
-/* Alignment required for elf header segment */
-#define ELF_CORE_HEADER_ALIGN   4096
-
-struct crash_mem {
-   unsigned int max_nr_ranges;
-   unsigned int nr_ranges;
-   struct range ranges[];
-};
-
-extern int crash_exclude_mem_range(struct crash_mem *mem,
-  unsigned long long mstart,
-  unsigned long long mend);
-extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
-  void **addr, unsigned long *sz);
-
 #ifndef arch_kexec_apply_relocations_add
 /*
  * arch_kexec_apply_relocations_add - apply relocations of type RELA
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 90ce1dfd591c..b7c30b748a16 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -314,6 +315,187 @@ static int __init parse_crashkernel_dummy(char *arg)
 }
 early_param("crashkernel", parse_crashkernel_dummy);
 
+int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
+ void **addr, unsigned long *sz)
+{
+   Elf64_Ehdr *ehdr;
+   Elf64_Phdr *phdr;
+   unsigned long nr_cpus = num_possible_cpus(), nr_phdr, elf_sz;
+   unsigned char *buf;
+   unsigned int cpu, i;
+   unsigned long long notes_addr;
+   unsigned long mstart, mend;
+
+   /* extra phdr for vmcoreinfo ELF note */
+   nr_phdr = nr_cpus + 1;
+   nr_phdr += mem->nr_ranges;
+
+   /*
+* kexec-tools creates an extra PT_LOAD phdr for kernel text mapping
+* area (for example, ffffffff80000000 - ffffffffa0000000 on x86_64).
+* I think this is required by tools like gdb. So same physical
+* memory will be mapped in two ELF headers. One will contain kernel
+* text virtual addresses and other will have __va(physical) addresses.
+*/
+
+   nr_phdr++;
+   elf_sz = sizeof(Elf64_Ehdr) + nr_phdr * sizeof(Elf64_Phdr);
+   elf_sz = ALIGN(elf_sz, ELF_CORE_HEADER_ALIGN);
+
+   buf = vzalloc(elf_sz);
+   if (!buf)
+   return -ENOMEM;
+
+   ehdr = (Elf64_Ehdr *)buf;
+   phdr = (Elf64_Phdr *)(ehdr + 1);
+   memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
+   ehdr->e_ident[EI_CLASS] = ELFCLASS64;
+   ehdr->e_ident[EI_DATA] = ELFDATA2LSB;
+   ehdr->e_ident[EI_VERSION] = EV_CURRENT;
+   ehdr->e_ident[EI_OSABI] = ELF_OSABI;
+   memset(ehdr->e_ident + EI_PAD, 0, EI_NIDENT - EI_PAD);
+   ehdr->e_type = ET_CORE;
+   ehdr->e_machine = ELF_ARCH;
+   ehdr->e_version = EV_CURRENT;
+   ehdr->e_phoff = sizeof(Elf64_Ehdr);
+   ehdr->e_ehsize = sizeof(Elf64_Ehdr);
+   ehdr->e_phentsize = sizeof(Elf64_Phdr);
+
+   /* Prepare one phdr of type PT_NOTE for each present CPU */
+   for_each_present_cpu(cpu) {
+   phdr->p_type = PT_NOTE;
+   notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
+   phdr->p_offset = phdr->p_paddr = notes_addr;
+   phdr->p_filesz = phdr->p_memsz = sizeof(note_buf_t);
+   (ehdr->e_phnum)++;
+   phdr++;
+   }
+
+   /* Prepare one PT_NOTE header for vmcoreinfo */
+   phdr->p_type = PT_NOTE;
+   

[PATCH v23 7/8] crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()

2023-06-12 Thread Eric DeVolder
The function crash_prepare_elf64_headers() generates the elfcorehdr
which describes the CPUs and memory in the system for the crash kernel.
In particular, it writes out ELF PT_LOADs for memory regions and ELF
PT_NOTEs for the CPUs in the system.

With respect to the CPUs, the current implementation utilizes
for_each_present_cpu() which means that as CPUs are added and removed,
the elfcorehdr must again be updated to reflect the new set of CPUs.

The reasoning behind the move to for_each_possible_cpu() is:

- At kernel boot time, all percpu crash_notes are allocated for all
  possible CPUs; that is, crash_notes are not allocated dynamically
  when CPUs are plugged/unplugged. Thus the crash_notes for each
  possible CPU are always available.

- The crash_prepare_elf64_headers() creates an ELF PT_NOTE per CPU.
  Changing to for_each_possible_cpu() is valid as the crash_notes
  pointed to by each CPU PT_NOTE are present and always valid.

Furthermore, examining a common crash processing path of:

 kernel panic -> crash kernel -> makedumpfile -> 'crash' analyzer
           elfcorehdr      /proc/vmcore     vmcore

reveals how the ELF CPU PT_NOTEs are utilized:

- Upon panic, each CPU is sent an IPI and shuts itself down, recording
 its state in its crash_notes. When all CPUs are shut down, the
 crash kernel is launched with a pointer to the elfcorehdr.

- The crash kernel via linux/fs/proc/vmcore.c does not examine or
 use the contents of the PT_NOTEs, it exposes them via /proc/vmcore.

- The makedumpfile utility uses /proc/vmcore and reads the CPU
 PT_NOTEs to craft a nr_cpus variable, which is reported in a
 header but otherwise generally unused. Makedumpfile creates the
 vmcore.

- The 'crash' dump analyzer does not appear to reference the CPU
 PT_NOTEs. Instead it looks up the cpu_[possible|present|online]_mask
 symbols and directly examines those structure contents from vmcore
 memory. From that information it is able to determine which CPUs
 are present and online, and locate the corresponding crash_notes.
 Said differently, it appears that 'crash' analyzer does not rely
 on the ELF PT_NOTEs for CPUs; rather it obtains the information
 directly via kernel symbols and the memory within the vmcore.

(There may be other vmcore generating and analysis tools that do use
these PT_NOTEs, but 'makedumpfile' and 'crash' seem to be the most
common solution.)

This results in the benefit of having all CPUs described in the
elfcorehdr, and therefore reducing the need to re-generate the
elfcorehdr on CPU changes, at the small expense of an additional
56 bytes per PT_NOTE for not-present-but-possible CPUs.

On systems where kexec_file_load() syscall is utilized, all the above
is valid. On systems where kexec_load() syscall is utilized, there
may be the need for the elfcorehdr to be regenerated once. The reason
being that some archs only populate the 'present' CPUs from the
/sys/devices/system/cpu entries, which the userspace 'kexec' utility
uses to generate the userspace-supplied elfcorehdr. In this situation,
one memory or CPU change will rewrite the elfcorehdr via the
crash_prepare_elf64_headers() function and now all possible CPUs will
be described, just as with kexec_file_load() syscall.

Suggested-by: Sourabh Jain 
Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 kernel/crash_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index fa918176d46d..7378b501fada 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -364,8 +364,8 @@ int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
ehdr->e_ehsize = sizeof(Elf64_Ehdr);
ehdr->e_phentsize = sizeof(Elf64_Phdr);
 
-   /* Prepare one phdr of type PT_NOTE for each present CPU */
-   for_each_present_cpu(cpu) {
+   /* Prepare one phdr of type PT_NOTE for each possible CPU */
+   for_each_possible_cpu(cpu) {
phdr->p_type = PT_NOTE;
notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
phdr->p_offset = phdr->p_paddr = notes_addr;
-- 
2.31.1




[PATCH v23 2/8] crash: add generic infrastructure for crash hotplug support

2023-06-12 Thread Eric DeVolder
To support crash hotplug, a mechanism is needed to update the crash
elfcorehdr upon CPU or memory changes (eg. hot un/plug or off/
onlining). The crash elfcorehdr describes the CPUs and memory to
be written into the vmcore.

To track CPU changes, callbacks are registered with the cpuhp
mechanism via cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN). The
crash hotplug elfcorehdr update has no explicit ordering requirement
(relative to other cpuhp states), so meets the criteria for
utilizing CPUHP_BP_PREPARE_DYN. CPUHP_BP_PREPARE_DYN is a dynamic
state and avoids the need to introduce a new state for crash
hotplug. Also, CPUHP_BP_PREPARE_DYN is the last state in the PREPARE
group, just prior to the STARTING group, which is very close to the
CPU starting up in a plug/online situation, or stopping in an unplug/
offline situation. This minimizes the window of time during an
actual plug/online or unplug/offline situation in which the
elfcorehdr would be inaccurate. Note that for a CPU being unplugged
or offlined, the CPU will still be present in the list of CPUs
generated by crash_prepare_elf64_headers(). However, there is no
need to explicitly omit the CPU, see justification in
'crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()'.

To track memory changes, a notifier is registered to capture the
memblock MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().

The CPU callbacks and memory notifiers invoke crash_handle_hotplug_event()
which performs needed tasks and then dispatches the event to the
architecture specific arch_crash_handle_hotplug_event() to update the
elfcorehdr with the current state of CPUs and memory. During the
process, the kexec_lock is held.
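
The flow from notifier to arch handler can be modelled in a few lines
of userspace C. This is a sketch under stated assumptions, not the
kernel code: every name ending in _model, the enum model_mem_event and
its MODEL_MEM_* values are hypothetical stand-ins for the kernel's
notifier actions, the kexec_lock is omitted, and resetting hp_action
after the dispatch is an assumption of the model.

```c
#include <assert.h>
#include <stddef.h>

#define KEXEC_CRASH_HP_NONE		0
#define KEXEC_CRASH_HP_ADD_CPU		1
#define KEXEC_CRASH_HP_REMOVE_CPU	2
#define KEXEC_CRASH_HP_ADD_MEMORY	3
#define KEXEC_CRASH_HP_REMOVE_MEMORY	4

/* Hypothetical stand-ins for memory notifier actions */
enum model_mem_event {
	MODEL_MEM_GOING_ONLINE,
	MODEL_MEM_ONLINE,
	MODEL_MEM_OFFLINE,
};

struct kimage_model {
	unsigned int hp_action;		/* action being processed */
	unsigned int last_seen_action;	/* what the arch handler observed */
	int arch_calls;			/* number of arch handler calls */
};

/* Models the arch handler, which would rewrite the elfcorehdr */
void arch_crash_handle_hotplug_event_model(struct kimage_model *image)
{
	image->last_seen_action = image->hp_action;
	image->arch_calls++;
}

/* Models the generic handler: record the action, then dispatch */
void crash_handle_hotplug_event_model(struct kimage_model *image,
				      unsigned int hp_action)
{
	assert(image != NULL);

	if (hp_action == KEXEC_CRASH_HP_NONE)
		return;		/* nothing to update */

	image->hp_action = hp_action;
	arch_crash_handle_hotplug_event_model(image);
	image->hp_action = KEXEC_CRASH_HP_NONE;
}

/* Only completed online/offline events lead to an update */
unsigned int memory_event_to_hp_action(enum model_mem_event event)
{
	switch (event) {
	case MODEL_MEM_ONLINE:
		return KEXEC_CRASH_HP_ADD_MEMORY;
	case MODEL_MEM_OFFLINE:
		return KEXEC_CRASH_HP_REMOVE_MEMORY;
	default:
		return KEXEC_CRASH_HP_NONE;
	}
}
```

CPU callbacks would call the generic handler directly with
KEXEC_CRASH_HP_ADD_CPU or KEXEC_CRASH_HP_REMOVE_CPU, so both event
sources funnel into the same arch-specific update.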

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 include/linux/crash_core.h |   9 +++
 include/linux/kexec.h  |  11 +++
 kernel/Kconfig.kexec   |  31 
 kernel/crash_core.c| 142 +
 kernel/kexec_core.c|   6 ++
 5 files changed, 199 insertions(+)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index de62a722431e..e14345cc7a22 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -84,4 +84,13 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
 int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
unsigned long long *crash_size, unsigned long long *crash_base);
 
+#define KEXEC_CRASH_HP_NONE0
+#define KEXEC_CRASH_HP_ADD_CPU 1
+#define KEXEC_CRASH_HP_REMOVE_CPU  2
+#define KEXEC_CRASH_HP_ADD_MEMORY  3
+#define KEXEC_CRASH_HP_REMOVE_MEMORY   4
+#define KEXEC_CRASH_HP_INVALID_CPU -1U
+
+struct kimage;
+
 #endif /* LINUX_CRASH_CORE_H */
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 811a90e09698..b9903dd48e24 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -33,6 +33,7 @@ extern note_buf_t __percpu *crash_notes;
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* Verify architecture specific macros are defined */
@@ -360,6 +361,12 @@ struct kimage {
struct purgatory_info purgatory_info;
 #endif
 
+#ifdef CONFIG_CRASH_HOTPLUG
+   int hp_action;
+   int elfcorehdr_index;
+   bool elfcorehdr_updated;
+#endif
+
 #ifdef CONFIG_IMA_KEXEC
/* Virtual address of IMA measurement buffer for kexec syscall */
void *ima_buffer;
@@ -490,6 +497,10 @@ static inline int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, g
 static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) { }
 #endif
 
+#ifndef arch_crash_handle_hotplug_event
+static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
+#endif
+
 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
diff --git a/kernel/Kconfig.kexec b/kernel/Kconfig.kexec
index 660048099865..a117163fde45 100644
--- a/kernel/Kconfig.kexec
+++ b/kernel/Kconfig.kexec
@@ -100,4 +100,35 @@ config CRASH_DUMP
  For s390, this option also enables zfcpdump.
  See also 
 
+config CRASH_HOTPLUG
+   bool "Update the crash elfcorehdr on system configuration changes"
+   default y
+   depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)
+   depends on ARCH_HAS_CRASH_HOTPLUG
+   help
+ Enable direct update to the crash elfcorehdr (which contains
+ the list of CPUs and memory regions to be dumped upon a crash)
+ in response to hot plug/unplug or online/offline of CPUs or
+ memory. This is a much more advanced approach than userspace
+ attempting that.
+
+ If unsure, say Y.
+
+config CRASH_MAX_MEMORY_RANGES
+   int "Specify the maximum number of memory regions for the elfcorehdr"
+   default 8192
+   depends on CRASH_HOTPLUG
+   help
+ For t

[PATCH v23 3/8] kexec: exclude elfcorehdr from the segment digest

2023-06-12 Thread Eric DeVolder
When a crash kernel is loaded via the kexec_file_load() syscall, the
kernel places the various segments (ie crash kernel, crash initrd,
boot_params, elfcorehdr, purgatory, etc) in memory. For those
architectures that utilize purgatory, a hash digest of the segments
is calculated for integrity checking. The digest is embedded into
the purgatory image prior to placing in memory.

Updates to the elfcorehdr in response to CPU and memory changes
would cause the purgatory integrity checking to fail (at crash time,
and no vmcore created). Therefore, the elfcorehdr segment is
explicitly excluded from the purgatory digest, enabling updates to
the elfcorehdr while also avoiding the need to recompute the hash
digest and reload purgatory.
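
Why the exclusion keeps the integrity check valid can be shown with a
toy digest. The sketch below is an illustration only: it uses FNV-1a as
a stand-in for the SHA-256 digest that purgatory verifies, and struct
segment_model and digest_segments() are hypothetical names. The segment
to exclude is identified by its index in the segment array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct segment_model {
	const unsigned char *buf;
	size_t len;
};

/*
 * Digest all segments except the one at skip_index.
 * FNV-1a (64-bit) is used purely as a stand-in for SHA-256.
 */
uint64_t digest_segments(const struct segment_model *segs, size_t nr,
			 size_t skip_index)
{
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV-1a offset basis */
	size_t i, k;

	assert(segs != NULL);

	for (i = 0; i < nr; i++) {
		if (i == skip_index)
			continue;	/* e.g. the elfcorehdr segment */
		for (k = 0; k < segs[i].len; k++) {
			h ^= segs[i].buf[k];
			h *= 0x100000001b3ULL;	/* FNV-1a prime */
		}
	}
	return h;
}
```

With the elfcorehdr index skipped, rewriting that segment leaves the
digest unchanged, while a change to any other segment is still
detected.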

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 kernel/kexec_file.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index f8b1797b3ec9..1d2cfc869a75 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -726,6 +726,12 @@ static int kexec_calculate_store_digests(struct kimage *image)
for (j = i = 0; i < image->nr_segments; i++) {
struct kexec_segment *ksegment;
 
+#ifdef CONFIG_CRASH_HOTPLUG
+   /* Exclude elfcorehdr segment to allow future changes via hotplug */
+   if (j == image->elfcorehdr_index)
+   continue;
+#endif
+
ksegment = &image->segment[i];
/*
 * Skip purgatory as it will be modified once we put digest
-- 
2.31.1




[PATCH v23 6/8] crash: hotplug support for kexec_load()

2023-06-12 Thread Eric DeVolder
The hotplug support for kexec_load() requires changes to the
userspace kexec-tools and a little extra help from the kernel.

Given a kdump capture kernel loaded via kexec_load(), and a
subsequent hotplug event, the crash hotplug handler finds the
elfcorehdr and rewrites it to reflect the hotplug change.
That is the desired outcome, however, at kernel panic time,
the purgatory integrity check fails (because the elfcorehdr
changed), and the capture kernel does not boot and no vmcore
is generated.

Therefore, the userspace kexec-tools/kexec must indicate to the
kernel that the elfcorehdr can be modified (because the kexec
excluded the elfcorehdr from the digest, and sized the elfcorehdr
memory buffer appropriately).

To facilitate hotplug support with kexec_load():
 - a new kexec flag KEXEC_UPDATE_ELFCOREHDR indicates that it is
   safe for the kernel to modify the kexec_load()'d elfcorehdr
 - the /sys/kernel/crash_elfcorehdr_size node communicates the
   preferred size of the elfcorehdr memory buffer
 - The sysfs crash_hotplug nodes (ie.
   /sys/devices/system/[cpu|memory]/crash_hotplug) dynamically
   take into account kexec_file_load() vs kexec_load() and
   KEXEC_UPDATE_ELFCOREHDR.
   This is critical so that the udev rule processing of crash_hotplug
   is all that is needed to determine if the userspace unload-then-load
   of the kdump image is to be skipped, or not. The proposed udev
   rule change looks like:
   # The kernel updates the crash elfcorehdr for CPU and memory changes
   SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
   SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

The table below indicates the behavior of kexec_load()'d kdump image
updates (with the new udev crash_hotplug rule in place):

 Kernel | Kexec
        | Old | New
 -------+-----+-----
 Old    |  a  |  a
 -------+-----+-----
 New    |  a  |  b
 -------+-----+-----

where kexec 'old' and 'new' indicate whether kexec-tools has the needed
modifications for the crash hotplug feature, and kernel 'old' and
'new' indicate whether the kernel supports this crash hotplug feature.

Behavior 'a' indicates the unload-then-reload of the entire kdump
image. For the kexec 'old' column, the unload-then-reload occurs
due to the missing flag KEXEC_UPDATE_ELFCOREHDR. An 'old' kernel
(with 'new' kexec) does not present the crash_hotplug sysfs node,
which leads to the unload-then-reload of the kdump image.

Behavior 'b' indicates the desired optimized behavior of the kernel
directly modifying the elfcorehdr and avoiding the unload-then-reload
of the kdump image.

If the udev rule is not updated with the crash_hotplug node check, then
regardless of whether the kernel and kexec are new or old, the kdump
image continues to be unloaded and then reloaded on hotplug changes.

To fully support crash hotplug feature, there needs to be a rollout
of kernel, kexec-tools and udev rule changes. However, the order of
the rollout of these pieces does not matter; kexec_load()'d kdump
images still function for hotplug as-is.
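
The rollout matrix above, together with the udev rule condition, can be
written as a single predicate. This is a sketch of the described
behaviour rather than code from the series; the enum and the function
kdump_update_behavior() are hypothetical names, with 'a' and 'b'
matching the table.

```c
#include <assert.h>
#include <stdbool.h>

enum kdump_update {
	KDUMP_USERSPACE_RELOAD = 'a',		/* unload-then-reload */
	KDUMP_KERNEL_UPDATES_ELFCOREHDR = 'b',	/* optimized path */
};

/*
 * Behaviour for a kexec_load()'d kdump image on a hotplug event.
 * kernel_new:    kernel supports crash hotplug
 * kexec_new:     kexec-tools passes KEXEC_UPDATE_ELFCOREHDR
 * udev_rule_new: udev rule checks the crash_hotplug attribute
 */
enum kdump_update kdump_update_behavior(bool kernel_new, bool kexec_new,
					bool udev_rule_new)
{
	if (kernel_new && kexec_new && udev_rule_new)
		return KDUMP_KERNEL_UPDATES_ELFCOREHDR;

	/* Any missing piece falls back to the existing behaviour */
	return KDUMP_USERSPACE_RELOAD;
}
```

Because every incomplete combination falls back to behaviour 'a', the
three pieces can be deployed in any order without breaking kdump.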

Suggested-by: Hari Bathini 
Signed-off-by: Eric DeVolder 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 arch/x86/include/asm/kexec.h | 11 +++
 arch/x86/kernel/crash.c  | 27 +++
 include/linux/kexec.h| 14 --
 include/uapi/linux/kexec.h   |  1 +
 kernel/crash_core.c  | 31 +++
 kernel/kexec.c   |  5 +
 kernel/ksysfs.c  | 15 +++
 7 files changed, 98 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 9143100ea3ea..3be6a98751f0 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -214,14 +214,17 @@ void arch_crash_handle_hotplug_event(struct kimage *image);
 #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
 
 #ifdef CONFIG_HOTPLUG_CPU
-static inline int crash_hotplug_cpu_support(void) { return 1; }
-#define crash_hotplug_cpu_support crash_hotplug_cpu_support
+int arch_crash_hotplug_cpu_support(void);
+#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
 #endif
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-static inline int crash_hotplug_memory_support(void) { return 1; }
-#define crash_hotplug_memory_support crash_hotplug_memory_support
+int arch_crash_hotplug_memory_support(void);
+#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
 #endif
+
+unsigned int arch_crash_get_elfcorehdr_size(void);
+#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index c70a111c44fa..caf22bcb61af 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -427,6 +427,33 @@ int crash_load_segments(struct kimage *image)
 #undef pr_fmt
 #define pr_fmt(fmt) "crash hp: " fmt
 
+/* These functions provide the

[PATCH v23 0/8] crash: Kernel handling of CPU and memory hot un/plug

2023-06-12 Thread Eric DeVolder
   kexec_file_load support patch.

v7: 13apr2022
 https://lkml.org/lkml/2022/4/13/850
 https://lore.kernel.org/lkml/20220413164237.20845-1-eric.devol...@oracle.com/
 - Resolved parameter usage to crash_hotplug_handler(), per Baoquan.

v6: 1apr2022
 https://lkml.org/lkml/2022/4/1/1203
 https://lore.kernel.org/lkml/20220401183040.1624-1-eric.devol...@oracle.com/
 - Reword commit messages and some comment cleanup per Baoquan.
 - Changed elf_index to elfcorehdr_index for clarity.
 - Minor code changes per Baoquan.

v5: 3mar2022
 https://lkml.org/lkml/2022/3/3/674
 https://lore.kernel.org/lkml/20220303162725.49640-1-eric.devol...@oracle.com/
 - Reworded description of CRASH_HOTPLUG_ELFCOREHDR_SZ, per
   David Hildenbrand.
 - Refactored slightly a few patches per Baoquan recommendation.

v4: 9feb2022
 https://lkml.org/lkml/2022/2/9/1406
 https://lore.kernel.org/lkml/20220209195706.51522-1-eric.devol...@oracle.com/
 - Refactored patches per Baoquan suggestions.
 - A few corrections, per Baoquan.

v3: 10jan2022
 https://lkml.org/lkml/2022/1/10/1212
 https://lore.kernel.org/lkml/20220110195727.1682-1-eric.devol...@oracle.com/
 - Rebasing per Baoquan He request.
 - Changed memory notifier per David Hildenbrand.
 - Providing example kexec userspace change in cover letter.

RFC v2: 7dec2021
 https://lkml.org/lkml/2021/12/7/1088
 https://lore.kernel.org/lkml/20211207195204.1582-1-eric.devol...@oracle.com/
 - Acting upon Baoquan He suggestion of removing elfcorehdr from
   the purgatory list of segments, removed purgatory code from
   patchset, and it is significantly simpler now.

RFC v1: 18nov2021
 https://lkml.org/lkml/2021/11/18/845
 https://lore.kernel.org/lkml/2028174948.37435-1-eric.devol...@oracle.com/
 - working patchset demonstrating kernel handling of hotplug
   updates to x86 elfcorehdr for kexec_file_load

RFC: 14dec2020
 https://lkml.org/lkml/2020/12/14/532
 https://lore.kernel.org/lkml/b04ed259-dc5f-7f30-6661-c26f92d90...@oracle.com/
 - proposed concept of allowing kernel to handle hotplug update
   of elfcorehdr
---


Eric DeVolder (8):
  crash: move a few code bits to setup support of crash hotplug
  crash: add generic infrastructure for crash hotplug support
  kexec: exclude elfcorehdr from the segment digest
  crash: memory and CPU hotplug sysfs attributes
  x86/crash: add x86 crash hotplug support
  crash: hotplug support for kexec_load()
  crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
  x86/crash: optimize CPU changes

 .../admin-guide/mm/memory-hotplug.rst |   8 +
 Documentation/core-api/cpu_hotplug.rst|  18 +
 arch/x86/Kconfig  |   3 +
 arch/x86/include/asm/kexec.h  |  18 +
 arch/x86/kernel/crash.c   | 140 ++-
 drivers/base/cpu.c|  14 +
 drivers/base/memory.c |  13 +
 include/linux/crash_core.h|   9 +
 include/linux/kexec.h |  63 +++-
 include/uapi/linux/kexec.h|   1 +
 kernel/Kconfig.kexec  |  31 ++
 kernel/crash_core.c   | 355 ++
 kernel/kexec.c|   5 +
 kernel/kexec_core.c   |   6 +
 kernel/kexec_file.c   | 187 +
 kernel/ksysfs.c   |  15 +
 16 files changed, 681 insertions(+), 205 deletions(-)

-- 
2.31.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v23 4/8] crash: memory and CPU hotplug sysfs attributes

2023-06-12 Thread Eric DeVolder
Introduce the crash_hotplug attribute for memory and CPUs for
use by userspace.  These attributes directly facilitate the udev
rule for managing userspace re-loading of the crash kernel upon
hot un/plug changes.

For memory, expose the crash_hotplug attribute to the
/sys/devices/system/memory directory. For example:

 # udevadm info --attribute-walk /sys/devices/system/memory/memory81
  looking at device '/devices/system/memory/memory81':
KERNEL=="memory81"
SUBSYSTEM=="memory"
DRIVER==""
ATTR{online}=="1"
ATTR{phys_device}=="0"
ATTR{phys_index}=="0051"
ATTR{removable}=="1"
ATTR{state}=="online"
ATTR{valid_zones}=="Movable"

  looking at parent device '/devices/system/memory':
KERNELS=="memory"
SUBSYSTEMS==""
DRIVERS==""
ATTRS{auto_online_blocks}=="offline"
ATTRS{block_size_bytes}=="800"
ATTRS{crash_hotplug}=="1"

For CPUs, expose the crash_hotplug attribute to the
/sys/devices/system/cpu directory. For example:

 # udevadm info --attribute-walk /sys/devices/system/cpu/cpu0
  looking at device '/devices/system/cpu/cpu0':
KERNEL=="cpu0"
SUBSYSTEM=="cpu"
DRIVER=="processor"
ATTR{crash_notes}=="277c38600"
ATTR{crash_notes_size}=="368"
ATTR{online}=="1"

  looking at parent device '/devices/system/cpu':
KERNELS=="cpu"
SUBSYSTEMS==""
DRIVERS==""
ATTRS{crash_hotplug}=="1"
ATTRS{isolated}==""
ATTRS{kernel_max}=="8191"
ATTRS{nohz_full}=="  (null)"
ATTRS{offline}=="4-7"
ATTRS{online}=="0-3"
ATTRS{possible}=="0-7"
ATTRS{present}=="0-3"

With these sysfs attributes in place, it is possible to efficiently
instruct the udev rule to skip crash kernel reloading for kernels
configured with crash hotplug support.

For example, the following is the proposed udev rule change for RHEL
system 98-kexec.rules (as the first lines of the rule file):

 # The kernel updates the crash elfcorehdr for CPU and memory changes
 SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
 SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

When examined in the context of 98-kexec.rules, the above rules
test if crash_hotplug is set, and if so, the userspace initiated
unload-then-reload of the crash kernel is skipped.

CPU and memory checks are separated in accordance with
CONFIG_HOTPLUG_CPU and CONFIG_MEMORY_HOTPLUG kernel config options.
If an architecture supports, for example, memory hotplug but not
CPU hotplug, then the /sys/devices/system/memory/crash_hotplug
attribute file is present, but the /sys/devices/system/cpu/crash_hotplug
attribute file will NOT be present. Thus the udev rule skips
userspace processing of memory hot un/plug events, but the udev
rule will evaluate false for CPU events, thus allowing userspace to
process CPU hot un/plug events (ie the unload-then-reload of the kdump
capture kernel).
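
The decision the udev rule makes can also be sketched in userspace C. This
is only an illustration of the logic described above, not code from this
series or from kexec-tools: the helper names crash_hotplug_attr_is_set()
and userspace_reload_needed() are hypothetical, and the caller is assumed
to pass the full path of the attribute file (for example
/sys/devices/system/memory/crash_hotplug). A missing attribute file is
treated the same as '0', matching the case where the udev rule evaluates
false.

```c
#include <stdio.h>

/*
 * Hypothetical helper: return 1 if the attribute file exists and its
 * first character is '1', otherwise 0. A missing file means the kernel
 * does not offer crash hotplug support for that subsystem.
 */
int crash_hotplug_attr_is_set(const char *attr_path)
{
	FILE *f = fopen(attr_path, "r");
	int c;

	if (!f)
		return 0;
	c = fgetc(f);
	fclose(f);
	return c == '1';
}

/*
 * Hypothetical helper: return 1 if userspace must still perform the
 * unload-then-reload of the kdump capture kernel, 0 if the kernel
 * updates the elfcorehdr itself and the reload can be skipped.
 */
int userspace_reload_needed(const char *attr_path)
{
	return !crash_hotplug_attr_is_set(attr_path);
}
```

A caller would invoke userspace_reload_needed() once per subsystem, since
the cpu and memory attribute files are present independently of each other.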

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
---
 .../admin-guide/mm/memory-hotplug.rst  |  8 
 Documentation/core-api/cpu_hotplug.rst | 18 ++
 drivers/base/cpu.c | 14 ++
 drivers/base/memory.c  | 13 +
 include/linux/kexec.h  |  8 
 5 files changed, 61 insertions(+)

diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
index 1b02fe5807cc..eb99d79223a3 100644
--- a/Documentation/admin-guide/mm/memory-hotplug.rst
+++ b/Documentation/admin-guide/mm/memory-hotplug.rst
@@ -291,6 +291,14 @@ The following files are currently defined:
   Availability depends on the CONFIG_ARCH_MEMORY_PROBE
   kernel configuration option.
 ``uevent``read-write: generic udev file for device subsystems.
+``crash_hotplug``  read-only: when changes to the system memory map
+  occur due to hot un/plug of memory, this file contains
+  '1' if the kernel updates the kdump capture kernel memory
+  map itself (via elfcorehdr), or '0' if userspace must update
+  the kdump capture kernel memory map.
+
+  Availability depends on the CONFIG_MEMORY_HOTPLUG kernel
+  configuration option.
 == 
=
 
 .. note::
diff --git a/Documentation/core-api/cpu_hotplug.rst b/Documentation/core-api/cpu_hotplug.rst
index f75778d37488..0c8dc3fe5

Re: [PATCH v22 6/8] crash: hotplug support for kexec_load()

2023-05-10 Thread Eric DeVolder



On 5/9/23 01:15, Sourabh Jain wrote:


On 04/05/23 04:11, Eric DeVolder wrote:

The hotplug support for kexec_load() requires coordination with
userspace, and therefore a little extra help from the kernel to
facilitate the coordination.

In the absence of the solution contained within this particular
patch, if a kdump capture kernel is loaded via kexec_load() syscall,
then the crash hotplug logic would find the segment containing the
elfcorehdr, and upon a hotplug event, rewrite the elfcorehdr. While
generally speaking that is the desired behavior and outcome, a
problem arises from the fact that if the kdump image includes a
purgatory that performs a digest checksum, then that check would
fail (because the elfcorehdr was changed), and the capture kernel
would fail to boot and no kdump would occur.

Therefore, what is needed is for the userspace kexec-tools to
indicate to the kernel whether or not the supplied kdump image/
elfcorehdr can be modified (because the kexec-tools excludes the
elfcorehdr from the digest, and sizes the elfcorehdr memory buffer
appropriately).

To solve these problems, this patch introduces:
  - a new kexec flag KEXEC_UPDATE_ELFCOREHDR to indicate that it is


Architectures may need to update kexec segments other than the elfcorehdr.
How about changing the flag name to KEXEC_UPDATE_SEGMENTS?

- Sourabh


This seems almost too generic and vague. I get that for PPC this flag
will drive updating elfcorehdr as well as FDT, so the flag is over-loaded
in a sense.

Another idea for the name?
eric



    safe for the kernel to modify the elfcorehdr (because kexec-tools
    has excluded the elfcorehdr from the digest).
  - the /sys/kernel/crash_elfcorehdr_size node to communicate to
    kexec-tools what the preferred size of the elfcorehdr memory buffer
    should be in order to accommodate hotplug changes.
  - The sysfs crash_hotplug nodes (ie.
    /sys/devices/system/[cpu|memory]/crash_hotplug) are now dynamic in
    that they examine kexec_file_load() vs kexec_load(), and when
    kexec_load(), whether or not KEXEC_UPDATE_ELFCOREHDR is in effect.
    This is critical so that the udev rule processing of crash_hotplug
    indicates correctly (ie. the userspace unload-then-load of the
    kdump image can be skipped, or not).

With this patch in place, I believe the following statements to be true
(with local testing to verify):

  - For systems which have these kernel changes in place, but not the
    corresponding changes to the crash hot plug udev rules and
    kexec-tools, (ie "older" systems) those systems will continue to
    unload-then-load the kdump image, as has always been done. The
    kexec-tools will not set KEXEC_UPDATE_ELFCOREHDR.
  - For systems which have these kernel changes in place and the proposed
    udev rule changes in place, but not the kexec-tools changes in place:
 - the use of kexec_load() will not set KEXEC_UPDATE_ELFCOREHDR and
   so the unload-then-reload of kdump image will occur (the sysfs
   crash_hotplug nodes will show 0).
 - the use of kexec_file_load() will permit sysfs crash_hotplug nodes
   to show 1, and the kernel will modify the elfcorehdr directly. And
   with the udev changes in place, the unload-then-load will not occur!
  - For systems which have these kernel changes as well as the udev and
    kexec-tools changes in place, then the user/admin has full authority
    over the enablement and support of crash hotplug support, whether via
    kexec_file_load() or kexec_load().

Said differently, as kexec_load() was/is widely in use, these changes
permit it to continue to be used as-is (retaining the current unload-then-
reload behavior) until such time as the udev and kexec-tools changes can
be rolled out as well.

I've intentionally kept the changes related to userspace coordination
for kexec_load() separate as this need was identified late; the
rest of this series has been generally reviewed and accepted. Once
this support has been vetted, I can refactor if needed.

Suggested-by: Hari Bathini 
Signed-off-by: Eric DeVolder 
---
  arch/x86/include/asm/kexec.h | 11 +++
  arch/x86/kernel/crash.c  | 27 +++
  include/linux/kexec.h    | 14 --
  include/uapi/linux/kexec.h   |  1 +
  kernel/crash_core.c  | 31 +++
  kernel/kexec.c   |  3 +++
  kernel/ksysfs.c  | 15 +++
  7 files changed, 96 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 9143100ea3ea..3be6a98751f0 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -214,14 +214,17 @@ void arch_crash_handle_hotplug_event(struct kimage *image);
  #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
  #ifdef CONFIG_HOTPLUG_CPU
-static inline int crash_hotplug_cpu_support(void) { return 1; }
-#define crash_hotplug_cpu_support crash_hotplug_c

Re: [PATCH v22 5/8] x86/crash: add x86 crash hotplug support

2023-05-10 Thread Eric DeVolder




On 5/9/23 17:52, Thomas Gleixner wrote:

On Wed, May 03 2023 at 18:41, Eric DeVolder wrote:

In the patch 'kexec: exclude elfcorehdr from the segment digest'


See reply to 8/8

yep


diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 53bab123a8ee..80538524c494 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2119,6 +2119,19 @@ config CRASH_DUMP
  (CONFIG_RELOCATABLE=y).
  For more details see Documentation/admin-guide/kdump/kdump.rst
  
+config CRASH_HOTPLUG

+   bool "Update the crash elfcorehdr on system configuration changes"
+   default y
+   depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)
+   help
+ Enable direct update to the crash elfcorehdr (which contains
+ the list of CPUs and memory regions to be dumped upon a crash)
+ in response to hot plug/unplug or online/offline of CPUs or
+ memory. This is a much more advanced approach than userspace
+ attempting that.
+
+ If unsure, say Y.


Why is this config an X86 specific thing?

Neither CRASH_DUMP nor HOTPLUG_CPU nor MEMORY_HOTPLUG are in any way X86
specific at all. So why can't you stick that into a place where it can
be reused by other architectures?

It's not rocket science to do

+   depends on WANTS_CRASH_HOTPLUG && CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)

or something like that. It's so tiring to have x86 Kconfig be the dumping
ground for the initial implementation, then having the sh*t copied to
every other architecture and the cleanup is left to the maintainers.

It's not rocket science to differentiate between a real architecture
specific option and a generally useful option in the first place, right?


Right. To your point, CRASH_DUMP has been copied in all the archs:
arch/arm/Kconfig:config CRASH_DUMP
arch/arm64/Kconfig:config CRASH_DUMP
arch/ia64/Kconfig:config CRASH_DUMP
arch/mips/Kconfig:config CRASH_DUMP
arch/powerpc/Kconfig:config CRASH_DUMP
arch/riscv/Kconfig:config CRASH_DUMP
arch/s390/Kconfig:config CRASH_DUMP
arch/sh/Kconfig:config CRASH_DUMP
arch/x86/Kconfig:config CRASH_DUMP
arch/loongarch/Kconfig:config CRASH_DUMP

Likewise for KEXEC and KEXEC_FILE.

I've looked into this in the past, and looking again today, I don't see a natural
place to put the option. Perhaps starting a kernel/Kconfig.kexec?





+#ifdef CONFIG_CRASH_HOTPLUG
+   /*
+* Ensure the elfcorehdr segment large enough for hotplug changes.
+* Account for VMCOREINFO and kernel_map and maximum CPUs.


Neither the first line nor the second one qualifies as parseable sentences.



What about:
Ensure the elfcorehdr segment is large enough for hotplug changes.
The segment size accounts for VMCOREINFO, kernel_map, maximum CPUs
and maximum memory ranges.



+/**
+ * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
+ * @image: the active struct kimage


What is an active struct kimage?


How about this:
@image: a pointer to kexec_crash_image


+ *
+ * The new elfcorehdr is prepared in a kernel buffer, and then it is
+ * written on top of the existing/old elfcorehdr.


-ENOPARSE


How about:
Prepare the new elfcorehdr and replace the existing elfcorehdr.


+ */
+void arch_crash_handle_hotplug_event(struct kimage *image)
+{
+   void *elfbuf = NULL, *old_elfcorehdr;
+   unsigned long nr_mem_ranges;
+   unsigned long mem, memsz;
+   unsigned long elfsz = 0;
+
+   /*
+* Create the new elfcorehdr reflecting the changes to CPU and/or
+* memory resources.
+*/
+   if (prepare_elf_headers(image, &elfbuf, &elfsz, &nr_mem_ranges)) {
+   pr_err("unable to prepare elfcore headers");
+   goto out;


So this can fail. Why is there just a pr_err() and no return value which
tells the caller that this failed?

An error in the crash elfcorehdr infrastructure introduced in this series
is not a reason to rollback state. The cpuhp and memory notifier callbacks
always return an OK.
The primary errors that might occur are failure to obtain the kexec_lock,
and failure to obtain a temporary kernel buffer to stage the new elfcorehdr.
How about:
pr_err("prepare_elf_headers() failed");




+   /*
+* Copy new elfcorehdr over the old elfcorehdr at destination.
+*/
+   old_elfcorehdr = kmap_local_page(pfn_to_page(mem >> PAGE_SHIFT));
+   if (!old_elfcorehdr) {
+   pr_err("updating elfcorehdr failed\n");


How hard is it to write an error message which is clearly describing the
problem?


How about:
pr_err("mapping elfcorehdr segment failed");


Thanks,

 tglx

Again, thanks for the fresh eyes!
eric



Re: [PATCH v22 8/8] x86/crash: optimize CPU changes

2023-05-10 Thread Eric DeVolder




On 5/9/23 17:39, Thomas Gleixner wrote:

On Wed, May 03 2023 at 18:41, Eric DeVolder wrote:

This patch is dependent upon the patch 'crash: change


Seriously? You send a patch series which is ordered in itself and then
tell in the changelog of patch 8/8 that it depends on patch 7/8?

This information is complete garbage once the patches are applied and
ends up in the git logs and even for the submission it's useless
information.

Patch series are usually ordered by dependency, no?

Aside of that please do:

# git grep 'This patch' Documentation/process/


I'll remove, and re-examine the messages to use imperative tone.


crash_prepare_elf64_headers() to for_each_possible_cpu()'. With that
patch, crash_prepare_elf64_headers() writes out an ELF CPU PT_NOTE
for all possible CPUs, thus further CPU changes to the elfcorehdr
are not needed.


I'm having a hard time to decode this word salad.

   crash_prepare_elf64_headers() is writing out an ELF CPU PT_NOTE for
   all possible CPUs, thus further changes to the ELF core header are
   not required.

Makes some sense to me.


How about this?

crash_prepare_elf64_headers() writes into the elfcorehdr an ELF
PT_NOTE for all possible CPUs. As such, subsequent changes to CPUs
(ie. hot un/plug, online/offline) do not need to rewrite the elfcorehdr.




This change works for kexec_file_load() and kexec_load() syscalls.
For kexec_file_load(), crash_prepare_elf64_headers() is utilized
directly and thus all ELF CPU PT_NOTEs are in the elfcorehdr already.
This is the kimage->file_mode term.
For kexec_load() syscall, one CPU or memory change will cause the
elfcorehdr to be updated via crash_prepare_elf64_headers() and at
that point all ELF CPU PT_NOTEs are in the elfcorehdr. This is the
kimage->elfcorehdr_updated term.


Sorry. I tried hard, but this is completely incomprehensible.


How about this?

The kimage->file_mode term covers kdump images loaded via the
kexec_file_load() syscall. Since crash_prepare_elf64_headers()
wrote the initial elfcorehdr, no update to the elfcorehdr is
needed for CPU changes.

The kimage->elfcorehdr_updated term covers kdump images loaded via
the kexec_load() syscall. At least one memory or CPU change must occur
to cause crash_prepare_elf64_headers() to rewrite the elfcorehdr.
Afterwards, no update to the elfcorehdr is needed for CPU changes.
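
The two terms can be modelled in a few lines of standalone C. This is a
sketch for illustration only, not the code from the patch: struct
kimage_model, enum hp_event and both function names are hypothetical
stand-ins, and it assumes that memory changes always still require a
rewrite of the elfcorehdr.

```c
#include <stdbool.h>

/* Hypothetical event kinds; the kernel's own identifiers may differ. */
enum hp_event {
	HP_ADD_CPU,
	HP_REMOVE_CPU,
	HP_ADD_MEMORY,
	HP_REMOVE_MEMORY,
};

/* Hypothetical stand-in for the two struct kimage fields discussed. */
struct kimage_model {
	bool file_mode;          /* image loaded via kexec_file_load() */
	bool elfcorehdr_updated; /* kernel has rewritten the elfcorehdr once */
};

/*
 * Return true if the elfcorehdr must be rewritten for this event.
 * CPU events are skipped once the elfcorehdr is known to describe all
 * possible CPUs, which holds when either term is set.
 */
bool elfcorehdr_rewrite_needed(const struct kimage_model *img,
			       enum hp_event ev)
{
	bool cpu_event = (ev == HP_ADD_CPU || ev == HP_REMOVE_CPU);

	if (cpu_event && (img->file_mode || img->elfcorehdr_updated))
		return false;
	return true;
}

/* Record that the kernel has rewritten the elfcorehdr. */
void note_elfcorehdr_rewritten(struct kimage_model *img)
{
	img->elfcorehdr_updated = true;
}
```

For a kexec_load() image, the first CPU or memory event triggers a rewrite
and a call to note_elfcorehdr_rewritten(); later CPU events are then skipped.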


diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 8064e65de6c0..3157e6068747 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -483,6 +483,16 @@ void arch_crash_handle_hotplug_event(struct kimage *image)
unsigned long mem, memsz;
unsigned long elfsz = 0;
  
+	/* As crash_prepare_elf64_headers() has already described all


This is not a proper multiline comment. Please read and follow the tip
tree documentation along with all other things which are documented
there:

   https://www.kernel.org/doc/html/latest/process/maintainer-tip.html

This documentation is not there for entertainment value or exists just
because we are bored to death.


I'll fix it; unintentional. Should checkpatch.pl catch this (it did not)?


+* possible CPUs, there is no need to update the elfcorehdr
+* for additional CPU changes. This works for both kexec_load()
+* and kexec_file_load() syscalls.


And it does not work for what?


I'll remove this.

I keep using phrases like this since kexec_file_load() is wholly controlled by the kernel code,
whereas kexec_load() has userspace dependencies. In this case, the sentence isn't warranted; it
will work; no exceptional cases.


You cannot expect that anyone who reads this code is a kexec/crash*
wizard who might be able to deduce the meaning of this.

Thanks,

 tglx


Yes, thanks for the fresh eyes!
eric



Re: [PATCH v22 6/8] crash: hotplug support for kexec_load()

2023-05-09 Thread Eric DeVolder



On 5/9/23 01:56, Sourabh Jain wrote:
> 
> On 04/05/23 04:11, Eric DeVolder wrote:
>> The hotplug support for kexec_load() requires coordination with
>> userspace, and therefore a little extra help from the kernel to
>> facilitate the coordination.
>>
>> In the absence of the solution contained within this particular
>> patch, if a kdump capture kernel is loaded via kexec_load() syscall,
>> then the crash hotplug logic would find the segment containing the
>> elfcorehdr, and upon a hotplug event, rewrite the elfcorehdr. While
>> generally speaking that is the desired behavior and outcome, a
>> problem arises from the fact that if the kdump image includes a
>> purgatory that performs a digest checksum, then that check would
>> fail (because the elfcorehdr was changed), and the capture kernel
>> would fail to boot and no kdump would occur.
>>
>> Therefore, what is needed is for the userspace kexec-tools to
>> indicate to the kernel whether or not the supplied kdump image/
>> elfcorehdr can be modified (because the kexec-tools excludes the
>> elfcorehdr from the digest, and sizes the elfcorehdr memory buffer
>> appropriately).
>>
>> To solve these problems, this patch introduces:
>>   - a new kexec flag KEXEC_UPDATE_ELFCOREHDR to indicate that it is
>>     safe for the kernel to modify the elfcorehdr (because kexec-tools
>>     has excluded the elfcorehdr from the digest).
>>   - the /sys/kernel/crash_elfcorehdr_size node to communicate to
>>     kexec-tools what the preferred size of the elfcorehdr memory buffer
>>     should be in order to accommodate hotplug changes.
>>   - The sysfs crash_hotplug nodes (ie.
>>     /sys/devices/system/[cpu|memory]/crash_hotplug) are now dynamic in
>>     that they examine kexec_file_load() vs kexec_load(), and when
>>     kexec_load(), whether or not KEXEC_UPDATE_ELFCOREHDR is in effect.
>>     This is critical so that the udev rule processing of crash_hotplug
>>     indicates correctly (ie. the userspace unload-then-load of the
>>     kdump image can be skipped, or not).
>>
>> With this patch in place, I believe the following statements to be true
>> (with local testing to verify):
>>
>>   - For systems which have these kernel changes in place, but not the
>>     corresponding changes to the crash hot plug udev rules and
>>     kexec-tools, (ie "older" systems) those systems will continue to
>>     unload-then-load the kdump image, as has always been done. The
>>     kexec-tools will not set KEXEC_UPDATE_ELFCOREHDR.
>>   - For systems which have these kernel changes in place and the proposed
>>     udev rule changes in place, but not the kexec-tools changes in place:
>>  - the use of kexec_load() will not set KEXEC_UPDATE_ELFCOREHDR and
>>    so the unload-then-reload of kdump image will occur (the sysfs
>>    crash_hotplug nodes will show 0).
>>  - the use of kexec_file_load() will permit sysfs crash_hotplug nodes
>>    to show 1, and the kernel will modify the elfcorehdr directly. And
>>    with the udev changes in place, the unload-then-load will not occur!
>>   - For systems which have these kernel changes as well as the udev and
>>     kexec-tools changes in place, then the user/admin has full authority
>>     over the enablement and support of crash hotplug support, whether via
>>     kexec_file_load() or kexec_load().
>>
>> Said differently, as kexec_load() was/is widely in use, these changes
>> permit it to continue to be used as-is (retaining the current unload-then-
>> reload behavior) until such time as the udev and kexec-tools changes can
>> be rolled out as well.
>>
>> I've intentionally kept the changes related to userspace coordination
>> for kexec_load() separate as this need was identified late; the
>> rest of this series has been generally reviewed and accepted. Once
>> this support has been vetted, I can refactor if needed.
>>
>> Suggested-by: Hari Bathini 
>> Signed-off-by: Eric DeVolder 
>> ---
>>   arch/x86/include/asm/kexec.h | 11 +++
>>   arch/x86/kernel/crash.c  | 27 +++
>>   include/linux/kexec.h    | 14 --
>>   include/uapi/linux/kexec.h   |  1 +
>>   kernel/crash_core.c  | 31 +++
>>   kernel/kexec.c   |  3 +++
>>   kernel/ksysfs.c  | 15 +++
>>   7 files changed, 96 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
>> index 

[PATCH v22 6/8] crash: hotplug support for kexec_load()

2023-05-03 Thread Eric DeVolder
The hotplug support for kexec_load() requires coordination with
userspace, and therefore a little extra help from the kernel to
facilitate the coordination.

In the absence of the solution contained within this particular
patch, if a kdump capture kernel is loaded via kexec_load() syscall,
then the crash hotplug logic would find the segment containing the
elfcorehdr, and upon a hotplug event, rewrite the elfcorehdr. While
generally speaking that is the desired behavior and outcome, a
problem arises from the fact that if the kdump image includes a
purgatory that performs a digest checksum, then that check would
fail (because the elfcorehdr was changed), and the capture kernel
would fail to boot and no kdump would occur.

Therefore, what is needed is for the userspace kexec-tools to
indicate to the kernel whether or not the supplied kdump image/
elfcorehdr can be modified (because the kexec-tools excludes the
elfcorehdr from the digest, and sizes the elfcorehdr memory buffer
appropriately).

To solve these problems, this patch introduces:
 - a new kexec flag KEXEC_UPDATE_ELFCOREHDR to indicate that it is
   safe for the kernel to modify the elfcorehdr (because kexec-tools
   has excluded the elfcorehdr from the digest).
 - the /sys/kernel/crash_elfcorehdr_size node to communicate to
   kexec-tools what the preferred size of the elfcorehdr memory buffer
   should be in order to accommodate hotplug changes.
 - The sysfs crash_hotplug nodes (ie.
   /sys/devices/system/[cpu|memory]/crash_hotplug) are now dynamic in
   that they examine kexec_file_load() vs kexec_load(), and when
   kexec_load(), whether or not KEXEC_UPDATE_ELFCOREHDR is in effect.
   This is critical so that the udev rule processing of crash_hotplug
   indicates correctly (ie. the userspace unload-then-load of the
   kdump image can be skipped, or not).

With this patch in place, I believe the following statements to be true
(with local testing to verify):

 - For systems which have these kernel changes in place, but not the
   corresponding changes to the crash hot plug udev rules and
   kexec-tools, (ie "older" systems) those systems will continue to
   unload-then-load the kdump image, as has always been done. The
   kexec-tools will not set KEXEC_UPDATE_ELFCOREHDR.
 - For systems which have these kernel changes in place and the proposed
   udev rule changes in place, but not the kexec-tools changes in place:
- the use of kexec_load() will not set KEXEC_UPDATE_ELFCOREHDR and
  so the unload-then-reload of kdump image will occur (the sysfs
  crash_hotplug nodes will show 0).
- the use of kexec_file_load() will permit sysfs crash_hotplug nodes
  to show 1, and the kernel will modify the elfcorehdr directly. And
  with the udev changes in place, the unload-then-load will not occur!
 - For systems which have these kernel changes as well as the udev and
   kexec-tools changes in place, then the user/admin has full authority
   over the enablement and support of crash hotplug support, whether via
   kexec_file_load() or kexec_load().
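
The scenarios above can be checked against a small standalone C model.
This is a sketch under stated assumptions, not the kernel implementation:
struct kdump_image_model and both function names are hypothetical, and the
kernel is assumed to be built with crash hotplug support.

```c
#include <stdbool.h>

/* Hypothetical model of a loaded kdump image; not struct kimage. */
struct kdump_image_model {
	bool file_mode;         /* loaded via kexec_file_load() */
	bool update_elfcorehdr; /* kexec_load() with KEXEC_UPDATE_ELFCOREHDR */
};

/*
 * Value the sysfs crash_hotplug nodes would report in this model:
 * always 1 for kexec_file_load(); for kexec_load() only when userspace
 * has declared the elfcorehdr safe to modify.
 */
bool crash_hotplug_node_value(const struct kdump_image_model *img)
{
	if (img->file_mode)
		return true;
	return img->update_elfcorehdr;
}

/*
 * Whether userspace still performs the unload-then-reload on a hot
 * un/plug event. Without the proposed udev rule the node is never
 * consulted, so the reload always happens.
 */
bool userspace_reloads(const struct kdump_image_model *img,
		       bool udev_rule_present)
{
	if (!udev_rule_present)
		return true;
	return !crash_hotplug_node_value(img);
}
```

In this model the only way for kexec_load() users to skip the reload is to
have both the udev rule and a kexec-tools that sets the flag, which is the
third scenario in the list.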

Said differently, as kexec_load() was/is widely in use, these changes
permit it to continue to be used as-is (retaining the current unload-then-
reload behavior) until such time as the udev and kexec-tools changes can
be rolled out as well.

I've intentionally kept the changes related to userspace coordination
for kexec_load() separate as this need was identified late; the
rest of this series has been generally reviewed and accepted. Once
this support has been vetted, I can refactor if needed.

Suggested-by: Hari Bathini 
Signed-off-by: Eric DeVolder 
---
 arch/x86/include/asm/kexec.h | 11 +++
 arch/x86/kernel/crash.c  | 27 +++
 include/linux/kexec.h| 14 --
 include/uapi/linux/kexec.h   |  1 +
 kernel/crash_core.c  | 31 +++
 kernel/kexec.c   |  3 +++
 kernel/ksysfs.c  | 15 +++
 7 files changed, 96 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 9143100ea3ea..3be6a98751f0 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -214,14 +214,17 @@ void arch_crash_handle_hotplug_event(struct kimage *image);
 #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
 
 #ifdef CONFIG_HOTPLUG_CPU
-static inline int crash_hotplug_cpu_support(void) { return 1; }
-#define crash_hotplug_cpu_support crash_hotplug_cpu_support
+int arch_crash_hotplug_cpu_support(void);
+#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
 #endif
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-static inline int crash_hotplug_memory_support(void) { return 1; }
-#define crash_hotplug_memory_support crash_hotplug_memory_support
+int arch_crash_hotplug_memory_support(void);
+#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
 #endif
+
+unsigned int arch_crash_get_elfco

[PATCH v22 4/8] crash: memory and CPU hotplug sysfs attributes

2023-05-03 Thread Eric DeVolder
This introduces the crash_hotplug attribute for memory and CPUs
for use by userspace.  This change directly facilitates the udev
rule for managing userspace re-loading of the crash kernel upon
hot un/plug changes.

For memory, this changeset introduces the crash_hotplug attribute
to the /sys/devices/system/memory directory. For example:

 # udevadm info --attribute-walk /sys/devices/system/memory/memory81
  looking at device '/devices/system/memory/memory81':
KERNEL=="memory81"
SUBSYSTEM=="memory"
DRIVER==""
ATTR{online}=="1"
ATTR{phys_device}=="0"
ATTR{phys_index}=="0051"
ATTR{removable}=="1"
ATTR{state}=="online"
ATTR{valid_zones}=="Movable"

  looking at parent device '/devices/system/memory':
KERNELS=="memory"
SUBSYSTEMS==""
DRIVERS==""
ATTRS{auto_online_blocks}=="offline"
ATTRS{block_size_bytes}=="800"
ATTRS{crash_hotplug}=="1"

For CPUs, this changeset introduces the crash_hotplug attribute
to the /sys/devices/system/cpu directory. For example:

 # udevadm info --attribute-walk /sys/devices/system/cpu/cpu0
  looking at device '/devices/system/cpu/cpu0':
KERNEL=="cpu0"
SUBSYSTEM=="cpu"
DRIVER=="processor"
ATTR{crash_notes}=="277c38600"
ATTR{crash_notes_size}=="368"
ATTR{online}=="1"

  looking at parent device '/devices/system/cpu':
KERNELS=="cpu"
SUBSYSTEMS==""
DRIVERS==""
ATTRS{crash_hotplug}=="1"
ATTRS{isolated}==""
ATTRS{kernel_max}=="8191"
ATTRS{nohz_full}=="  (null)"
ATTRS{offline}=="4-7"
ATTRS{online}=="0-3"
ATTRS{possible}=="0-7"
ATTRS{present}=="0-3"

With these sysfs attributes in place, it is possible to efficiently
instruct the udev rule to skip crash kernel reloading for kernels
configured with crash hotplug support.

For example, the following is the proposed udev rule change for RHEL
system 98-kexec.rules (as the first lines of the rule file):

 # The kernel updates the crash elfcorehdr for CPU and memory changes
 SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
 SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

When examined in the context of 98-kexec.rules, the above change
tests if crash_hotplug is set, and if so, it skips the userspace
initiated unload-then-reload of the crash kernel.

CPU and memory checks are separated in accordance with
CONFIG_HOTPLUG_CPU and CONFIG_MEMORY_HOTPLUG kernel config options.
If an architecture supports, for example, memory hotplug but not
CPU hotplug, then the /sys/devices/system/memory/crash_hotplug
attribute file is present, but the /sys/devices/system/cpu/crash_hotplug
attribute file will NOT be present. Thus the udev rule will skip
userspace processing of memory hot un/plug events, but the udev
rule will evaluate false for CPU events, thus allowing userspace to
process CPU hot un/plug events (ie the unload-then-reload of the kdump
capture kernel).

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Baoquan He 
---
 .../admin-guide/mm/memory-hotplug.rst  |  8 
 Documentation/core-api/cpu_hotplug.rst | 18 ++
 drivers/base/cpu.c | 14 ++
 drivers/base/memory.c  | 13 +
 include/linux/kexec.h  |  8 
 5 files changed, 61 insertions(+)

diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
index 1b02fe5807cc..eb99d79223a3 100644
--- a/Documentation/admin-guide/mm/memory-hotplug.rst
+++ b/Documentation/admin-guide/mm/memory-hotplug.rst
@@ -291,6 +291,14 @@ The following files are currently defined:
   Availability depends on the CONFIG_ARCH_MEMORY_PROBE
   kernel configuration option.
 ``uevent``read-write: generic udev file for device subsystems.
+``crash_hotplug``  read-only: when changes to the system memory map
+  occur due to hot un/plug of memory, this file contains
+  '1' if the kernel updates the kdump capture kernel memory
+  map itself (via elfcorehdr), or '0' if userspace must update
+  the kdump capture kernel memory map.
+
+  Availability depends on the CONFIG_MEMORY_HOTPLUG kernel
+  configuration option.
 == 
=
 
 .. note::
diff --git a/Documentation/core-api/cpu_hotplug.rst b/Documentation/core-api/cpu_hotplug.rst
index f75

[PATCH v22 1/8] crash: move a few code bits to setup support of crash hotplug

2023-05-03 Thread Eric DeVolder
The crash hotplug support leans on the work for the kexec_file_load()
syscall. To also support the kexec_load() syscall, a few bits of code
need to be moved outside of CONFIG_KEXEC_FILE. As such, these bits are
moved out of kexec_file.c and into a common location crash_core.c.

No functionality change intended.

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Baoquan He 
---
 include/linux/kexec.h |  30 +++
 kernel/crash_core.c   | 182 ++
 kernel/kexec_file.c   | 181 -
 3 files changed, 197 insertions(+), 196 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 22b5cd24f581..811a90e09698 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -105,6 +105,21 @@ struct compat_kexec_segment {
 };
 #endif
 
+/* Alignment required for elf header segment */
+#define ELF_CORE_HEADER_ALIGN   4096
+
+struct crash_mem {
+   unsigned int max_nr_ranges;
+   unsigned int nr_ranges;
+   struct range ranges[];
+};
+
+extern int crash_exclude_mem_range(struct crash_mem *mem,
+  unsigned long long mstart,
+  unsigned long long mend);
+extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
+  void **addr, unsigned long *sz);
+
 #ifdef CONFIG_KEXEC_FILE
 struct purgatory_info {
/*
@@ -230,21 +245,6 @@ static inline int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
 }
 #endif
 
-/* Alignment required for elf header segment */
-#define ELF_CORE_HEADER_ALIGN   4096
-
-struct crash_mem {
-   unsigned int max_nr_ranges;
-   unsigned int nr_ranges;
-   struct range ranges[];
-};
-
-extern int crash_exclude_mem_range(struct crash_mem *mem,
-  unsigned long long mstart,
-  unsigned long long mend);
-extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
-  void **addr, unsigned long *sz);
-
 #ifndef arch_kexec_apply_relocations_add
 /*
  * arch_kexec_apply_relocations_add - apply relocations of type RELA
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 90ce1dfd591c..b7c30b748a16 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -314,6 +315,187 @@ static int __init parse_crashkernel_dummy(char *arg)
 }
 early_param("crashkernel", parse_crashkernel_dummy);
 
+int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
+ void **addr, unsigned long *sz)
+{
+   Elf64_Ehdr *ehdr;
+   Elf64_Phdr *phdr;
+   unsigned long nr_cpus = num_possible_cpus(), nr_phdr, elf_sz;
+   unsigned char *buf;
+   unsigned int cpu, i;
+   unsigned long long notes_addr;
+   unsigned long mstart, mend;
+
+   /* extra phdr for vmcoreinfo ELF note */
+   nr_phdr = nr_cpus + 1;
+   nr_phdr += mem->nr_ranges;
+
+   /*
+* kexec-tools creates an extra PT_LOAD phdr for kernel text mapping
+* area (for example, ffffffff80000000 - ffffffffa0000000 on x86_64).
+* I think this is required by tools like gdb. So same physical
+* memory will be mapped in two ELF headers. One will contain kernel
+* text virtual addresses and other will have __va(physical) addresses.
+*/
+
+   nr_phdr++;
+   elf_sz = sizeof(Elf64_Ehdr) + nr_phdr * sizeof(Elf64_Phdr);
+   elf_sz = ALIGN(elf_sz, ELF_CORE_HEADER_ALIGN);
+
+   buf = vzalloc(elf_sz);
+   if (!buf)
+   return -ENOMEM;
+
+   ehdr = (Elf64_Ehdr *)buf;
+   phdr = (Elf64_Phdr *)(ehdr + 1);
+   memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
+   ehdr->e_ident[EI_CLASS] = ELFCLASS64;
+   ehdr->e_ident[EI_DATA] = ELFDATA2LSB;
+   ehdr->e_ident[EI_VERSION] = EV_CURRENT;
+   ehdr->e_ident[EI_OSABI] = ELF_OSABI;
+   memset(ehdr->e_ident + EI_PAD, 0, EI_NIDENT - EI_PAD);
+   ehdr->e_type = ET_CORE;
+   ehdr->e_machine = ELF_ARCH;
+   ehdr->e_version = EV_CURRENT;
+   ehdr->e_phoff = sizeof(Elf64_Ehdr);
+   ehdr->e_ehsize = sizeof(Elf64_Ehdr);
+   ehdr->e_phentsize = sizeof(Elf64_Phdr);
+
+   /* Prepare one phdr of type PT_NOTE for each present CPU */
+   for_each_present_cpu(cpu) {
+   phdr->p_type = PT_NOTE;
+   notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
+   phdr->p_offset = phdr->p_paddr = notes_addr;
+   phdr->p_filesz = phdr->p_memsz = sizeof(note_buf_t);
+   (ehdr->e_phnum)++;
+   phdr++;
+   }
+
+   /* Prepare one PT_NOTE header for vmcoreinfo */
+   phdr->p_type = PT_NOTE;
+   phdr->p_offset = phdr-&

[PATCH v22 3/8] kexec: exclude elfcorehdr from the segment digest

2023-05-03 Thread Eric DeVolder
When a crash kernel is loaded via the kexec_file_load() syscall, the
kernel places the various segments (ie crash kernel, crash initrd,
boot_params, elfcorehdr, purgatory, etc) in memory. For those
architectures that utilize purgatory, a hash digest of the segments
is calculated for integrity checking. This digest is embedded into
the purgatory image prior to placing purgatory in memory.

This patchset updates the elfcorehdr on CPU or memory changes.
However, changes to the elfcorehdr in turn cause purgatory
integrity checking to fail at crash time, so no vmcore is created.
Therefore, this patch explicitly excludes the elfcorehdr segment
from the list of segments used to create the digest. By doing so,
this permits updates to the elfcorehdr in response to CPU or memory
changes, and avoids the need to also recompute the hash digest and
reload purgatory.
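
The effect of the exclusion can be sketched in a few lines of
userspace C. This is only an illustration under stated assumptions:
'struct segment' is a hypothetical stand-in for struct kexec_segment,
and a simple FNV-1a checksum stands in for the SHA-256 digest that
the kernel actually computes. The point it shows is that a segment
skipped during digest calculation can change later without
invalidating the stored digest.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kexec_segment. */
struct segment {
	const unsigned char *buf;
	size_t len;
};

/* FNV-1a update step; a stand-in for SHA-256, chosen only for brevity. */
static uint32_t fnv1a_update(uint32_t h, const unsigned char *p, size_t n)
{
	while (n--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/*
 * Digest every segment except the one at skip_index.
 * i walks the input segments; j counts the segments actually hashed.
 */
static uint32_t digest_segments(const struct segment *seg, size_t nr,
				size_t skip_index, size_t *nr_hashed)
{
	uint32_t h = 2166136261u;
	size_t i, j;

	for (j = i = 0; i < nr; i++) {
		if (i == skip_index)
			continue;	/* excluded: may be rewritten later */
		h = fnv1a_update(h, seg[i].buf, seg[i].len);
		j++;
	}
	if (nr_hashed)
		*nr_hashed = j;
	return h;
}
```

In this sketch the skip test is made against the input index i,
since skip_index identifies a position in the input segment array,
while j only counts how many regions were hashed.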

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Baoquan He 
---
 kernel/kexec_file.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index f8b1797b3ec9..1d2cfc869a75 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -726,6 +726,12 @@ static int kexec_calculate_store_digests(struct kimage *image)
for (j = i = 0; i < image->nr_segments; i++) {
struct kexec_segment *ksegment;
 
+#ifdef CONFIG_CRASH_HOTPLUG
+   /* Exclude elfcorehdr segment to allow future changes via hotplug */
+   if (i == image->elfcorehdr_index)
+   continue;
+#endif
+
ksegment = &image->segment[i];
/*
 * Skip purgatory as it will be modified once we put digest
-- 
2.31.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v22 7/8] crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()

2023-05-03 Thread Eric DeVolder
The function crash_prepare_elf64_headers() generates the elfcorehdr
which describes the CPUs and memory in the system for the crash kernel.
In particular, it writes out ELF PT_LOADs for memory regions and
PT_NOTEs for the CPUs in the system.

With respect to the CPUs, the current implementation utilizes
for_each_present_cpu() which means that as CPUs are added and removed,
the elfcorehdr must again be updated to reflect the new set of CPUs.

The reasoning behind the change to use for_each_possible_cpu(), is:

- At kernel boot time, all percpu crash_notes are allocated for all
  possible CPUs; that is, crash_notes are not allocated dynamically
  when CPUs are plugged/unplugged. Thus the crash_notes for each
  possible CPU are always available.

- The crash_prepare_elf64_headers() creates an ELF PT_NOTE per CPU.
  Changing to for_each_possible_cpu() is valid as the crash_notes
  pointed to by each CPU PT_NOTE are present and always valid.

Furthermore, examining a common crash processing path of:

 kernel panic -> crash kernel -> makedumpfile -> 'crash' analyzer
           elfcorehdr      /proc/vmcore     vmcore

reveals how the ELF CPU PT_NOTEs are utilized:

- Upon panic, each CPU is sent an IPI and shuts itself down, recording
 its state in its crash_notes. When all CPUs are shut down, the
 crash kernel is launched with a pointer to the elfcorehdr.

- The crash kernel via linux/fs/proc/vmcore.c does not examine or
 use the contents of the PT_NOTEs, it exposes them via /proc/vmcore.

- The makedumpfile utility uses /proc/vmcore and reads the CPU
 PT_NOTEs to craft a nr_cpus variable, which is reported in a
 header but otherwise generally unused. Makedumpfile creates the
 vmcore.

- The 'crash' dump analyzer does not appear to reference the CPU
 PT_NOTEs. Instead it looks up the cpu_[possible|present|online]_mask
 symbols and directly examines those structure contents from vmcore
 memory. From that information it is able to determine which CPUs
 are present and online, and locate the corresponding crash_notes.
 Said differently, it appears that 'crash' analyzer does not rely
 on the ELF PT_NOTEs for CPUs; rather it obtains the information
 directly via kernel symbols and the memory within the vmcore.

(There may be other vmcore generating and analysis tools that do use
these PT_NOTEs, but 'makedumpfile' and 'crash' seem to be the most
common solution.)

This change results in the benefit of having all CPUs described in
the elfcorehdr, and therefore reducing the need to re-generate the
elfcorehdr on CPU changes, at the small expense of an additional
56 bytes per PT_NOTE for not-present-but-possible CPUs.
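
The 56-byte figure is sizeof(Elf64_Phdr). The sizing arithmetic can
be sketched in userspace C as below, assuming a Linux system with
glibc's <elf.h>. The function names are hypothetical; the phdr count
follows the layout crash_prepare_elf64_headers() uses (one PT_NOTE
per CPU, one PT_NOTE for vmcoreinfo, one PT_LOAD for the kernel text
mapping, and one PT_LOAD per memory range).

```c
#include <elf.h>
#include <stddef.h>

#define ELF_CORE_HEADER_ALIGN 4096UL

/* Round x up to a multiple of a (a must be a power of two). */
static unsigned long align_up(unsigned long x, unsigned long a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Hypothetical helper: size of an elfcorehdr buffer for the given counts. */
static unsigned long elfcorehdr_size(unsigned long nr_cpus,
				     unsigned long nr_mem_ranges)
{
	/* CPU PT_NOTEs + vmcoreinfo PT_NOTE + kernel text PT_LOAD + memory PT_LOADs */
	unsigned long nr_phdr = nr_cpus + 1 + 1 + nr_mem_ranges;

	return align_up(sizeof(Elf64_Ehdr) + nr_phdr * sizeof(Elf64_Phdr),
			ELF_CORE_HEADER_ALIGN);
}

/* Extra bytes spent describing possible-but-not-present CPUs. */
static unsigned long extra_note_bytes(unsigned long nr_possible,
				      unsigned long nr_present)
{
	return (nr_possible - nr_present) * sizeof(Elf64_Phdr);
}
```

For a system with 8 possible and 4 present CPUs, extra_note_bytes()
gives 4 * 56 = 224 bytes. Because the buffer is rounded up to
ELF_CORE_HEADER_ALIGN, a small system stays within the same 4096-byte
allocation either way.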

On systems where kexec_file_load() syscall is utilized, all the above
is valid. On systems where kexec_load() syscall is utilized, there
may be the need for the elfcorehdr to be regenerated once, because
some archs only populate the 'present' CPUs in the
/sys/devices/system/cpu entries, which the userspace 'kexec' utility
uses to generate the userspace-supplied elfcorehdr. In this situation,
one memory or CPU change will rewrite the elfcorehdr via the
crash_prepare_elf64_headers() function and now all possible CPUs will
be described, just as with kexec_file_load() syscall.

Suggested-by: Sourabh Jain 
Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
Acked-by: Baoquan He 
---
 kernel/crash_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index e05bfdb7eaed..26262789baf6 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -364,8 +364,8 @@ int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
ehdr->e_ehsize = sizeof(Elf64_Ehdr);
ehdr->e_phentsize = sizeof(Elf64_Phdr);
 
-   /* Prepare one phdr of type PT_NOTE for each present CPU */
-   for_each_present_cpu(cpu) {
+   /* Prepare one phdr of type PT_NOTE for each possible CPU */
+   for_each_possible_cpu(cpu) {
phdr->p_type = PT_NOTE;
notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
phdr->p_offset = phdr->p_paddr = notes_addr;
-- 
2.31.1

