Re: [PATCH AUTOSEL 6.9 18/23] powerpc: make fadump resilient with memory add/remove events
Hello Sasha,

Thank you for considering this patch for the stable trees 6.9, 6.8, 6.6, and 6.1.

This patch does two things:

1. Fixes a potential memory corruption issue, mentioned as the third point in the commit message.
2. Enables the kernel to avoid unnecessary fadump re-registration on memory add/remove events.

To make the second functionality available to users, I request that you also consider the upstream patch mentioned below along with this patch. Both patches were part of the same patch series.

https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=bc446c5acabadeb38b61b565535401c5dfdd1214

There was also a third patch in the same patch series, but it is only a documentation update.

Link to patch series: https://lore.kernel.org/all/20240422195932.1583833-1-sourabhj...@linux.ibm.com/

Thanks,
Sourabh Jain

On 27/05/24 21:20, Sasha Levin wrote:

From: Sourabh Jain

[ Upstream commit c6c5b14dac0d1bd0da8b4d1d3b77f18eb9085fcb ]

Due to changes in memory resources caused by either memory hotplug or online/offline events, the elfcorehdr, which describes the CPUs and memory of the crashed kernel to the kernel that collects the dump (known as second/fadump kernel), becomes outdated. Consequently, attempting dump collection with an outdated elfcorehdr can lead to failed or inaccurate dump collection.

Memory hotplug or online/offline events are referred to as memory add/remove events in the rest of the commit message.

The current solution to address the aforementioned issue is as follows: Monitor memory add/remove events in userspace using udev rules, and re-register fadump whenever there are changes in memory resources. This leads to the creation of a new elfcorehdr with updated system memory information.

There are several notable issues associated with re-registering fadump for every memory add/remove event.

1. Bulk memory add/remove events with udev-based fadump re-registration can lead to race conditions and, more importantly, create a wide window during which fadump is inactive until all memory add/remove events are settled.
2. Re-registering fadump for every memory add/remove event is inefficient.
3. The memory for elfcorehdr is allocated based on the memblock regions available during early boot and remains fixed thereafter. However, if elfcorehdr is later recreated with additional memblock regions, its size will increase, potentially leading to memory corruption.

Address the aforementioned challenges by shifting the creation of elfcorehdr from the first kernel (also referred to as the crashed kernel), where it was created and frequently recreated for every memory add/remove event, to the fadump kernel. As a result, the elfcorehdr only needs to be created once, thus eliminating the necessity to re-register fadump during memory add/remove events.

At present, the first kernel prepares the fadump header and stores it in the fadump reserved area. The fadump header includes the start address of the elfcorehdr, crashing CPU details, and other relevant information. In the event of a crash in the first kernel, the second/fadump kernel boots and accesses the fadump header prepared by the first kernel. It then performs the following steps in a platform-specific function [rtas|opal]_fadump_process:

1. Sanity check for fadump header
2. Update CPU notes in elfcorehdr

Along with the above, update the setup_fadump()/fadump.c to create elfcorehdr and set its address to the global variable elfcorehdr_addr for the vmcore module to process it in the second/fadump kernel.

The section below outlines the information required to create the elfcorehdr and the changes made to make it available to the fadump kernel if it's not already.

To create elfcorehdr, the following crashed kernel information is required: CPU notes, vmcoreinfo, and memory ranges.
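Point 3 in the list of issues above can be made concrete with a rough size estimate. The following is a userspace sketch, not the kernel's code: it assumes the header consists of one ELF64 file header, a number of PT_NOTE program headers, and one PT_LOAD program header per memory range. The helper name elfcorehdr_size_estimate is hypothetical.

```c
#include <elf.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a kernel function): rough size of an ELF64
 * core header made of the ELF file header, nr_notes PT_NOTE program
 * headers, and one PT_LOAD program header per memory range.
 * This is an assumption about the layout; the real layout may differ.
 */
static size_t elfcorehdr_size_estimate(size_t nr_notes, size_t nr_mem_ranges)
{
	size_t nr_phdr = nr_notes + nr_mem_ranges;

	return sizeof(Elf64_Ehdr) + nr_phdr * sizeof(Elf64_Phdr);
}
```

Under this assumption every additional memory range costs one more Elf64_Phdr, so a buffer sized for the ranges seen at early boot can be too small once ranges are added.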
At present, the CPU notes are already prepared in the fadump kernel, so no changes are needed in that regard. The fadump kernel has access to all crashed kernel memory regions, including boot memory regions that are relocated by firmware to fadump reserved areas, so no changes for that either. However, it is necessary to add new members to the fadump header, i.e., the 'fadump_crash_info_header' structure, in order to pass the crashed kernel's vmcoreinfo address and its size to fadump kernel.

In addition to the vmcoreinfo address and size, there are a few other attributes also added to the fadump_crash_info_header structure.

1. version: It stores the fadump header version, which is currently set to 1. This provides flexibility to update the fadump crash info header in the future without changing the magic number. For each change in the fadump header, the version will be increased. This will help the updated kernel determine how to handle kernel dumps from older kernels. The magic number remains relevant for checking fadump header corruption.
2. pt_regs_sz/cpu_mask_sz: Store size of pt_regs
Re: [PATCH 2/2] arch/powerpc: hotplug driver bridge support
slot for non-bridge/EP case */
		if (!((class >> 8) == PCI_CLASS_BRIDGE_PCI) && start->child == dn) {
			slotno = PCI_SLOT(PCI_DN(dn)->devfn);
			pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
			of_node_put(dn);
			return;
		}

		/* Call of pci_scan_slot for bridge port */
		if ((class >> 8) == PCI_CLASS_BRIDGE_PCI) {
			slotno = PCI_SLOT(PCI_DN(dn)->devfn);
			pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
		}
	}
+}
+EXPORT_SYMBOL_GPL(pci_traverse_sibling_nodes_and_scan_slot);

What is the need for exporting the above function?

+
 DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pci_dev_pdn_setup);

- Sourabh Jain
[PATCH v2 2/2] powerpc/kexec_file: fix cpus node update to FDT
While updating the cpus node, commit 40c753993e3a ("powerpc/kexec_file: Use current CPU info while setting up FDT") first deletes all subnodes under the /cpus node. However, while adding sub-nodes back, it missed adding cpus subnodes whose device_type != "cpu", such as l2-cache*, l3-cache*, ibm,powerpc-cpu-features.

Fix this by only deleting cpus sub-nodes of device_type == "cpu" and then adding all available nodes with device_type == "cpu".

Fixes: 40c753993e3a ("powerpc/kexec_file: Use current CPU info while setting up FDT")
Cc: Aditya Gupta
Cc: Hari Bathini
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: "Naveen N. Rao"
Signed-off-by: Sourabh Jain
---

* No changes in v2.

---
 arch/powerpc/kexec/core_64.c | 53 +---
 1 file changed, 37 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 85050be08a23..2e625c2cb6b9 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -456,9 +456,15 @@ static int add_node_props(void *fdt, int node_offset, const struct device_node *
  * @fdt: Flattened device tree of the kernel.
  *
  * Returns 0 on success, negative errno on error.
+ *
+ * Note: expecting no subnodes under /cpus/ with device_type == "cpu".
+ * If this changes, update this function to include them.
  */
 int update_cpus_node(void *fdt)
 {
+	int prev_node_offset;
+	const char *device_type;
+	const struct fdt_property *prop;
 	struct device_node *cpus_node, *dn;
 	int cpus_offset, cpus_subnode_offset, ret = 0;
 
@@ -469,30 +475,44 @@ int update_cpus_node(void *fdt)
 		return cpus_offset;
 	}
 
-	if (cpus_offset > 0) {
-		ret = fdt_del_node(fdt, cpus_offset);
+	prev_node_offset = cpus_offset;
+	/* Delete sub-nodes of /cpus node with device_type == "cpu" */
+	for (cpus_subnode_offset = fdt_first_subnode(fdt, cpus_offset); cpus_subnode_offset >= 0;) {
+		/* Ignore nodes that do not have a device_type property or device_type != "cpu" */
+		prop = fdt_get_property(fdt, cpus_subnode_offset, "device_type", NULL);
+		if (!prop || strcmp(prop->data, "cpu")) {
+			prev_node_offset = cpus_subnode_offset;
+			goto next_node;
+		}
+
+		ret = fdt_del_node(fdt, cpus_subnode_offset);
 		if (ret < 0) {
-			pr_err("Error deleting /cpus node: %s\n", fdt_strerror(ret));
-			return -EINVAL;
+			pr_err("Failed to delete a cpus sub-node: %s\n", fdt_strerror(ret));
+			return ret;
 		}
+next_node:
+		if (prev_node_offset == cpus_offset)
+			cpus_subnode_offset = fdt_first_subnode(fdt, cpus_offset);
+		else
+			cpus_subnode_offset = fdt_next_subnode(fdt, prev_node_offset);
 	}
 
-	/* Add cpus node to fdt */
-	cpus_offset = fdt_add_subnode(fdt, fdt_path_offset(fdt, "/"), "cpus");
-	if (cpus_offset < 0) {
-		pr_err("Error creating /cpus node: %s\n", fdt_strerror(cpus_offset));
+	cpus_node = of_find_node_by_path("/cpus");
+	/* Fail here to avoid kexec/kdump kernel boot hung */
+	if (!cpus_node) {
+		pr_err("No /cpus node found\n");
 		return -EINVAL;
 	}
 
-	/* Add cpus node properties */
-	cpus_node = of_find_node_by_path("/cpus");
-	ret = add_node_props(fdt, cpus_offset, cpus_node);
-	of_node_put(cpus_node);
-	if (ret < 0)
-		return ret;
+	/* Add all /cpus sub-nodes of device_type == "cpu" to FDT */
+	for_each_child_of_node(cpus_node, dn) {
+		/* Ignore device nodes that do not have a device_type property
+		 * or device_type != "cpu".
+		 */
+		device_type = of_get_property(dn, "device_type", NULL);
+		if (!device_type || strcmp(device_type, "cpu"))
+			continue;
 
-	/* Loop through all subnodes of cpus and add them to fdt */
-	for_each_node_by_type(dn, "cpu") {
 		cpus_subnode_offset = fdt_add_subnode(fdt, cpus_offset, dn->full_name);
 		if (cpus_subnode_offset < 0) {
 			pr_err("Unable to add %s subnode: %s\n", dn->full_name,
@@ -506,6 +526,7 @@ int update_cpus_node(void *fdt)
 		goto out;
 	}
 out:
+	of_node_put(cpus_node);
 	of_node_put(dn);
 	return ret;
 }
-- 
2.44.0
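The deletion loop in the patch above does not simply advance after removing a node, because removing a node shifts everything that follows it. The same idea can be shown on a plain array. This is a simplified sketch that does not use libfdt; struct dt_node and delete_cpu_nodes are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a /cpus sub-node; device_type may be NULL. */
struct dt_node {
	const char *name;
	const char *device_type;
};

/*
 * Remove every entry whose device_type is "cpu" and keep the rest
 * (caches, feature nodes, nodes without device_type) in order.
 * Returns the new number of entries.
 */
static int delete_cpu_nodes(struct dt_node *nodes, int count)
{
	int i = 0;

	while (i < count) {
		const char *type = nodes[i].device_type;

		/* Keep nodes with no device_type or device_type != "cpu". */
		if (!type || strcmp(type, "cpu")) {
			i++;
			continue;
		}

		/* Delete entry i; the following entries shift down by one. */
		memmove(&nodes[i], &nodes[i + 1],
			(size_t)(count - i - 1) * sizeof(*nodes));
		count--;
		/* Do not advance i: a new entry now sits at this index. */
	}

	return count;
}
```

After a deletion the index stays where it is, which mirrors restarting from the previous surviving node in the patch.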
[PATCH v2 1/2] powerpc/kexec_file: fix extra size calculation for kexec FDT
While setting up the FDT for kexec, CPU nodes that are added after the system boots and reserved memory ranges are incorporated into the initial_boot_params (base FDT). However, they are not taken into account when determining the additional size needed for the kexec FDT. As a result, kexec fails to load, generating the following error:

[1116.774451] Error updating memory reserve map: FDT_ERR_NOSPACE
kexec_file_load failed: No such process

Therefore, consider the extra size for CPU nodes added post-system boot and reserved memory ranges while preparing the kexec FDT.

While adding a new parameter to the setup_new_fdt_ppc64 function, it was noticed that there were a couple of unused parameters, so they were removed.

Cc: Aditya Gupta
Cc: Hari Bathini
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: "Naveen N. Rao"
Signed-off-by: Sourabh Jain
---

Changelog:

Since v1:
- Initialize local variable `cpu_nodes` before using it. 01/02

---
 arch/powerpc/include/asm/kexec.h | 6 ++--
 arch/powerpc/kexec/elf_64.c | 12 +--
 arch/powerpc/kexec/file_load_64.c | 53 +--
 3 files changed, 33 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 95a98b390d62..270ee93a0f7d 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -103,10 +103,8 @@ int load_crashdump_segments_ppc64(struct kimage *image,
 int setup_purgatory_ppc64(struct kimage *image, const void *slave_code,
 			  const void *fdt, unsigned long kernel_load_addr,
 			  unsigned long fdt_load_addr);
-unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image);
-int setup_new_fdt_ppc64(const struct kimage *image, void *fdt,
-			unsigned long initrd_load_addr,
-			unsigned long initrd_len, const char *cmdline);
+unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image, struct crash_mem *rmem);
+int setup_new_fdt_ppc64(const struct kimage *image, void *fdt, struct crash_mem *rmem);
 #endif /* CONFIG_PPC64 */
 
 #endif /* CONFIG_KEXEC_FILE */
diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
index 214c071c58ed..5d6d616404cf 100644
--- a/arch/powerpc/kexec/elf_64.c
+++ b/arch/powerpc/kexec/elf_64.c
@@ -23,6 +23,7 @@
 #include
 #include
 #include
+#include
 
 static void *elf64_load(struct kimage *image, char *kernel_buf,
 			unsigned long kernel_len, char *initrd,
@@ -36,6 +37,7 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
 	const void *slave_code;
 	struct elfhdr ehdr;
 	char *modified_cmdline = NULL;
+	struct crash_mem *rmem = NULL;
 	struct kexec_elf_info elf_info;
 	struct kexec_buf kbuf = { .image = image, .buf_min = 0,
 				  .buf_max = ppc64_rma_size };
@@ -102,17 +104,20 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
 		kexec_dprintk("Loaded initrd at 0x%lx\n", initrd_load_addr);
 	}
 
+	ret = get_reserved_memory_ranges(&rmem);
+	if (ret)
+		goto out;
+
 	fdt = of_kexec_alloc_and_setup_fdt(image, initrd_load_addr,
 					   initrd_len, cmdline,
-					   kexec_extra_fdt_size_ppc64(image));
+					   kexec_extra_fdt_size_ppc64(image, rmem));
 	if (!fdt) {
 		pr_err("Error setting up the new device tree.\n");
 		ret = -EINVAL;
 		goto out;
 	}
 
-	ret = setup_new_fdt_ppc64(image, fdt, initrd_load_addr,
-				  initrd_len, cmdline);
+	ret = setup_new_fdt_ppc64(image, fdt, rmem);
 	if (ret)
 		goto out_free_fdt;
 
@@ -146,6 +151,7 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
 out_free_fdt:
 	kvfree(fdt);
 out:
+	kfree(rmem);
 	kfree(modified_cmdline);
 	kexec_free_elf_info(&elf_info);
 
diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index 925a69ad2468..413c76de283d 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -803,10 +803,9 @@ static unsigned int cpu_node_size(void)
 	return size;
 }
 
-static unsigned int kdump_extra_fdt_size_ppc64(struct kimage *image)
+static unsigned int kdump_extra_fdt_size_ppc64(struct kimage *image, unsigned int cpu_nodes)
 {
-	unsigned int cpu_nodes, extra_size = 0;
-	struct device_node *dn;
+	unsigned int extra_size = 0;
 	u64 usm_entries;
 #ifdef CONFIG_CRASH_HOTPLUG
 	unsigned int possible_cpu_nodes;
@@ -826,18 +825,6 @@ static unsigned int kdump_extra_fdt_size_ppc64(struct kimage *image)
 		extra_size += (unsigned int)(usm_entries * sizeof(u64));
 	}
 
-	/*
-	 * Get the number
[PATCH v2 0/2] powerpc: kexec fixes
Patch series fixes two kexec issues.

01/02: Update extra size calculation for kexec FDT to avoid kexec load failure due to FDT_ERR_NOSPACE while including CPU nodes added post boot and reserved memory ranges.

02/02: Fix update_cpus_node/core_64.c function to include missing device nodes under /cpus node with device_type != "cpu".

Note: this patch series is rebased on top of the linux-next/master, tag: next-20240509 to avoid the conflict with the below patch series:
https://lore.kernel.org/all/171509287314.62008.11812494124513471250.b4...@ellerman.id.au/

Changelog:
==
v2:
- Initialize local variable `cpu_nodes` before using it. 01/02
- Rebased on top of linux-next/master tag: next-20240509.

v1:
- https://lore.kernel.org/all/20240508130558.1939304-1-sourabhj...@linux.ibm.com/

Cc: Aditya Gupta
Cc: Hari Bathini
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: "Naveen N. Rao"

Sourabh Jain (2):
  powerpc/kexec_file: fix extra size calculation for kexec FDT
  powerpc/kexec_file: fix cpus node update to FDT

 arch/powerpc/include/asm/kexec.h | 6 ++--
 arch/powerpc/kexec/core_64.c | 53 +--
 arch/powerpc/kexec/elf_64.c | 12 +--
 arch/powerpc/kexec/file_load_64.c | 53 +--
 4 files changed, 70 insertions(+), 54 deletions(-)

-- 
2.44.0
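The size accounting described for 01/02 can be sketched as a small calculation. This is an illustration only, not the kernel's kexec_extra_fdt_size_ppc64(): the function name extra_fdt_size, its parameters, and the per-node size passed in are assumptions. The one fixed quantity is that an FDT memory reservation entry holds a 64-bit address and a 64-bit size.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of the FDT memory reservation block: address and size. */
struct rsv_entry {
	uint64_t address;
	uint64_t size;
};

/*
 * Hypothetical helper: extra bytes to reserve in the kexec FDT for
 * CPU nodes added after boot and for reserved memory ranges.
 * cpu_node_size is whatever estimate the caller uses for one CPU node.
 */
static size_t extra_fdt_size(size_t cpu_nodes_added, size_t cpu_node_size,
			     size_t nr_reserved_ranges)
{
	return cpu_nodes_added * cpu_node_size +
	       nr_reserved_ranges * sizeof(struct rsv_entry);
}
```

If either term is left out, the FDT buffer can run out of space while it is being filled, which matches the FDT_ERR_NOSPACE failure the series describes.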
[PATCH 2/2] powerpc/kexec_file: fix cpus node update to FDT
While updating the cpus node, commit 40c753993e3a ("powerpc/kexec_file: Use current CPU info while setting up FDT") first deletes all subnodes under the /cpus node. However, while adding sub-nodes back, it missed adding cpus subnodes whose device_type != "cpu", such as l2-cache*, l3-cache*, ibm,powerpc-cpu-features.

Fix this by only deleting cpus sub-nodes of device_type == "cpu" and then adding all available nodes with device_type == "cpu".

Fixes: 40c753993e3a ("powerpc/kexec_file: Use current CPU info while setting up FDT")
Cc: Aditya Gupta
Cc: Hari Bathini
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: "Naveen N. Rao"
Signed-off-by: Sourabh Jain
---
 arch/powerpc/kexec/core_64.c | 53 +---
 1 file changed, 37 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 85050be08a23..2e625c2cb6b9 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -456,9 +456,15 @@ static int add_node_props(void *fdt, int node_offset, const struct device_node *
  * @fdt: Flattened device tree of the kernel.
  *
  * Returns 0 on success, negative errno on error.
+ *
+ * Note: expecting no subnodes under /cpus/ with device_type == "cpu".
+ * If this changes, update this function to include them.
  */
 int update_cpus_node(void *fdt)
 {
+	int prev_node_offset;
+	const char *device_type;
+	const struct fdt_property *prop;
 	struct device_node *cpus_node, *dn;
 	int cpus_offset, cpus_subnode_offset, ret = 0;
 
@@ -469,30 +475,44 @@ int update_cpus_node(void *fdt)
 		return cpus_offset;
 	}
 
-	if (cpus_offset > 0) {
-		ret = fdt_del_node(fdt, cpus_offset);
+	prev_node_offset = cpus_offset;
+	/* Delete sub-nodes of /cpus node with device_type == "cpu" */
+	for (cpus_subnode_offset = fdt_first_subnode(fdt, cpus_offset); cpus_subnode_offset >= 0;) {
+		/* Ignore nodes that do not have a device_type property or device_type != "cpu" */
+		prop = fdt_get_property(fdt, cpus_subnode_offset, "device_type", NULL);
+		if (!prop || strcmp(prop->data, "cpu")) {
+			prev_node_offset = cpus_subnode_offset;
+			goto next_node;
+		}
+
+		ret = fdt_del_node(fdt, cpus_subnode_offset);
 		if (ret < 0) {
-			pr_err("Error deleting /cpus node: %s\n", fdt_strerror(ret));
-			return -EINVAL;
+			pr_err("Failed to delete a cpus sub-node: %s\n", fdt_strerror(ret));
+			return ret;
 		}
+next_node:
+		if (prev_node_offset == cpus_offset)
+			cpus_subnode_offset = fdt_first_subnode(fdt, cpus_offset);
+		else
+			cpus_subnode_offset = fdt_next_subnode(fdt, prev_node_offset);
 	}
 
-	/* Add cpus node to fdt */
-	cpus_offset = fdt_add_subnode(fdt, fdt_path_offset(fdt, "/"), "cpus");
-	if (cpus_offset < 0) {
-		pr_err("Error creating /cpus node: %s\n", fdt_strerror(cpus_offset));
+	cpus_node = of_find_node_by_path("/cpus");
+	/* Fail here to avoid kexec/kdump kernel boot hung */
+	if (!cpus_node) {
+		pr_err("No /cpus node found\n");
 		return -EINVAL;
 	}
 
-	/* Add cpus node properties */
-	cpus_node = of_find_node_by_path("/cpus");
-	ret = add_node_props(fdt, cpus_offset, cpus_node);
-	of_node_put(cpus_node);
-	if (ret < 0)
-		return ret;
+	/* Add all /cpus sub-nodes of device_type == "cpu" to FDT */
+	for_each_child_of_node(cpus_node, dn) {
+		/* Ignore device nodes that do not have a device_type property
+		 * or device_type != "cpu".
+		 */
+		device_type = of_get_property(dn, "device_type", NULL);
+		if (!device_type || strcmp(device_type, "cpu"))
+			continue;
 
-	/* Loop through all subnodes of cpus and add them to fdt */
-	for_each_node_by_type(dn, "cpu") {
 		cpus_subnode_offset = fdt_add_subnode(fdt, cpus_offset, dn->full_name);
 		if (cpus_subnode_offset < 0) {
 			pr_err("Unable to add %s subnode: %s\n", dn->full_name,
@@ -506,6 +526,7 @@ int update_cpus_node(void *fdt)
 		goto out;
 	}
 out:
+	of_node_put(cpus_node);
 	of_node_put(dn);
 	return ret;
 }
-- 
2.44.0
[PATCH 0/2] powerpc: kexec fixes
Patch series fixes two kexec issues.

01/02: Update extra size calculation for kexec FDT to avoid kexec load failure due to FDT_ERR_NOSPACE while including CPU nodes added post boot and reserved memory ranges.

02/02: Fix update_cpus_node/core_64.c function to include missing device nodes under /cpus node with device_type != "cpu".

Note: this patch series is rebased on top of the linux-next/master to avoid the conflict with the below patch series:
https://lore.kernel.org/all/171509287314.62008.11812494124513471250.b4...@ellerman.id.au/

Cc: Aditya Gupta
Cc: Hari Bathini
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: "Naveen N. Rao"

Sourabh Jain (2):
  powerpc/kexec_file: fix extra size calculation for kexec FDT
  powerpc/kexec_file: fix cpus node update to FDT

 arch/powerpc/include/asm/kexec.h | 6 ++--
 arch/powerpc/kexec/core_64.c | 53 +--
 arch/powerpc/kexec/elf_64.c | 12 +--
 arch/powerpc/kexec/file_load_64.c | 53 +--
 4 files changed, 70 insertions(+), 54 deletions(-)

-- 
2.44.0
[PATCH 1/2] powerpc/kexec_file: fix extra size calculation for kexec FDT
While setting up the FDT for kexec, CPU nodes that are added after the system boots and reserved memory ranges are incorporated into the initial_boot_params (base FDT). However, they are not taken into account when determining the additional size needed for the kexec FDT. As a result, kexec fails to load, generating the following error:

[1116.774451] Error updating memory reserve map: FDT_ERR_NOSPACE
kexec_file_load failed: No such process

Therefore, consider the extra size for CPU nodes added post-system boot and reserved memory ranges while preparing the kexec FDT.

While adding a new parameter to the setup_new_fdt_ppc64 function, it was noticed that there were a couple of unused parameters, so they were removed.

Cc: Aditya Gupta
Cc: Hari Bathini
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: "Naveen N. Rao"
Signed-off-by: Sourabh Jain
---
 arch/powerpc/include/asm/kexec.h | 6 ++--
 arch/powerpc/kexec/elf_64.c | 12 +--
 arch/powerpc/kexec/file_load_64.c | 53 +--
 3 files changed, 33 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 95a98b390d62..270ee93a0f7d 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -103,10 +103,8 @@ int load_crashdump_segments_ppc64(struct kimage *image,
 int setup_purgatory_ppc64(struct kimage *image, const void *slave_code,
 			  const void *fdt, unsigned long kernel_load_addr,
 			  unsigned long fdt_load_addr);
-unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image);
-int setup_new_fdt_ppc64(const struct kimage *image, void *fdt,
-			unsigned long initrd_load_addr,
-			unsigned long initrd_len, const char *cmdline);
+unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image, struct crash_mem *rmem);
+int setup_new_fdt_ppc64(const struct kimage *image, void *fdt, struct crash_mem *rmem);
 #endif /* CONFIG_PPC64 */
 
 #endif /* CONFIG_KEXEC_FILE */
diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
index 214c071c58ed..5d6d616404cf 100644
--- a/arch/powerpc/kexec/elf_64.c
+++ b/arch/powerpc/kexec/elf_64.c
@@ -23,6 +23,7 @@
 #include
 #include
 #include
+#include
 
 static void *elf64_load(struct kimage *image, char *kernel_buf,
 			unsigned long kernel_len, char *initrd,
@@ -36,6 +37,7 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
 	const void *slave_code;
 	struct elfhdr ehdr;
 	char *modified_cmdline = NULL;
+	struct crash_mem *rmem = NULL;
 	struct kexec_elf_info elf_info;
 	struct kexec_buf kbuf = { .image = image, .buf_min = 0,
 				  .buf_max = ppc64_rma_size };
@@ -102,17 +104,20 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
 		kexec_dprintk("Loaded initrd at 0x%lx\n", initrd_load_addr);
 	}
 
+	ret = get_reserved_memory_ranges(&rmem);
+	if (ret)
+		goto out;
+
 	fdt = of_kexec_alloc_and_setup_fdt(image, initrd_load_addr,
 					   initrd_len, cmdline,
-					   kexec_extra_fdt_size_ppc64(image));
+					   kexec_extra_fdt_size_ppc64(image, rmem));
 	if (!fdt) {
 		pr_err("Error setting up the new device tree.\n");
 		ret = -EINVAL;
 		goto out;
 	}
 
-	ret = setup_new_fdt_ppc64(image, fdt, initrd_load_addr,
-				  initrd_len, cmdline);
+	ret = setup_new_fdt_ppc64(image, fdt, rmem);
 	if (ret)
 		goto out_free_fdt;
 
@@ -146,6 +151,7 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
 out_free_fdt:
 	kvfree(fdt);
 out:
+	kfree(rmem);
 	kfree(modified_cmdline);
 	kexec_free_elf_info(&elf_info);
 
diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index 925a69ad2468..41be4546a34e 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -803,10 +803,9 @@ static unsigned int cpu_node_size(void)
 	return size;
 }
 
-static unsigned int kdump_extra_fdt_size_ppc64(struct kimage *image)
+static unsigned int kdump_extra_fdt_size_ppc64(struct kimage *image, unsigned int cpu_nodes)
 {
-	unsigned int cpu_nodes, extra_size = 0;
-	struct device_node *dn;
+	unsigned int extra_size = 0;
 	u64 usm_entries;
 #ifdef CONFIG_CRASH_HOTPLUG
 	unsigned int possible_cpu_nodes;
@@ -826,18 +825,6 @@ static unsigned int kdump_extra_fdt_size_ppc64(struct kimage *image)
 		extra_size += (unsigned int)(usm_entries * sizeof(u64));
 	}
 
-	/*
-	 * Get the number of CPU nodes in the current DT. This allows to
-	 * reserve places for CPU n
[PATCH] powerpc/crash: remove unnecessary NULL check before kvfree()
Fix the following coccicheck build warning:

arch/powerpc/kexec/crash.c:488:2-8: WARNING: NULL check before some freeing functions is not needed.

Reported-by: kernel test robot
Closes: https://lore.kernel.org/oe-kbuild-all/202404261048.skfv5ddb-...@intel.com/
Cc: Michael Ellerman
Cc: Stephen Rothwell
Signed-off-by: Sourabh Jain
---
 arch/powerpc/kexec/crash.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c
index 21b193e938a3..9ac3266e4965 100644
--- a/arch/powerpc/kexec/crash.c
+++ b/arch/powerpc/kexec/crash.c
@@ -484,8 +484,7 @@ static void update_crash_elfcorehdr(struct kimage *image, struct memory_notify *
 	}
 out:
 	kvfree(cmem);
-	if (elfbuf)
-		kvfree(elfbuf);
+	kvfree(elfbuf);
 }
 
 /**
-- 
2.44.0
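The warning rests on the freeing function accepting NULL. A userspace analogue, offered only as a sketch, is the C library's free(), which is defined to do nothing when passed NULL. struct buffers and release_buffers are hypothetical names; this is not the kernel's kvfree().

```c
#include <stdlib.h>

/* Hypothetical pair of buffers; either pointer may be NULL. */
struct buffers {
	void *cmem;
	void *elfbuf;
};

/* Release both buffers without checking for NULL first. */
static void release_buffers(struct buffers *b)
{
	free(b->cmem);   /* free(NULL) is a no-op per the C standard */
	free(b->elfbuf);
	b->cmem = NULL;
	b->elfbuf = NULL;
}
```

Calling release_buffers() on a structure whose members are already NULL is harmless, which is the property that makes the removed check redundant.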
[PATCH v19 6/6] powerpc/crash: add crash memory hotplug support
Extend the arch crash hotplug handler, introduced by the patch titled ("powerpc: add crash CPU hotplug support"), to also support memory add/remove events.

The elfcorehdr describes the memory of the crashed kernel to the kernel that captures the dump; hence, it needs to be updated if memory resources change due to memory add/remove events. Therefore, arch_crash_handle_hotplug_event() is updated to recreate the elfcorehdr and replace the previous one with it on memory add/remove events.

The memblock list is used to prepare the elfcorehdr. In the case of memory hot remove, the memblock list is updated after the arch crash hotplug handler is triggered, as depicted in Figure 1. Thus, the hot-removed memory is explicitly removed from the crash memory ranges to ensure that the memory ranges added to elfcorehdr do not include the hot-removed memory.

    Memory remove
          |
          v
    Offline pages
          |
          v
    Initiate memory notify call <----> crash hotplug handler
    chain for MEM_OFFLINE event
          |
          v
    Update memblock list

            Figure 1

There are two system calls, `kexec_file_load` and `kexec_load`, used to load the kdump image. A few changes have been made to ensure that the kernel can safely update the elfcorehdr component of the kdump image for both system calls.

For the kexec_file_load syscall, the kdump image is prepared in the kernel. To support an increasing number of memory regions, the elfcorehdr is built with extra buffer space to ensure that it can accommodate additional memory ranges in the future.

For the kexec_load syscall, the elfcorehdr is updated only if the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag is passed to the kernel by the kexec tool. Passing this flag to the kernel indicates that the elfcorehdr is built to accommodate additional memory ranges and the elfcorehdr segment is not considered for SHA calculation, making it safe to update.

The changes related to this feature are kept under the CRASH_HOTPLUG config, and it is enabled by default.
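Explicitly removing a hot-removed region from a list of memory ranges can be sketched as follows. This is a simplified userspace illustration and not the kernel's remove_mem_range(): struct range, remove_range, and the fixed-capacity array are assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical memory range with inclusive bounds. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Remove [base, base + size) from r[0..count). A fully covered range is
 * dropped, a partially covered one is trimmed, and a range that strictly
 * contains the hole is split in two.
 * Returns the new count, or -1 if a split does not fit in capacity.
 */
static int remove_range(struct range *r, int count, int capacity,
			uint64_t base, uint64_t size)
{
	uint64_t hole_start = base;
	uint64_t hole_end = base + size - 1;
	int i = 0, j;

	if (size == 0)
		return count;

	while (i < count) {
		/* No overlap with this range. */
		if (hole_end < r[i].start || hole_start > r[i].end) {
			i++;
			continue;
		}

		/* Range fully covered by the hole: delete it. */
		if (hole_start <= r[i].start && hole_end >= r[i].end) {
			for (j = i; j < count - 1; j++)
				r[j] = r[j + 1];
			count--;
			continue;
		}

		/* Hole strictly inside the range: split into two ranges. */
		if (hole_start > r[i].start && hole_end < r[i].end) {
			if (count == capacity)
				return -1;
			for (j = count; j > i + 1; j--)
				r[j] = r[j - 1];
			r[i + 1].start = hole_end + 1;
			r[i + 1].end = r[i].end;
			r[i].end = hole_start - 1;
			count++;
			i += 2;
			continue;
		}

		/* Partial overlap: trim the front or the back. */
		if (hole_start <= r[i].start)
			r[i].start = hole_end + 1;
		else
			r[i].end = hole_start - 1;
		i++;
	}

	return count;
}
```

The split case is why such a list may need room for one more entry than it had before the removal.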
Signed-off-by: Sourabh Jain Acked-by: Hari Bathini Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Stephen Rothwell Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- Changes in v19: * Fix a build warning: remove NULL check before freeing memory for elfbuf in update_crash_elfcorehdr function. arch/powerpc/include/asm/kexec.h| 3 + arch/powerpc/include/asm/kexec_ranges.h | 1 + arch/powerpc/kexec/crash.c | 94 - arch/powerpc/kexec/file_load_64.c | 20 +- arch/powerpc/kexec/ranges.c | 85 ++ 5 files changed, 201 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index e75970351bcd..95a98b390d62 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -141,6 +141,9 @@ void arch_crash_handle_hotplug_event(struct kimage *image, void *arg); int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags); #define arch_crash_hotplug_support arch_crash_hotplug_support + +unsigned int arch_crash_get_elfcorehdr_size(void); +#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size #endif /* CONFIG_CRASH_HOTPLUG */ extern int crashing_cpu; diff --git a/arch/powerpc/include/asm/kexec_ranges.h b/arch/powerpc/include/asm/kexec_ranges.h index 8489e844b447..14055896cbcb 100644 --- a/arch/powerpc/include/asm/kexec_ranges.h +++ b/arch/powerpc/include/asm/kexec_ranges.h @@ -7,6 +7,7 @@ void sort_memory_ranges(struct crash_mem *mrngs, bool merge); struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges); int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size); +int remove_mem_range(struct crash_mem **mem_ranges, u64 
base, u64 size); int get_exclude_memory_ranges(struct crash_mem **mem_ranges); int get_reserved_memory_ranges(struct crash_mem **mem_ranges); int get_crash_memory_ranges(struct crash_mem **mem_ranges); diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c index 8938a19af12f..9ac3266e4965 100644 --- a/arch/powerpc/kexec/crash.c +++ b/arch/powerpc/kexec/crash.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include @@ -25,6 +26,7 @@ #include #include #include +#include /* * The primary CPU waits a while for all secondary CPUs to enter. This is to @@ -398,6 +400,93 @@ void default_machine_crash_shutdown(struct pt_regs *regs) #undef pr_fmt #define pr_fmt(fmt) "crash hp: " fmt +/* + * Advertise preferred elfcorehdr size to userspace via + * /sys/kernel/c
[PATCH v19 5/6] powerpc/crash: add crash CPU hotplug support
Due to CPU/Memory hotplug or online/offline events, the elfcorehdr (which describes the CPUs and memory of the crashed kernel) and the FDT (Flattened Device Tree) of the kdump image become outdated. Consequently, attempting dump collection with an outdated elfcorehdr or FDT can lead to failed or inaccurate dump collection.

Going forward, CPU/Memory hotplug or online/offline events are referred to as CPU/Memory add/remove events.

The current solution to address the above issue involves monitoring the CPU/Memory add/remove events in userspace using udev rules, and whenever there are changes in CPU and memory resources, the entire kdump image is loaded again. The kdump image includes kernel, initrd, elfcorehdr, FDT, purgatory. Given that only elfcorehdr and FDT get outdated due to CPU/Memory add/remove events, reloading the entire kdump image is inefficient. More importantly, kdump remains inactive for a substantial amount of time until the kdump reload completes.

To address the aforementioned issue, commit 247262756121 ("crash: add generic infrastructure for crash hotplug support") added a generic infrastructure that allows architectures to selectively update the kdump image component during CPU or memory add/remove events within the kernel itself.

In the event of a CPU or memory add/remove event, the generic crash hotplug event handler, `crash_handle_hotplug_event()`, is triggered. It then acquires the necessary locks to update the kdump image and invokes the architecture-specific crash hotplug handler, `arch_crash_handle_hotplug_event()`, to update the required kdump image components.

This patch adds a crash hotplug handler for PowerPC and enables support to update the kdump image on CPU add/remove events. Support for memory add/remove events is added in a subsequent patch with the title "powerpc: add crash memory hotplug support".

As mentioned earlier, only the elfcorehdr and FDT kdump image components need to be updated in the event of CPU or memory add/remove events. However, on the PowerPC architecture the crash hotplug handler only updates the FDT to enable crash hotplug support for CPU add/remove events. Here's why.

The elfcorehdr on PowerPC is built with possible CPUs, and thus, it does not need an update on CPU add/remove events. On the other hand, the FDT needs to be updated on CPU add events to include the newly added CPU. If the FDT is not updated and the kernel crashes on a newly added CPU, the kdump kernel will fail to boot due to the unavailability of the crashing CPU in the FDT. During the early boot, it is expected that the boot CPU must be a part of the FDT; otherwise, the kernel will raise a BUG and fail to boot. For more information, refer to commit 36ae37e3436b0 ("powerpc: Make boot_cpuid common between 32 and 64-bit"). Since it is okay to have an offline CPU in the kdump FDT, no action is taken in case of CPU removal.

There are two system calls, `kexec_file_load` and `kexec_load`, used to load the kdump image. A few changes have been made to ensure the kernel can safely update the FDT of a kdump image loaded using either system call.

For the kexec_file_load syscall, the kdump image is prepared in the kernel. So to support an increasing number of CPUs, the FDT is constructed with extra buffer space to ensure it can accommodate a possible number of CPU nodes. Additionally, a call to fdt_pack (which trims the unused space once the FDT is prepared) is avoided if this feature is enabled.

For the kexec_load syscall, the FDT is updated only if the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag is passed to the kernel by userspace (kexec tools). When userspace passes this flag to the kernel, it indicates that the FDT is built to accommodate possible CPUs, and the FDT segment is excluded from SHA calculation, making it safe to update.

The changes related to this feature are kept under the CRASH_HOTPLUG config, and it is enabled by default.
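The decision described above, which kdump image component is refreshed for which event, can be summarized in a small table-like function. This is only a sketch of the policy as the commit messages in this series describe it; the enum names and components_to_update are hypothetical and do not appear in the kernel.

```c
/* Hypothetical event and action names, for illustration only. */
enum hp_event {
	HP_CPU_ADD,
	HP_CPU_REMOVE,
	HP_MEM_ADD,
	HP_MEM_REMOVE,
};

enum hp_action {
	HP_NONE,
	HP_UPDATE_FDT,
	HP_UPDATE_ELFCOREHDR,
};

/*
 * Which component is refreshed for a given event:
 * - CPU add: the FDT must gain the new CPU node.
 * - CPU remove: nothing, an offline CPU in the kdump FDT is acceptable.
 * - Memory add/remove: the elfcorehdr is recreated (handled by the
 *   memory hotplug patch of the series).
 */
static enum hp_action components_to_update(enum hp_event ev)
{
	switch (ev) {
	case HP_CPU_ADD:
		return HP_UPDATE_FDT;
	case HP_CPU_REMOVE:
		return HP_NONE;
	case HP_MEM_ADD:
	case HP_MEM_REMOVE:
		return HP_UPDATE_ELFCOREHDR;
	}

	return HP_NONE;
}
```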
Signed-off-by: Sourabh Jain Acked-by: Hari Bathini Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Stephen Rothwell Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- * No changes in v19. arch/powerpc/Kconfig | 4 ++ arch/powerpc/include/asm/kexec.h | 8 +++ arch/powerpc/kexec/crash.c| 103 ++ arch/powerpc/kexec/elf_64.c | 3 +- arch/powerpc/kexec/file_load_64.c | 17 + 5 files changed, 134 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 1c4be3373686..a1a3b3363008 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -686,6 +686,10 @@ config ARCH_SELECTS_CRASH_DUMP depend
[PATCH v19 3/6] powerpc/kexec: move *_memory_ranges functions to ranges.c
Move the following functions from kexec/{file_load_64.c => ranges.c} and make them public so that components other than KEXEC_FILE can also use these functions.
1. get_exclude_memory_ranges
2. get_reserved_memory_ranges
3. get_crash_memory_ranges
4. get_usable_memory_ranges

Later in the series, the get_crash_memory_ranges function is utilized for in-kernel updates to the kdump image during CPU/Memory hotplug or online/offline events for both kexec_load and kexec_file_load syscalls.

Since the above functions are moved to ranges.c, some of the helper functions in ranges.c are no longer required to be public. Mark them as static and remove them from the kexec_ranges.h header file.

Finally, remove the CONFIG_KEXEC_FILE build dependency for ranges.c because it is required for other configs, such as CONFIG_CRASH_DUMP.

No functional changes are intended.

Signed-off-by: Sourabh Jain
Acked-by: Hari Bathini
Cc: Akhil Raj
Cc: Andrew Morton
Cc: Aneesh Kumar K.V
Cc: Baoquan He
Cc: Borislav Petkov (AMD)
Cc: Boris Ostrovsky
Cc: Christophe Leroy
Cc: Dave Hansen
Cc: Dave Young
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: Laurent Dufour
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Mimi Zohar
Cc: Naveen N Rao
Cc: Oscar Salvador
Cc: Stephen Rothwell
Cc: Thomas Gleixner
Cc: Valentin Schneider
Cc: Vivek Goyal
Cc: ke...@lists.infradead.org
Cc: x...@kernel.org
---

* No changes in v19.
arch/powerpc/include/asm/kexec_ranges.h | 19 +- arch/powerpc/kexec/Makefile | 4 +- arch/powerpc/kexec/file_load_64.c | 190 arch/powerpc/kexec/ranges.c | 227 +++- 4 files changed, 224 insertions(+), 216 deletions(-) diff --git a/arch/powerpc/include/asm/kexec_ranges.h b/arch/powerpc/include/asm/kexec_ranges.h index f83866a19e87..8489e844b447 100644 --- a/arch/powerpc/include/asm/kexec_ranges.h +++ b/arch/powerpc/include/asm/kexec_ranges.h @@ -7,19 +7,8 @@ void sort_memory_ranges(struct crash_mem *mrngs, bool merge); struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges); int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size); -int add_tce_mem_ranges(struct crash_mem **mem_ranges); -int add_initrd_mem_range(struct crash_mem **mem_ranges); -#ifdef CONFIG_PPC_64S_HASH_MMU -int add_htab_mem_range(struct crash_mem **mem_ranges); -#else -static inline int add_htab_mem_range(struct crash_mem **mem_ranges) -{ - return 0; -} -#endif -int add_kernel_mem_range(struct crash_mem **mem_ranges); -int add_rtas_mem_range(struct crash_mem **mem_ranges); -int add_opal_mem_range(struct crash_mem **mem_ranges); -int add_reserved_mem_ranges(struct crash_mem **mem_ranges); - +int get_exclude_memory_ranges(struct crash_mem **mem_ranges); +int get_reserved_memory_ranges(struct crash_mem **mem_ranges); +int get_crash_memory_ranges(struct crash_mem **mem_ranges); +int get_usable_memory_ranges(struct crash_mem **mem_ranges); #endif /* _ASM_POWERPC_KEXEC_RANGES_H */ diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile index 8e469c4da3f8..470eb0453e17 100644 --- a/arch/powerpc/kexec/Makefile +++ b/arch/powerpc/kexec/Makefile @@ -3,11 +3,11 @@ # Makefile for the linux kernel. 
# -obj-y += core.o core_$(BITS).o +obj-y += core.o core_$(BITS).o ranges.o obj-$(CONFIG_PPC32)+= relocate_32.o -obj-$(CONFIG_KEXEC_FILE) += file_load.o ranges.o file_load_$(BITS).o elf_$(BITS).o +obj-$(CONFIG_KEXEC_FILE) += file_load.o file_load_$(BITS).o elf_$(BITS).o obj-$(CONFIG_VMCORE_INFO) += vmcore_info.o obj-$(CONFIG_CRASH_DUMP) += crash.o diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c index 1bc65de6174f..6a01f62b8fcf 100644 --- a/arch/powerpc/kexec/file_load_64.c +++ b/arch/powerpc/kexec/file_load_64.c @@ -47,83 +47,6 @@ const struct kexec_file_ops * const kexec_file_loaders[] = { NULL }; -/** - * get_exclude_memory_ranges - Get exclude memory ranges. This list includes - * regions like opal/rtas, tce-table, initrd, - * kernel, htab which should be avoided while - * setting up kexec load segments. - * @mem_ranges:Range list to add the memory ranges to. - * - * Returns 0 on success, negative errno on error. - */ -static int get_exclude_memory_ranges(struct crash_mem **mem_ranges) -{ - int ret; - - ret = add_tce_mem_ranges(mem_ranges); - if (ret) - goto out; - - ret = add_initrd_mem_range(mem_ranges); - if (ret) - goto out; - - ret = add_htab_mem_range(mem_ranges); - if (ret) - goto out; - - ret = add_kernel_mem_range(mem_ranges); - if (ret) - goto out; - - ret = add_rtas_mem_range(mem_ranges); - if (ret) - goto out; - - ret = add_opal_mem_range(mem_ran
[PATCH v19 4/6] PowerPC/kexec: make the update_cpus_node() function public
Move the update_cpus_node() from kexec/{file_load_64.c => core_64.c} to allow other kexec components to use it. Later in the series, this function is used for in-kernel updates to the kdump image during CPU/memory hotplug or online/offline events for both kexec_load and kexec_file_load syscalls. No functional changes are intended. Signed-off-by: Sourabh Jain Acked-by: Hari Bathini Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Stephen Rothwell Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- * No changes in v19. arch/powerpc/include/asm/kexec.h | 4 ++ arch/powerpc/kexec/core_64.c | 91 +++ arch/powerpc/kexec/file_load_64.c | 87 - 3 files changed, 95 insertions(+), 87 deletions(-) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index fdb90e24dc74..d9ff4d0e392d 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -185,6 +185,10 @@ static inline void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *)) #endif /* CONFIG_CRASH_DUMP */ +#if defined(CONFIG_KEXEC_FILE) || defined(CONFIG_CRASH_DUMP) +int update_cpus_node(void *fdt); +#endif + #ifdef CONFIG_PPC_BOOK3S_64 #include #endif diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c index 762e4d09aacf..85050be08a23 100644 --- a/arch/powerpc/kexec/core_64.c +++ b/arch/powerpc/kexec/core_64.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include @@ -30,6 +31,7 @@ #include #include #include +#include int machine_kexec_prepare(struct kimage *image) { @@ -419,3 +421,92 @@ static int __init export_htab_values(void) } late_initcall(export_htab_values); #endif /* 
CONFIG_PPC_64S_HASH_MMU */ + +#if defined(CONFIG_KEXEC_FILE) || defined(CONFIG_CRASH_DUMP) +/** + * add_node_props - Reads node properties from device node structure and add + * them to fdt. + * @fdt:Flattened device tree of the kernel + * @node_offset:offset of the node to add a property at + * @dn: device node pointer + * + * Returns 0 on success, negative errno on error. + */ +static int add_node_props(void *fdt, int node_offset, const struct device_node *dn) +{ + int ret = 0; + struct property *pp; + + if (!dn) + return -EINVAL; + + for_each_property_of_node(dn, pp) { + ret = fdt_setprop(fdt, node_offset, pp->name, pp->value, pp->length); + if (ret < 0) { + pr_err("Unable to add %s property: %s\n", pp->name, fdt_strerror(ret)); + return ret; + } + } + return ret; +} + +/** + * update_cpus_node - Update cpus node of flattened device tree using of_root + *device node. + * @fdt: Flattened device tree of the kernel. + * + * Returns 0 on success, negative errno on error. + */ +int update_cpus_node(void *fdt) +{ + struct device_node *cpus_node, *dn; + int cpus_offset, cpus_subnode_offset, ret = 0; + + cpus_offset = fdt_path_offset(fdt, "/cpus"); + if (cpus_offset < 0 && cpus_offset != -FDT_ERR_NOTFOUND) { + pr_err("Malformed device tree: error reading /cpus node: %s\n", + fdt_strerror(cpus_offset)); + return cpus_offset; + } + + if (cpus_offset > 0) { + ret = fdt_del_node(fdt, cpus_offset); + if (ret < 0) { + pr_err("Error deleting /cpus node: %s\n", fdt_strerror(ret)); + return -EINVAL; + } + } + + /* Add cpus node to fdt */ + cpus_offset = fdt_add_subnode(fdt, fdt_path_offset(fdt, "/"), "cpus"); + if (cpus_offset < 0) { + pr_err("Error creating /cpus node: %s\n", fdt_strerror(cpus_offset)); + return -EINVAL; + } + + /* Add cpus node properties */ + cpus_node = of_find_node_by_path("/cpus"); + ret = add_node_props(fdt, cpus_offset, cpus_node); + of_node_put(cpus_node); + if (ret < 0) + return ret; + + /* Loop through all subnodes of cpus and add them to fdt */ + 
for_each_node_by_type(dn, "cpu") { + cpus_subnode_offset = fdt_add_subnode(fdt, cpus_offset, dn->full_name); + if (cpus_subnode_offset < 0) { + pr_err("Unable to add %s subno
[PATCH v19 2/6] crash: add a new kexec flag for hotplug support
Commit a72bbec70da2 ("crash: hotplug support for kexec_load()") introduced a new kexec flag, `KEXEC_UPDATE_ELFCOREHDR`. The kexec tool uses this flag to indicate to the kernel that it is safe to modify the elfcorehdr of the kdump image loaded using the kexec_load system call.

However, it is possible that architectures may need to update kexec segments other than the elfcorehdr, for example, the FDT (Flattened Device Tree) on PowerPC. Introducing a new kexec flag for every new kexec segment may not be a good solution. Hence, a generic kexec flag bit, `KEXEC_CRASH_HOTPLUG_SUPPORT`, is introduced to share the CPU/Memory hotplug support intent between the kexec tool and the kernel for the kexec_load system call.

Now we have two kexec flags that enable crash hotplug support for the kexec_load system call. The first is KEXEC_UPDATE_ELFCOREHDR (only used on x86), and the second is KEXEC_CRASH_HOTPLUG_SUPPORT (for all architectures).

To simplify the process of finding and reporting the crash hotplug support, the following changes are introduced.

1. Define an arch-specific function to process the kexec flags and determine crash hotplug support
2. Rename the @update_elfcorehdr member of struct kimage to @hotplug_support and populate it for both kexec_load and kexec_file_load syscalls, because an architecture can update more than one kexec segment
3.
Let generic function crash_check_hotplug_support report hotplug support for loaded kdump image based on value of @hotplug_support To bring the x86 crash hotplug support in line with the above points, the following changes have been made: - Introduce the arch_crash_hotplug_support function to process kexec flags and determine crash hotplug support - Remove the arch_crash_hotplug_[cpu|memory]_support functions Signed-off-by: Sourabh Jain Acked-by: Baoquan He Acked-by: Hari Bathini Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Eric DeVolder Cc: Greg Kroah-Hartman Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Stephen Rothwell Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- * No changes in v19. arch/x86/include/asm/kexec.h | 11 ++- arch/x86/kernel/crash.c | 28 +--- drivers/base/cpu.c | 2 +- drivers/base/memory.c| 2 +- include/linux/crash_core.h | 13 ++--- include/linux/kexec.h| 11 +++ include/uapi/linux/kexec.h | 1 + kernel/crash_core.c | 15 ++- kernel/kexec.c | 4 ++-- kernel/kexec_file.c | 5 + 10 files changed, 48 insertions(+), 44 deletions(-) diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index cb1320ebbc23..ae5482a2f0ca 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -210,15 +210,8 @@ extern void kdump_nmi_shootdown_cpus(void); void arch_crash_handle_hotplug_event(struct kimage *image, void *arg); #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event -#ifdef CONFIG_HOTPLUG_CPU -int arch_crash_hotplug_cpu_support(void); -#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support -#endif - -#ifdef CONFIG_MEMORY_HOTPLUG -int arch_crash_hotplug_memory_support(void); -#define crash_hotplug_memory_support 
arch_crash_hotplug_memory_support -#endif +int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags); +#define arch_crash_hotplug_support arch_crash_hotplug_support unsigned int arch_crash_get_elfcorehdr_size(void); #define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index 2a682fe86352..f06501445cd9 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -402,20 +402,26 @@ int crash_load_segments(struct kimage *image) #undef pr_fmt #define pr_fmt(fmt) "crash hp: " fmt -/* These functions provide the value for the sysfs crash_hotplug nodes */ -#ifdef CONFIG_HOTPLUG_CPU -int arch_crash_hotplug_cpu_support(void) +int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags) { - return crash_check_update_elfcorehdr(); -} -#endif -#ifdef CONFIG_MEMORY_HOTPLUG -int arch_crash_hotplug_memory_support(void) -{ - return crash_check_update_elfcorehdr(); -} +#ifdef CONFIG_KEXEC_FILE + if (image->file_mode) + return 1; #endif + /* +* Initially, crash hotplug support for kexec_load was added +* with the KEXEC_UPDATE_ELFCOREHDR flag. Later, this +* functionality was expanded to accommodate multiple kexec +* segment updates, leading to the introduction of the +* KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag bit. Consequently, +* when the
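The flag handling this patch describes can be sketched as a userspace C model. It is only a sketch under stated assumptions: every `sketch_`/`SKETCH_` name is hypothetical, the flag values are illustrative (the authoritative definitions live in include/uapi/linux/kexec.h), and treating either flag as sufficient on x86 is inferred from the truncated comment in the hunk above rather than copied from it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values only; see include/uapi/linux/kexec.h for the real ones. */
#define SKETCH_KEXEC_UPDATE_ELFCOREHDR     0x4UL
#define SKETCH_KEXEC_CRASH_HOTPLUG_SUPPORT 0x8UL

/* Hypothetical, trimmed-down stand-in for struct kimage. */
struct sketch_kimage {
	bool file_mode;       /* loaded via kexec_file_load */
	bool hotplug_support; /* models the renamed @hotplug_support member */
};

/* Arch hook (x86 flavour): decide hotplug support from the kexec flags. */
int sketch_arch_crash_hotplug_support(const struct sketch_kimage *image,
				      unsigned long kexec_flags)
{
	/* Images built by the kernel itself are always safe to update. */
	if (image->file_mode)
		return 1;

	/* Assumption: either the legacy or the generic flag enables support. */
	return (kexec_flags & (SKETCH_KEXEC_UPDATE_ELFCOREHDR |
			       SKETCH_KEXEC_CRASH_HOTPLUG_SUPPORT)) != 0;
}

/* Load path: record the verdict on the image once, at load time. */
void sketch_record_hotplug_support(struct sketch_kimage *image,
				   unsigned long kexec_flags)
{
	image->hotplug_support =
		sketch_arch_crash_hotplug_support(image, kexec_flags) != 0;
}

/* Generic reporting helper: reads only the recorded member. */
int sketch_crash_check_hotplug_support(const struct sketch_kimage *image)
{
	return (image != NULL && image->hotplug_support) ? 1 : 0;
}
```

Recording the result in one member at load time is what lets the generic reporting path stay architecture-agnostic: it never needs to know which segments a given architecture updates.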
[PATCH v19 0/6] powerpc/crash: Kernel handling of CPU and memory hotplug
er files
  - Rebase to v6.7-rc5

v13:
  - Fix a build warning, take ranges.c out of CONFIG_KEXEC_FILE
  - Rebase to v6.7-rc4

v12:
  - A patch to add new kexec flags to support this feature on kexec_load system call
  - Change in the way this feature is advertised to userspace for both kexec_load syscall
  - Rebase to v6.6-rc7

v11:
  - Rebase to v6.4-rc6
  - The patch that introduced CONFIG_CRASH_HOTPLUG for PowerPC has been removed. The config is now part of common configuration: https://lore.kernel.org/all/87ilbpflsk.fsf@mail.lhotse/

v10:
  - Drop the patch that adds fdt_index attribute to struct kimage_arch. Find the fdt segment index when needed.
  - Added more details to commit messages.
  - Rebased onto 6.3.0-rc5

v9:
  - Removed patch to prepare elfcorehdr crash notes for possible CPUs. The patch is moved to the generic patch series that introduces generic infrastructure for in-kernel crash update.
  - Removed patch to pass the hotplug action type to the arch crash hotplug handler function. The generic patch series has introduced the hotplug action type in kimage struct.
  - Add detailed commit messages for better understanding.

v8:
  - Restrict fdt_index initialization to machine_kexec_post_load so it works for both kexec_load and kexec_file_load. [3/8] Laurent Dufour
  - Updated the logic to find the number of offline cores. [6/8]
  - Changed the logic to find the elfcore program header to accommodate future memory ranges due to memory hotplug events. [8/8]

v7:
  - Added a new config to configure this feature
  - Pass hotplug action type to arch specific handler

v6:
  - Added crash memory hotplug support

v5:
  - Replace CONFIG_CRASH_HOTPLUG with CONFIG_HOTPLUG_CPU.
  - Move fdt segment identification for kexec_load case to load path instead of crash hotplug handler
  - Keep new attribute defined under kimage_arch to track FDT segment under CONFIG_HOTPLUG_CPU config.

v4:
  - Update the logic to find the additional space needed for hotadd CPUs post kexec load. Refer to the "[RFC v4 PATCH 4/5] powerpc/crash hp: add crash hotplug support for kexec_file_load" patch to know more about the change.
  - Fix a couple of typos.
  - Replace pr_err with pr_info_once to warn user about memory hotplug support.
  - In the crash hotplug handler, exit the for loop if the FDT segment is found.

v3:
  - Move fdt_index and fdt_index_vaild variables to kimage_arch struct.
  - Rebase patches on top of https://lore.kernel.org/lkml/20220303162725.49640-1-eric.devol...@oracle.com/
  - Fixed warnings reported by the checkpatch script

v2:
  - Use generic hotplug handler introduced by https://lore.kernel.org/lkml/20220209195706.51522-1-eric.devol...@oracle.com/, a significant change from v1.

Cc: Akhil Raj
Cc: Andrew Morton
Cc: Aneesh Kumar K.V
Cc: Baoquan He
Cc: Borislav Petkov (AMD)
Cc: Boris Ostrovsky
Cc: Christophe Leroy
Cc: Dave Hansen
Cc: Dave Young
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: Hari Bathini
Cc: Laurent Dufour
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Mimi Zohar
Cc: Naveen N Rao
Cc: Oscar Salvador
Cc: Stephen Rothwell
Cc: Thomas Gleixner
Cc: Valentin Schneider
Cc: Vivek Goyal
Cc: ke...@lists.infradead.org
Cc: x...@kernel.org

Sourabh Jain (6):
  crash: forward memory_notify arg to arch crash hotplug handler
  crash: add a new kexec flag for hotplug support
  powerpc/kexec: move *_memory_ranges functions to ranges.c
  PowerPC/kexec: make the update_cpus_node() function public
  powerpc/crash: add crash CPU hotplug support
  powerpc/crash: add crash memory hotplug support

 arch/powerpc/Kconfig                    |   4 +
 arch/powerpc/include/asm/kexec.h        |  15 ++
 arch/powerpc/include/asm/kexec_ranges.h |  20 +-
 arch/powerpc/kexec/Makefile             |   4 +-
 arch/powerpc/kexec/core_64.c            |  91 +++
 arch/powerpc/kexec/crash.c              | 195 +++
 arch/powerpc/kexec/elf_64.c             |   3 +-
 arch/powerpc/kexec/file_load_64.c       | 314 +++-
 arch/powerpc/kexec/ranges.c             | 312 ++-
 arch/x86/include/asm/kexec.h            |  13 +-
 arch/x86/kernel/crash.c                 |  32 ++-
 drivers/base/cpu.c                      |   2 +-
 drivers/base/memory.c                   |   2 +-
 include/linux/crash_core.h              |  15 +-
 include/linux/kexec.h                   |  11 +-
 include/uapi/linux/kexec.h              |   1 +
 kernel/crash_core.c                     |  29 +--
 kernel/kexec.c                          |   4 +-
 kernel/kexec_file.c                     |   5 +
 19 files changed, 713 insertions(+), 359 deletions(-)

-- 
2.44.0
[PATCH v19 1/6] crash: forward memory_notify arg to arch crash hotplug handler
In the event of memory hotplug or online/offline events, the crash memory hotplug notifier `crash_memhp_notifier()` receives a `memory_notify` object but doesn't forward that object to the generic and architecture-specific crash hotplug handlers.

The `memory_notify` object contains the starting PFN (Page Frame Number) and the number of pages in the hot-removed memory. This information is necessary for architectures like PowerPC to update/recreate the kdump image, specifically the `elfcorehdr`.

So update the function signatures of `crash_handle_hotplug_event()` and `arch_crash_handle_hotplug_event()` to accept the `memory_notify` object as an argument from the crash memory hotplug notifier.

Since no such object is available in the case of a CPU hotplug event, the crash CPU hotplug notifier `crash_cpuhp_online()` passes NULL to the crash hotplug handler.

Signed-off-by: Sourabh Jain
Acked-by: Baoquan He
Acked-by: Hari Bathini
Cc: Akhil Raj
Cc: Andrew Morton
Cc: Aneesh Kumar K.V
Cc: Borislav Petkov (AMD)
Cc: Boris Ostrovsky
Cc: Christophe Leroy
Cc: Dave Hansen
Cc: Dave Young
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: Laurent Dufour
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Mimi Zohar
Cc: Naveen N Rao
Cc: Oscar Salvador
Cc: Stephen Rothwell
Cc: Thomas Gleixner
Cc: Valentin Schneider
Cc: Vivek Goyal
Cc: ke...@lists.infradead.org
Cc: x...@kernel.org
---

* No changes in v19.
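The forwarding described in the commit message can be modeled in userspace C as below. This is a minimal sketch, not the kernel code: the `sketch_` types and functions are hypothetical, and `struct sketch_memory_notify` keeps only the two fields the message mentions (starting PFN and page count).

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the kernel's struct memory_notify. */
struct sketch_memory_notify {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

enum sketch_hp_action {
	SKETCH_HP_ADD_CPU,
	SKETCH_HP_ADD_MEMORY,
	SKETCH_HP_REMOVE_MEMORY,
};

/* Records what the arch handler observed, so the flow can be inspected. */
struct sketch_event {
	enum sketch_hp_action action;
	int has_range;            /* 1 when a memory_notify object arrived */
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/* Arch handler: arg is a memory_notify for memory events, NULL for CPU events. */
void sketch_arch_handler(struct sketch_event *out,
			 enum sketch_hp_action action, void *arg)
{
	const struct sketch_memory_notify *mn = arg;

	out->action = action;
	out->has_range = (mn != NULL);
	out->start_pfn = mn ? mn->start_pfn : 0;
	out->nr_pages = mn ? mn->nr_pages : 0;
}

/* Generic handler: forwards arg untouched to the arch handler. */
void sketch_generic_handler(struct sketch_event *out,
			    enum sketch_hp_action action, void *arg)
{
	sketch_arch_handler(out, action, arg);
}

/* Memory notifier: passes its memory_notify object down the chain. */
void sketch_memhp_notifier(struct sketch_event *out,
			   enum sketch_hp_action action,
			   struct sketch_memory_notify *mn)
{
	sketch_generic_handler(out, action, mn);
}

/* CPU notifier: has no such object, so it passes NULL. */
void sketch_cpuhp_online(struct sketch_event *out)
{
	sketch_generic_handler(out, SKETCH_HP_ADD_CPU, NULL);
}
```

Using an untyped `void *` keeps the generic layer independent of the memory hotplug headers; only the architecture code that actually needs the range interprets the pointer.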
arch/x86/include/asm/kexec.h | 2 +- arch/x86/kernel/crash.c | 4 +++- include/linux/crash_core.h | 2 +- kernel/crash_core.c | 14 +++--- 4 files changed, 12 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index 91ca9a9ee3a2..cb1320ebbc23 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -207,7 +207,7 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image); extern void kdump_nmi_shootdown_cpus(void); #ifdef CONFIG_CRASH_HOTPLUG -void arch_crash_handle_hotplug_event(struct kimage *image); +void arch_crash_handle_hotplug_event(struct kimage *image, void *arg); #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event #ifdef CONFIG_HOTPLUG_CPU diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index e74d0c4286c1..2a682fe86352 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -432,10 +432,12 @@ unsigned int arch_crash_get_elfcorehdr_size(void) /** * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes * @image: a pointer to kexec_crash_image + * @arg: struct memory_notify handler for memory hotplug case and + * NULL for CPU hotplug case. * * Prepare the new elfcorehdr and replace the existing elfcorehdr. 
*/ -void arch_crash_handle_hotplug_event(struct kimage *image) +void arch_crash_handle_hotplug_event(struct kimage *image, void *arg) { void *elfbuf = NULL, *old_elfcorehdr; unsigned long nr_mem_ranges; diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h index d33352c2e386..647e928efee8 100644 --- a/include/linux/crash_core.h +++ b/include/linux/crash_core.h @@ -37,7 +37,7 @@ static inline void arch_kexec_unprotect_crashkres(void) { } #ifndef arch_crash_handle_hotplug_event -static inline void arch_crash_handle_hotplug_event(struct kimage *image) { } +static inline void arch_crash_handle_hotplug_event(struct kimage *image, void *arg) { } #endif int crash_check_update_elfcorehdr(void); diff --git a/kernel/crash_core.c b/kernel/crash_core.c index 78b5dc7cee3a..70fa8111a9d6 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -534,7 +534,7 @@ int crash_check_update_elfcorehdr(void) * list of segments it checks (since the elfcorehdr changes and thus * would require an update to purgatory itself to update the digest). 
*/ -static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu) +static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu, void *arg) { struct kimage *image; @@ -596,7 +596,7 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu) image->hp_action = hp_action; /* Now invoke arch-specific update handler */ - arch_crash_handle_hotplug_event(image); + arch_crash_handle_hotplug_event(image, arg); /* No longer handling a hotplug event */ image->hp_action = KEXEC_CRASH_HP_NONE; @@ -612,17 +612,17 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu) crash_hotplug_unlock(); } -static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v) +static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *arg) { switch (val) { case MEM_ONLINE: crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_MEMORY, - KEXEC_CRASH_HP_INVALID_CPU); + KEXEC_CRASH_HP_INVALID_CP
[PATCH v10 3/3] Documentation/powerpc: update fadump implementation details
The patch titled "powerpc: make fadump resilient with memory add/remove events" has made significant changes to the implementation of fadump, particularly to elfcorehdr creation and the fadump crash info header structure. Therefore, update the fadump implementation documentation to reflect those changes.

The following updates are made to the firmware-assisted dump documentation:

1. The elfcorehdr is no longer stored after the fadump HDR in the reserved dump area. Instead, the second kernel dynamically allocates memory for the elfcorehdr within the address range from 0 to the boot memory size. Therefore, update figures 1 and 2 of Memory Reservation during the first and second kernels to reflect this change.

2. A version field has been added to the fadump header to manage future changes to the fadump crash info header structure without changing the fadump header magic number. Therefore, remove the corresponding TODO from the document.

Signed-off-by: Sourabh Jain
Cc: Aditya Gupta
Cc: Aneesh Kumar K.V
Cc: Hari Bathini
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Naveen N Rao
---
 .../arch/powerpc/firmware-assisted-dump.rst | 91 +--
 1 file changed, 42 insertions(+), 49 deletions(-)

diff --git a/Documentation/arch/powerpc/firmware-assisted-dump.rst b/Documentation/arch/powerpc/firmware-assisted-dump.rst
index e363fc48529a..7e37aadd1f77 100644
--- a/Documentation/arch/powerpc/firmware-assisted-dump.rst
+++ b/Documentation/arch/powerpc/firmware-assisted-dump.rst
@@ -134,12 +134,12 @@ that are run. If there is dump data, then the
 memory is held.
 
 If there is no waiting dump data, then only the memory required to
-hold CPU state, HPTE region, boot memory dump, FADump header and
-elfcore header, is usually reserved at an offset greater than boot
-memory size (see Fig. 1). This area is *not* released: this region
-will be kept permanently reserved, so that it can act as a receptacle
-for a copy of the boot memory content in addition to CPU state and
-HPTE region, in the case a crash does occur.
+hold CPU state, HPTE region, boot memory dump, and FADump header is
+usually reserved at an offset greater than boot memory size (see Fig. 1).
+This area is *not* released: this region will be kept permanently
+reserved, so that it can act as a receptacle for a copy of the boot
+memory content in addition to CPU state and HPTE region, in the case
+a crash does occur.
 
 Since this reserved memory area is used only after the system crash,
 there is no point in blocking this significant chunk of memory from
@@ -153,22 +153,22 @@ that were present in CMA region::
 
   o Memory Reservation during first kernel
 
 [Fig. 1 hunk: the ASCII-art layout was lost in extraction. Recoverable labels: Low memory, Top of memory, 0, boot memory size, Reserved dump area, Permanent Reservation, CPU, HPTE, DUMP, HDR, FADump Header (meta area), and the note "Boot memory content gets transferred to reserved area by firmware at the time of crash." The removed version of the figure also shows an ELF region after HDR; the added version does not.]
 
         Metadata: This area holds a metadata structure whose
@@ -186,13 +186,2
[PATCH v10 2/3] powerpc/fadump: add hotplug_ready sysfs interface
The elfcorehdr describes the CPUs and memory of the crashed kernel to the kernel that captures the dump, known as the second or fadump kernel. The elfcorehdr needs to be updated if the system's memory changes due to memory hotplug or online/offline events.

Currently, memory hotplug events are monitored in userspace by udev rules, and fadump is re-registered, which recreates the elfcorehdr with the latest available memory in the system.

However, the previous patch ("powerpc: make fadump resilient with memory add/remove events") moved the creation of elfcorehdr to the second or fadump kernel. This eliminates the need to regenerate the elfcorehdr during memory hotplug or online/offline events.

Create a sysfs entry at /sys/kernel/fadump/hotplug_ready to let userspace know that fadump re-registration is not required for memory add/remove events.

Signed-off-by: Sourabh Jain
Cc: Aditya Gupta
Cc: "Aneesh Kumar K.V"
Cc: Hari Bathini
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Naveen N Rao
---
 Documentation/ABI/testing/sysfs-kernel-fadump | 11 +++
 arch/powerpc/kernel/fadump.c                  | 14 ++
 2 files changed, 25 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump b/Documentation/ABI/testing/sysfs-kernel-fadump
index 8f7a64a81783..c586054657d6 100644
--- a/Documentation/ABI/testing/sysfs-kernel-fadump
+++ b/Documentation/ABI/testing/sysfs-kernel-fadump
@@ -38,3 +38,14 @@ Contact:	linuxppc-dev@lists.ozlabs.org
 Description:	read only
 		Provide information about the amount of memory reserved by
 		FADump to save the crash dump in bytes.
+
+What:		/sys/kernel/fadump/hotplug_ready
+Date:		Apr 2024
+Contact:	linuxppc-dev@lists.ozlabs.org
+Description:	read only
+		Kdump udev rule re-registers fadump on memory add/remove events,
+		primarily to update the elfcorehdr. This sysfs node indicates to
+		the kdump udev rule that fadump re-registration is not required
+		on memory add/remove events because elfcorehdr is now prepared in
+		the second/fadump kernel.
+User:		kexec-tools
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 35254fc1516b..dfab452e947b 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1442,6 +1442,18 @@ static ssize_t enabled_show(struct kobject *kobj,
 	return sprintf(buf, "%d\n", fw_dump.fadump_enabled);
 }
 
+/*
+ * /sys/kernel/fadump/hotplug_ready sysfs node returns 1, which indicates
+ * to userspace that fadump re-registration is not required on memory
+ * hotplug events.
+ */
+static ssize_t hotplug_ready_show(struct kobject *kobj,
+				  struct kobj_attribute *attr,
+				  char *buf)
+{
+	return sprintf(buf, "%d\n", 1);
+}
+
 static ssize_t mem_reserved_show(struct kobject *kobj,
 				 struct kobj_attribute *attr,
 				 char *buf)
@@ -1514,11 +1526,13 @@ static struct kobj_attribute release_attr = __ATTR_WO(release_mem);
 static struct kobj_attribute enable_attr = __ATTR_RO(enabled);
 static struct kobj_attribute register_attr = __ATTR_RW(registered);
 static struct kobj_attribute mem_reserved_attr = __ATTR_RO(mem_reserved);
+static struct kobj_attribute hotplug_ready_attr = __ATTR_RO(hotplug_ready);
 
 static struct attribute *fadump_attrs[] = {
 	&enable_attr.attr,
 	&register_attr.attr,
 	&mem_reserved_attr.attr,
+	&hotplug_ready_attr.attr,
 	NULL,
 };
 
-- 
2.44.0
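A userspace consumer of the new node could look roughly like the following C helper. It is a sketch, not code from kexec-tools: the function name `fadump_reregister_needed()` is hypothetical, and treating a missing or unreadable node as "re-register" is an assumption made here so that kernels without this patch keep the old behaviour.

```c
#include <stdio.h>

/*
 * Return 0 when the kernel reports (via the given sysfs path) that fadump
 * re-registration is not required on memory add/remove events, 1 otherwise.
 *
 * Assumption: a missing, unreadable, or non-"1" node means the kernel
 * predates the hotplug_ready interface, so re-registration is still needed.
 */
int fadump_reregister_needed(const char *path)
{
	FILE *fp = fopen(path, "r");
	int ready = 0;

	if (fp == NULL)
		return 1;

	/* The node is documented to contain a single integer. */
	if (fscanf(fp, "%d", &ready) != 1)
		ready = 0;
	fclose(fp);

	return ready == 1 ? 0 : 1;
}
```

A udev helper would call it as `fadump_reregister_needed("/sys/kernel/fadump/hotplug_ready")` and skip the re-registration step when it returns 0.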
[PATCH v10 1/3] powerpc: make fadump resilient with memory add/remove events
Due to changes in memory resources caused by either memory hotplug or online/offline events, the elfcorehdr, which describes the CPUs and memory of the crashed kernel to the kernel that collects the dump (known as the second/fadump kernel), becomes outdated. Consequently, attempting dump collection with an outdated elfcorehdr can lead to failed or inaccurate dump collection.

Memory hotplug or online/offline events are referred to as memory add/remove events in the rest of the commit message.

The current solution to address the aforementioned issue is as follows: monitor memory add/remove events in userspace using udev rules, and re-register fadump whenever there are changes in memory resources. This leads to the creation of a new elfcorehdr with updated system memory information.

There are several notable issues associated with re-registering fadump for every memory add/remove event.

1. Bulk memory add/remove events with udev-based fadump re-registration can lead to race conditions and, more importantly, create a wide window during which fadump is inactive until all memory add/remove events are settled.
2. Re-registering fadump for every memory add/remove event is inefficient.
3. The memory for elfcorehdr is allocated based on the memblock regions available during early boot and remains fixed thereafter. However, if elfcorehdr is later recreated with additional memblock regions, its size will increase, potentially leading to memory corruption.

Address the aforementioned challenges by shifting the creation of elfcorehdr from the first kernel (also referred to as the crashed kernel), where it was created and frequently recreated for every memory add/remove event, to the fadump kernel. As a result, the elfcorehdr only needs to be created once, thus eliminating the necessity to re-register fadump during memory add/remove events.

At present, the first kernel prepares the fadump header and stores it in the fadump reserved area. The fadump header includes the start address of the elfcorehdr, crashing CPU details, and other relevant information. In the event of a crash in the first kernel, the second/fadump kernel boots and accesses the fadump header prepared by the first kernel. It then performs the following steps in a platform-specific function [rtas|opal]_fadump_process:

1. Sanity check for fadump header
2. Update CPU notes in elfcorehdr

Along with the above, update setup_fadump() in fadump.c to create the elfcorehdr and set its address in the global variable elfcorehdr_addr for the vmcore module to process it in the second/fadump kernel.

The section below outlines the information required to create the elfcorehdr and the changes made to make it available to the fadump kernel if it is not already.

To create elfcorehdr, the following crashed kernel information is required: CPU notes, vmcoreinfo, and memory ranges.

At present, the CPU notes are already prepared in the fadump kernel, so no changes are needed in that regard. The fadump kernel has access to all crashed kernel memory regions, including boot memory regions that are relocated by firmware to fadump reserved areas, so no changes are needed for that either. However, it is necessary to add new members to the fadump header, i.e., the 'fadump_crash_info_header' structure, in order to pass the crashed kernel's vmcoreinfo address and its size to the fadump kernel.

In addition to the vmcoreinfo address and size, a few other attributes are also added to the fadump_crash_info_header structure.

1. version: Stores the fadump header version, which is currently set to 1. This provides flexibility to update the fadump crash info header in the future without changing the magic number. For each change in the fadump header, the version will be increased. This will help the updated kernel determine how to handle kernel dumps from older kernels. The magic number remains relevant for checking fadump header corruption.
2. pt_regs_sz/cpu_mask_sz: Store the sizes of the pt_regs and cpu_mask structures of the first kernel. These attributes are used to prevent dump processing if the sizes of the pt_regs or cpu_mask structures differ between the first and fadump kernels.

Note: if either the first/crashed kernel or the second/fadump kernel does not have the changes introduced here, the kernel fails to collect the dump and prints a relevant error message on the console.

Signed-off-by: Sourabh Jain
Cc: Aditya Gupta
Cc: "Aneesh Kumar K.V"
Cc: Hari Bathini
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Naveen N Rao
---
 arch/powerpc/include/asm/fadump-internal.h | 31 +-
 arch/powerpc/kernel/fadump.c | 361 +++
 arch/powerpc/platforms/powernv/opal-fadump.c | 22 +-
 arch/powerpc/platforms/pseries/rtas-fadump.c | 34 +-
 4 files changed, 242 insertions(+), 206 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h
index 27f9e11eda28..5d706a7acc8a 100644
--- a/arch/power
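The header checks that the commit message describes (magic number, version, and the pt_regs/cpu_mask sizes) can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: the struct layout, the magic value, the `sketch_` names, and the choice to reject any version other than 1 are not taken from the patch, whose real definitions live in fadump-internal.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the kernel's actual values. */
#define SKETCH_FADUMP_MAGIC   0x46414448ULL
#define SKETCH_FADUMP_VERSION 1u

/* Simplified subset of the fields named in the commit message. */
struct sketch_crash_info_header {
	uint64_t magic_number;
	uint32_t version;
	uint32_t pt_regs_sz;
	uint32_t cpu_mask_sz;
	uint64_t vmcoreinfo_raddr;
	uint64_t vmcoreinfo_size;
};

/*
 * Return true only if a header written by the first kernel can be
 * processed by a fadump kernel whose own structure sizes are
 * my_pt_regs_sz and my_cpu_mask_sz.
 */
static bool sketch_header_usable(const struct sketch_crash_info_header *h,
				 uint32_t my_pt_regs_sz,
				 uint32_t my_cpu_mask_sz)
{
	assert(h != NULL);

	if (h->magic_number != SKETCH_FADUMP_MAGIC)
		return false;	/* corrupted or unrecognized header */
	if (h->version != SKETCH_FADUMP_VERSION)
		return false;	/* layout this sketch does not know */
	if (h->pt_regs_sz != my_pt_regs_sz)
		return false;	/* pt_regs layout differs between kernels */
	if (h->cpu_mask_sz != my_cpu_mask_sz)
		return false;	/* cpu_mask layout differs between kernels */
	return true;
}
```

A real implementation would also print an error for each rejected case, as the note in the commit message says; the sketch only returns the decision.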
[PATCH v10 0/3] powerpc: make fadump resilient with memory add/remove events
ader contains an old magic number - Rebased it to 6.7.0-rc4 v5: 29 Oct 2023 https://lore.kernel.org/all/20231029124548.12198-1-sourabhj...@linux.ibm.com/ - Fix a comment on the first patch v4: 21 Oct 2023 https://lore.kernel.org/all/20231021181733.204311-1-sourabhj...@linux.ibm.com/ - Fix a build warning about type casting v3: 9 Oct 2023 https://lore.kernel.org/all/20231009041953.36139-1-sourabhj...@linux.ibm.com/ - Assign physical address of elfcorehdr to fdh->elfcorehdr_addr - Rename a variable, boot_mem_dest_addr -> boot_mem_dest_offset v2: 25 Sep 2023 https://lore.kernel.org/all/20230925051214.678957-1-sourabhj...@linux.ibm.com/ - Fixed a few indentation issues reported by the checkpatch script. - Rebased it to 6.6.0-rc3 v1: 17 Sep 2023 https://lore.kernel.org/all/20230917080225.561627-1-sourabhj...@linux.ibm.com/ Cc: Aditya Gupta Cc: "Aneesh Kumar K.V" Cc: Hari Bathini Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Naveen N Rao Sourabh Jain (3): powerpc: make fadump resilient with memory add/remove events powerpc/fadump: add hotplug_ready sysfs interface Documentation/powerpc: update fadump implementation details Documentation/ABI/testing/sysfs-kernel-fadump | 11 + .../arch/powerpc/firmware-assisted-dump.rst | 91 ++--- arch/powerpc/include/asm/fadump-internal.h| 31 +- arch/powerpc/kernel/fadump.c | 375 ++ arch/powerpc/platforms/powernv/opal-fadump.c | 22 +- arch/powerpc/platforms/pseries/rtas-fadump.c | 34 +- 6 files changed, 309 insertions(+), 255 deletions(-) -- 2.44.0
[PATCH v9 0/3] powerpc: make fadump resilient with memory add/remove events
m/ - Fix a comment on the first patch v4: 21 Oct 2023 https://lore.kernel.org/all/20231021181733.204311-1-sourabhj...@linux.ibm.com/ - Fix a build warning about type casting v3: 9 Oct 2023 https://lore.kernel.org/all/20231009041953.36139-1-sourabhj...@linux.ibm.com/ - Assign physical address of elfcorehdr to fdh->elfcorehdr_addr - Rename a variable, boot_mem_dest_addr -> boot_mem_dest_offset v2: 25 Sep 2023 https://lore.kernel.org/all/20230925051214.678957-1-sourabhj...@linux.ibm.com/ - Fixed a few indentation issues reported by the checkpatch script. - Rebased it to 6.6.0-rc3 v1: 17 Sep 2023 https://lore.kernel.org/all/20230917080225.561627-1-sourabhj...@linux.ibm.com/ Cc: Aditya Gupta Cc: "Aneesh Kumar K.V" Cc: Hari Bathini Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Naveen N Rao Sourabh Jain (3): powerpc: make fadump resilient with memory add/remove events powerpc/fadump: add hotplug_ready sysfs interface Documentation/powerpc: update fadump implementation details Documentation/ABI/testing/sysfs-kernel-fadump | 11 + .../arch/powerpc/firmware-assisted-dump.rst | 91 ++--- arch/powerpc/include/asm/fadump-internal.h| 31 +- arch/powerpc/kernel/fadump.c | 375 ++ arch/powerpc/platforms/powernv/opal-fadump.c | 22 +- arch/powerpc/platforms/pseries/rtas-fadump.c | 34 +- 6 files changed, 309 insertions(+), 255 deletions(-) -- 2.43.0
[PATCH v9 3/3] Documentation/powerpc: update fadump implementation details
The patch titled ("powerpc: make fadump resilient with memory add/remove events") has made significant changes to the implementation of fadump, particularly to elfcorehdr creation and the fadump crash info header structure. Therefore, update the fadump implementation documentation to reflect those changes.

The following updates are made to the firmware-assisted dump documentation:

1. The elfcorehdr is no longer stored after the fadump HDR in the reserved dump area. Instead, the second kernel dynamically allocates memory for the elfcorehdr within the address range from 0 to the boot memory size. Therefore, update figures 1 and 2 of Memory Reservation during the first and second kernels to reflect this change.

2. A version field has been added to the fadump header to manage future changes to the fadump crash info header structure without changing the fadump header magic number. Therefore, remove the corresponding TODO from the document.

Signed-off-by: Sourabh Jain
Cc: Aditya Gupta
Cc: Aneesh Kumar K.V
Cc: Hari Bathini
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Naveen N Rao
---
 .../arch/powerpc/firmware-assisted-dump.rst | 91 +--
 1 file changed, 42 insertions(+), 49 deletions(-)

diff --git a/Documentation/arch/powerpc/firmware-assisted-dump.rst b/Documentation/arch/powerpc/firmware-assisted-dump.rst
index e363fc48529a..7e37aadd1f77 100644
--- a/Documentation/arch/powerpc/firmware-assisted-dump.rst
+++ b/Documentation/arch/powerpc/firmware-assisted-dump.rst
@@ -134,12 +134,12 @@ that are run. If there is dump data, then the
 memory is held.
 
 If there is no waiting dump data, then only the memory required to
-hold CPU state, HPTE region, boot memory dump, FADump header and
-elfcore header, is usually reserved at an offset greater than boot
-memory size (see Fig. 1). This area is *not* released: this region
-will be kept permanently reserved, so that it can act as a receptacle
-for a copy of the boot memory content in addition to CPU state and
-HPTE region, in the case a crash does occur.
+hold CPU state, HPTE region, boot memory dump, and FADump header is
+usually reserved at an offset greater than boot memory size (see Fig. 1).
+This area is *not* released: this region will be kept permanently
+reserved, so that it can act as a receptacle for a copy of the boot
+memory content in addition to CPU state and HPTE region, in the case
+a crash does occur.
 
 Since this reserved memory area is used only after the system crash,
 there is no point in blocking this significant chunk of memory from
@@ -153,22 +153,22 @@ that were present in CMA region::
 
   o Memory Reservation during first kernel

[Fig. 1 hunk, "Memory Reservation during first kernel": the ASCII diagram is garbled in this archive copy. The removed lines draw the reserved dump area as CPU, HPTE, DUMP, HDR, ELF; the added lines draw it as CPU, HPTE, DUMP, HDR, with the FADump header (meta area) last.]

 Metadata: This area holds a metadata structure whose
@@ -186,13 +186,2
[PATCH v9 1/3] powerpc: make fadump resilient with memory add/remove events
Due to changes in memory resources caused by either memory hotplug or online/offline events, the elfcorehdr, which describes the CPUs and memory of the crashed kernel to the kernel that collects the dump (known as the second/fadump kernel), becomes outdated. Consequently, attempting dump collection with an outdated elfcorehdr can lead to failed or inaccurate dump collection.

Memory hotplug or online/offline events are referred to as memory add/remove events in the rest of the commit message.

The current solution to address the aforementioned issue is as follows: monitor memory add/remove events in userspace using udev rules, and re-register fadump whenever there are changes in memory resources. This leads to the creation of a new elfcorehdr with updated system memory information.

There are several notable issues associated with re-registering fadump for every memory add/remove event.

1. Bulk memory add/remove events with udev-based fadump re-registration can lead to race conditions and, more importantly, create a wide window during which fadump is inactive until all memory add/remove events are settled.
2. Re-registering fadump for every memory add/remove event is inefficient.
3. The memory for elfcorehdr is allocated based on the memblock regions available during early boot and remains fixed thereafter. However, if elfcorehdr is later recreated with additional memblock regions, its size will increase, potentially leading to memory corruption.

Address the aforementioned challenges by shifting the creation of elfcorehdr from the first kernel (also referred to as the crashed kernel), where it was created and frequently recreated for every memory add/remove event, to the fadump kernel. As a result, the elfcorehdr only needs to be created once, thus eliminating the necessity to re-register fadump during memory add/remove events.

At present, the first kernel prepares the fadump header and stores it in the fadump reserved area. The fadump header includes the start address of the elfcorehdr, crashing CPU details, and other relevant information. In the event of a crash in the first kernel, the second/fadump kernel boots and accesses the fadump header prepared by the first kernel. It then performs the following steps in a platform-specific function [rtas|opal]_fadump_process:

1. Sanity check for fadump header
2. Update CPU notes in elfcorehdr

Along with the above, update setup_fadump() in fadump.c to create the elfcorehdr and set its address in the global variable elfcorehdr_addr for the vmcore module to process it in the second/fadump kernel.

The section below outlines the information required to create the elfcorehdr and the changes made to make it available to the fadump kernel if it is not already.

To create elfcorehdr, the following crashed kernel information is required: CPU notes, vmcoreinfo, and memory ranges.

At present, the CPU notes are already prepared in the fadump kernel, so no changes are needed in that regard. The fadump kernel has access to all crashed kernel memory regions, including boot memory regions that are relocated by firmware to fadump reserved areas, so no changes are needed for that either. However, it is necessary to add new members to the fadump header, i.e., the 'fadump_crash_info_header' structure, in order to pass the crashed kernel's vmcoreinfo address and its size to the fadump kernel.

In addition to the vmcoreinfo address and size, a few other attributes are also added to the fadump_crash_info_header structure.

1. version: Stores the fadump header version, which is currently set to 1. This provides flexibility to update the fadump crash info header in the future without changing the magic number. For each change in the fadump header, the version will be increased. This will help the updated kernel determine how to handle kernel dumps from older kernels. The magic number remains relevant for checking fadump header corruption.
2. pt_regs_sz/cpu_mask_sz: Store the sizes of the pt_regs and cpu_mask structures of the first kernel. These attributes are used to prevent dump processing if the sizes of the pt_regs or cpu_mask structures differ between the first and fadump kernels.

Note: if either the first/crashed kernel or the second/fadump kernel does not have the changes introduced here, the kernel fails to collect the dump and prints a relevant error message on the console.

Signed-off-by: Sourabh Jain
Cc: Aditya Gupta
Cc: "Aneesh Kumar K.V"
Cc: Hari Bathini
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Naveen N Rao
---
 arch/powerpc/include/asm/fadump-internal.h | 31 +-
 arch/powerpc/kernel/fadump.c | 361 +++
 arch/powerpc/platforms/powernv/opal-fadump.c | 22 +-
 arch/powerpc/platforms/pseries/rtas-fadump.c | 34 +-
 4 files changed, 242 insertions(+), 206 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h
index 27f9e11eda28..5d706a7acc8a 100644
--- a/arch/power
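Issue 3 in the commit message (a buffer sized at early boot that later has to hold a larger elfcorehdr) comes down to simple arithmetic, sketched below. The sketch assumes one program header per note segment and one per memory range, and it ignores page alignment. The function names are hypothetical; the 64-byte and 56-byte constants are the fixed sizes of the ELF64 file header and program header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Fixed structure sizes defined by the ELF64 format. */
#define SKETCH_ELF64_EHDR_SIZE 64u
#define SKETCH_ELF64_PHDR_SIZE 56u

/*
 * Bytes needed for an ELF core header that carries note_phdrs PT_NOTE
 * entries and one PT_LOAD entry per memory range.
 */
static size_t sketch_elfcorehdr_size(size_t note_phdrs, size_t mem_ranges)
{
	return SKETCH_ELF64_EHDR_SIZE +
	       (note_phdrs + mem_ranges) * SKETCH_ELF64_PHDR_SIZE;
}

/*
 * True if a header describing mem_ranges ranges still fits in a buffer
 * of boot_buf_size bytes that was sized during early boot.
 */
static bool sketch_elfcorehdr_fits(size_t boot_buf_size, size_t note_phdrs,
				   size_t mem_ranges)
{
	assert(boot_buf_size > 0);
	return sketch_elfcorehdr_size(note_phdrs, mem_ranges) <= boot_buf_size;
}
```

With two note entries and four memory ranges the header needs 400 bytes; one more memory range pushes it to 456 bytes, which no longer fits in a 400-byte buffer. Building the header in the fadump kernel avoids the problem because the buffer is sized from the final set of ranges.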
[PATCH v9 2/3] powerpc/fadump: add hotplug_ready sysfs interface
The elfcorehdr describes the CPUs and memory of the crashed kernel to the kernel that captures the dump, known as the second or fadump kernel. The elfcorehdr needs to be updated if the system's memory changes due to memory hotplug or online/offline events.

Currently, memory hotplug events are monitored in userspace by udev rules, and fadump is re-registered, which recreates the elfcorehdr with the latest available memory in the system.

However, the previous patch ("powerpc: make fadump resilient with memory add/remove events") moved the creation of elfcorehdr to the second or fadump kernel. This eliminates the need to regenerate the elfcorehdr during memory hotplug or online/offline events.

Create a sysfs entry at /sys/kernel/fadump/hotplug_ready to let userspace know that fadump re-registration is not required for memory add/remove events.

Signed-off-by: Sourabh Jain
Cc: Aditya Gupta
Cc: "Aneesh Kumar K.V"
Cc: Hari Bathini
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Naveen N Rao
---
 Documentation/ABI/testing/sysfs-kernel-fadump | 11 +++
 arch/powerpc/kernel/fadump.c | 14 ++
 2 files changed, 25 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump b/Documentation/ABI/testing/sysfs-kernel-fadump
index 8f7a64a81783..c586054657d6 100644
--- a/Documentation/ABI/testing/sysfs-kernel-fadump
+++ b/Documentation/ABI/testing/sysfs-kernel-fadump
@@ -38,3 +38,14 @@ Contact:	linuxppc-dev@lists.ozlabs.org
 Description:	read only
 		Provide information about the amount of memory reserved by
 		FADump to save the crash dump in bytes.
+
+What:		/sys/kernel/fadump/hotplug_ready
+Date:		Apr 2024
+Contact:	linuxppc-dev@lists.ozlabs.org
+Description:	read only
+		Kdump udev rule re-registers fadump on memory add/remove events,
+		primarily to update the elfcorehdr. This sysfs indicates to the
+		kdump udev rule that fadump re-registration is not required on
+		memory add/remove events because elfcorehdr is now prepared in
+		the second/fadump kernel.
+User:		kexec-tools
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index e816725a11c0..131bf0d2d45d 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1442,6 +1442,18 @@ static ssize_t enabled_show(struct kobject *kobj,
 	return sprintf(buf, "%d\n", fw_dump.fadump_enabled);
 }
 
+/*
+ * /sys/kernel/fadump/hotplug_ready sysfs node returns 1, which indicates
+ * to userspace that fadump re-registration is not required on memory
+ * hotplug events.
+ */
+static ssize_t hotplug_ready_show(struct kobject *kobj,
+				  struct kobj_attribute *attr,
+				  char *buf)
+{
+	return sprintf(buf, "%d\n", 1);
+}
+
 static ssize_t mem_reserved_show(struct kobject *kobj,
 				 struct kobj_attribute *attr,
 				 char *buf)
@@ -1514,11 +1526,13 @@ static struct kobj_attribute release_attr = __ATTR_WO(release_mem);
 static struct kobj_attribute enable_attr = __ATTR_RO(enabled);
 static struct kobj_attribute register_attr = __ATTR_RW(registered);
 static struct kobj_attribute mem_reserved_attr = __ATTR_RO(mem_reserved);
+static struct kobj_attribute hotplug_ready_attr = __ATTR_RO(hotplug_ready);
 
 static struct attribute *fadump_attrs[] = {
 	&enable_attr.attr,
 	&register_attr.attr,
 	&mem_reserved_attr.attr,
+	&hotplug_ready_attr.attr,
 	NULL,
 };
 
-- 
2.43.0
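How userspace might consume the new node can be sketched in a few lines of C. This is only an illustration of the decision the patch enables: the function names are hypothetical, the path is passed in so the logic can be exercised against any file, and the actual consumer named in the ABI entry is a udev rule in kexec-tools rather than a C program.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Return true if the file at 'path' exists and its first character is
 * the digit 1. A missing file (for example, on a kernel without this
 * patch) is treated as "not hotplug ready".
 */
static bool sketch_fadump_hotplug_ready(const char *path)
{
	FILE *f;
	int c;

	assert(path != NULL);

	f = fopen(path, "r");
	if (!f)
		return false;
	c = fgetc(f);
	fclose(f);
	return c == '1';
}

/* Decide whether a memory add/remove event calls for re-registration. */
static bool sketch_needs_reregistration(const char *path)
{
	return !sketch_fadump_hotplug_ready(path);
}
```

On a patched kernel, calling `sketch_needs_reregistration("/sys/kernel/fadump/hotplug_ready")` would be expected to return false, so the memory event can be ignored.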
[PATCH v18 6/6] powerpc/crash: add crash memory hotplug support
Extend the arch crash hotplug handler, as introduced by the patch titled ("powerpc: add crash CPU hotplug support"), to also support memory add/remove events.

The elfcorehdr describes the memory of the crashed kernel to the kernel that captures the dump; hence, it needs to be updated if memory resources change due to memory add/remove events. Therefore, arch_crash_handle_hotplug_event() is updated to recreate the elfcorehdr and replace the previous one with it on memory add/remove events.

The memblock list is used to prepare the elfcorehdr. In the case of memory hot remove, the memblock list is updated after the arch crash hotplug handler is triggered, as depicted in Figure 1. Thus, the hot-removed memory is explicitly removed from the crash memory ranges to ensure that the memory ranges added to elfcorehdr do not include the hot-removed memory.

    Memory remove
          |
          v
    Offline pages
          |
          v
    Initiate memory notify call <----> crash hotplug handler
    chain for MEM_OFFLINE event
          |
          v
    Update memblock list

          Figure 1

There are two system calls, `kexec_file_load` and `kexec_load`, used to load the kdump image. A few changes have been made to ensure that the kernel can safely update the elfcorehdr component of the kdump image for both system calls.

For the kexec_file_load syscall, the kdump image is prepared in the kernel. To support an increasing number of memory regions, the elfcorehdr is built with extra buffer space to ensure that it can accommodate additional memory ranges in the future.

For the kexec_load syscall, the elfcorehdr is updated only if the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag is passed to the kernel by the kexec tool. Passing this flag to the kernel indicates that the elfcorehdr is built to accommodate additional memory ranges and the elfcorehdr segment is not considered for SHA calculation, making it safe to update.

The changes related to this feature are kept under the CRASH_HOTPLUG config, and it is enabled by default.
Signed-off-by: Sourabh Jain Acked-by: Hari Bathini Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- * No changes in v18. arch/powerpc/include/asm/kexec.h| 3 + arch/powerpc/include/asm/kexec_ranges.h | 1 + arch/powerpc/kexec/crash.c | 95 - arch/powerpc/kexec/file_load_64.c | 20 +- arch/powerpc/kexec/ranges.c | 85 ++ 5 files changed, 202 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index e75970351bcd..95a98b390d62 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -141,6 +141,9 @@ void arch_crash_handle_hotplug_event(struct kimage *image, void *arg); int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags); #define arch_crash_hotplug_support arch_crash_hotplug_support + +unsigned int arch_crash_get_elfcorehdr_size(void); +#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size #endif /* CONFIG_CRASH_HOTPLUG */ extern int crashing_cpu; diff --git a/arch/powerpc/include/asm/kexec_ranges.h b/arch/powerpc/include/asm/kexec_ranges.h index 8489e844b447..14055896cbcb 100644 --- a/arch/powerpc/include/asm/kexec_ranges.h +++ b/arch/powerpc/include/asm/kexec_ranges.h @@ -7,6 +7,7 @@ void sort_memory_ranges(struct crash_mem *mrngs, bool merge); struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges); int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size); +int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size); int get_exclude_memory_ranges(struct crash_mem **mem_ranges); int get_reserved_memory_ranges(struct crash_mem 
**mem_ranges); int get_crash_memory_ranges(struct crash_mem **mem_ranges); diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c index 8938a19af12f..21b193e938a3 100644 --- a/arch/powerpc/kexec/crash.c +++ b/arch/powerpc/kexec/crash.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include @@ -25,6 +26,7 @@ #include #include #include +#include /* * The primary CPU waits a while for all secondary CPUs to enter. This is to @@ -398,6 +400,94 @@ void default_machine_crash_shutdown(struct pt_regs *regs) #undef pr_fmt #define pr_fmt(fmt) "crash hp: " fmt +/* + * Advertise preferred elfcorehdr size to userspace via + * /sys/kernel/crash_elfcorehdr_size sysfs interface. + */ +unsigned int arch_crash_get_elfcorehdr_size(void) +{ + unsigned long phdr_cnt; + +
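The commit message for this patch says that hot-removed memory is explicitly removed from the crash memory ranges, because the memblock list has not yet been updated when the handler runs. A minimal userspace sketch of such a removal follows. It is not the patch's `remove_mem_range()`: the fixed-size array, the inclusive bounds, and all `sketch_` names are assumptions, and overflow of `base + size` is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_RANGES 16

struct sketch_range {
	uint64_t start;
	uint64_t end;		/* inclusive */
};

struct sketch_mem {
	size_t nr;
	struct sketch_range r[SKETCH_MAX_RANGES];
};

/* Append one range; returns -1 if the array is full. */
static int sketch_push(struct sketch_mem *m, uint64_t start, uint64_t end)
{
	if (m->nr == SKETCH_MAX_RANGES)
		return -1;
	m->r[m->nr].start = start;
	m->r[m->nr].end = end;
	m->nr++;
	return 0;
}

/*
 * Remove [base, base + size - 1] from the list. A range that overlaps
 * the removed region is dropped, trimmed, or split in two. On failure
 * (-1) the list is left unchanged.
 */
static int sketch_remove_range(struct sketch_mem *m, uint64_t base,
			       uint64_t size)
{
	struct sketch_mem out = { 0 };
	uint64_t end;
	size_t i;

	assert(m != NULL);
	if (size == 0)
		return 0;
	end = base + size - 1;

	for (i = 0; i < m->nr; i++) {
		struct sketch_range cur = m->r[i];

		if (cur.end < base || cur.start > end) {
			/* No overlap: keep the range as it is. */
			if (sketch_push(&out, cur.start, cur.end))
				return -1;
			continue;
		}
		/* Keep the part below the removed region, if any. */
		if (cur.start < base && sketch_push(&out, cur.start, base - 1))
			return -1;
		/* Keep the part above the removed region, if any. */
		if (cur.end > end && sketch_push(&out, end + 1, cur.end))
			return -1;
	}
	*m = out;
	return 0;
}
```

Removing a block from the middle of a range is the case that can grow the list, which is one reason the elfcorehdr is built with spare room for additional memory ranges.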
[PATCH v18 5/6] powerpc/crash: add crash CPU hotplug support
Due to CPU/memory hotplug or online/offline events, the elfcorehdr (which describes the CPUs and memory of the crashed kernel) and FDT (Flattened Device Tree) of the kdump image become outdated. Consequently, attempting dump collection with an outdated elfcorehdr or FDT can lead to failed or inaccurate dump collection.

Going forward, CPU/memory hotplug or online/offline events are referred to as CPU/memory add/remove events.

The current solution to address the above issue involves monitoring the CPU/memory add/remove events in userspace using udev rules, and whenever there are changes in CPU and memory resources, the entire kdump image is loaded again. The kdump image includes the kernel, initrd, elfcorehdr, FDT, and purgatory. Given that only the elfcorehdr and FDT get outdated due to CPU/memory add/remove events, reloading the entire kdump image is inefficient. More importantly, kdump remains inactive for a substantial amount of time until the kdump reload completes.

To address the aforementioned issue, commit 247262756121 ("crash: add generic infrastructure for crash hotplug support") added a generic infrastructure that allows architectures to selectively update the kdump image components during CPU or memory add/remove events within the kernel itself.

In the event of a CPU or memory add/remove event, the generic crash hotplug event handler, `crash_handle_hotplug_event()`, is triggered. It then acquires the necessary locks to update the kdump image and invokes the architecture-specific crash hotplug handler, `arch_crash_handle_hotplug_event()`, to update the required kdump image components.

This patch adds a crash hotplug handler for PowerPC and enables support to update the kdump image on CPU add/remove events. Support for memory add/remove events is added in a subsequent patch with the title "powerpc: add crash memory hotplug support".

As mentioned earlier, only the elfcorehdr and FDT kdump image components need to be updated in the event of CPU or memory add/remove events. However, on the PowerPC architecture the crash hotplug handler only updates the FDT to enable crash hotplug support for CPU add/remove events. Here's why.

The elfcorehdr on PowerPC is built with possible CPUs, and thus it does not need an update on CPU add/remove events. On the other hand, the FDT needs to be updated on CPU add events to include the newly added CPU. If the FDT is not updated and the kernel crashes on a newly added CPU, the kdump kernel will fail to boot due to the unavailability of the crashing CPU in the FDT. During early boot, the boot CPU is expected to be part of the FDT; otherwise, the kernel will raise a BUG and fail to boot. For more information, refer to commit 36ae37e3436b0 ("powerpc: Make boot_cpuid common between 32 and 64-bit"). Since it is okay to have an offline CPU in the kdump FDT, no action is taken in case of CPU removal.

There are two system calls, `kexec_file_load` and `kexec_load`, used to load the kdump image. A few changes have been made to ensure the kernel can safely update the FDT of a kdump image loaded using either system call.

For the kexec_file_load syscall, the kdump image is prepared in the kernel. So, to support an increasing number of CPUs, the FDT is constructed with extra buffer space to ensure it can accommodate the possible number of CPU nodes. Additionally, a call to fdt_pack (which trims the unused space once the FDT is prepared) is avoided if this feature is enabled.

For the kexec_load syscall, the FDT is updated only if the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag is passed to the kernel by userspace (kexec tools). When userspace passes this flag to the kernel, it indicates that the FDT is built to accommodate possible CPUs, and the FDT segment is excluded from SHA calculation, making it safe to update.

The changes related to this feature are kept under the CRASH_HOTPLUG config, and it is enabled by default.
Signed-off-by: Sourabh Jain Acked-by: Hari Bathini Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- * No changes in v18 arch/powerpc/Kconfig | 4 ++ arch/powerpc/include/asm/kexec.h | 8 +++ arch/powerpc/kexec/crash.c| 103 ++ arch/powerpc/kexec/elf_64.c | 3 +- arch/powerpc/kexec/file_load_64.c | 17 + 5 files changed, 134 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 1c4be3373686..a1a3b3363008 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -686,6 +686,10 @@ config ARCH_SELECTS_CRASH_DUMP depends on CRASH_DUMP sele
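The rule the commit message gives for when the kernel may update the kdump image in place (always for kexec_file_load, and for kexec_load only when userspace passed KEXEC_CRASH_HOTPLUG_SUPPORT) can be written as a small predicate. The sketch below is a userspace illustration with hypothetical names; the numeric flag value is an assumption and should be checked against the kernel's uapi headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value of the kexec flag; illustrative only. */
#define SKETCH_KEXEC_CRASH_HOTPLUG_SUPPORT 0x00000008UL

/*
 * Return true if the kernel may update kdump image components in place
 * on a CPU or memory add/remove event.
 *
 * file_mode:   the image was loaded with kexec_file_load, so the kernel
 *              built the image itself with spare buffer space.
 * kexec_flags: flags userspace passed to kexec_load.
 */
static bool sketch_crash_hotplug_support(bool file_mode,
					 unsigned long kexec_flags)
{
	if (file_mode)
		return true;
	return (kexec_flags & SKETCH_KEXEC_CRASH_HOTPLUG_SUPPORT) != 0;
}
```

The flag acts as a promise from the kexec tool: it built the updatable segments with room to grow and left them out of the SHA digest, so an in-kernel update will not invalidate the image.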
[PATCH v18 3/6] powerpc/kexec: move *_memory_ranges functions to ranges.c
Move the following functions from kexec/{file_load_64.c => ranges.c} and make them public so that components other than KEXEC_FILE can also use these functions.

1. get_exclude_memory_ranges
2. get_reserved_memory_ranges
3. get_crash_memory_ranges
4. get_usable_memory_ranges

Later in the series, the get_crash_memory_ranges function is used for in-kernel updates to the kdump image during CPU/memory hotplug or online/offline events for both the kexec_load and kexec_file_load syscalls.

Since the above functions are moved to ranges.c, some of the helper functions in ranges.c are no longer required to be public. Mark them as static and remove them from the kexec_ranges.h header file.

Finally, remove the CONFIG_KEXEC_FILE build dependency for ranges.c because it is required by other configs, such as CONFIG_CRASH_DUMP.

No functional changes are intended.

Signed-off-by: Sourabh Jain
Acked-by: Hari Bathini
Cc: Akhil Raj
Cc: Andrew Morton
Cc: Aneesh Kumar K.V
Cc: Baoquan He
Cc: Borislav Petkov (AMD)
Cc: Boris Ostrovsky
Cc: Christophe Leroy
Cc: Dave Hansen
Cc: Dave Young
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: Laurent Dufour
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Mimi Zohar
Cc: Naveen N Rao
Cc: Oscar Salvador
Cc: Thomas Gleixner
Cc: Valentin Schneider
Cc: Vivek Goyal
Cc: ke...@lists.infradead.org
Cc: x...@kernel.org
---

Changes in v18:
* Fix a typo in the commit message

 arch/powerpc/include/asm/kexec_ranges.h | 19 +-
 arch/powerpc/kexec/Makefile | 4 +-
 arch/powerpc/kexec/file_load_64.c | 190
 arch/powerpc/kexec/ranges.c | 227 +++-
 4 files changed, 224 insertions(+), 216 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec_ranges.h b/arch/powerpc/include/asm/kexec_ranges.h
index f83866a19e87..8489e844b447 100644
--- a/arch/powerpc/include/asm/kexec_ranges.h
+++ b/arch/powerpc/include/asm/kexec_ranges.h
@@ -7,19 +7,8 @@
 void sort_memory_ranges(struct crash_mem *mrngs, bool merge);
 struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges);
 int add_mem_range(struct
crash_mem **mem_ranges, u64 base, u64 size); -int add_tce_mem_ranges(struct crash_mem **mem_ranges); -int add_initrd_mem_range(struct crash_mem **mem_ranges); -#ifdef CONFIG_PPC_64S_HASH_MMU -int add_htab_mem_range(struct crash_mem **mem_ranges); -#else -static inline int add_htab_mem_range(struct crash_mem **mem_ranges) -{ - return 0; -} -#endif -int add_kernel_mem_range(struct crash_mem **mem_ranges); -int add_rtas_mem_range(struct crash_mem **mem_ranges); -int add_opal_mem_range(struct crash_mem **mem_ranges); -int add_reserved_mem_ranges(struct crash_mem **mem_ranges); - +int get_exclude_memory_ranges(struct crash_mem **mem_ranges); +int get_reserved_memory_ranges(struct crash_mem **mem_ranges); +int get_crash_memory_ranges(struct crash_mem **mem_ranges); +int get_usable_memory_ranges(struct crash_mem **mem_ranges); #endif /* _ASM_POWERPC_KEXEC_RANGES_H */ diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile index 8e469c4da3f8..470eb0453e17 100644 --- a/arch/powerpc/kexec/Makefile +++ b/arch/powerpc/kexec/Makefile @@ -3,11 +3,11 @@ # Makefile for the linux kernel. # -obj-y += core.o core_$(BITS).o +obj-y += core.o core_$(BITS).o ranges.o obj-$(CONFIG_PPC32)+= relocate_32.o -obj-$(CONFIG_KEXEC_FILE) += file_load.o ranges.o file_load_$(BITS).o elf_$(BITS).o +obj-$(CONFIG_KEXEC_FILE) += file_load.o file_load_$(BITS).o elf_$(BITS).o obj-$(CONFIG_VMCORE_INFO) += vmcore_info.o obj-$(CONFIG_CRASH_DUMP) += crash.o diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c index 1bc65de6174f..6a01f62b8fcf 100644 --- a/arch/powerpc/kexec/file_load_64.c +++ b/arch/powerpc/kexec/file_load_64.c @@ -47,83 +47,6 @@ const struct kexec_file_ops * const kexec_file_loaders[] = { NULL }; -/** - * get_exclude_memory_ranges - Get exclude memory ranges. This list includes - * regions like opal/rtas, tce-table, initrd, - * kernel, htab which should be avoided while - * setting up kexec load segments. 
- * @mem_ranges:Range list to add the memory ranges to. - * - * Returns 0 on success, negative errno on error. - */ -static int get_exclude_memory_ranges(struct crash_mem **mem_ranges) -{ - int ret; - - ret = add_tce_mem_ranges(mem_ranges); - if (ret) - goto out; - - ret = add_initrd_mem_range(mem_ranges); - if (ret) - goto out; - - ret = add_htab_mem_range(mem_ranges); - if (ret) - goto out; - - ret = add_kernel_mem_range(mem_ranges); - if (ret) - goto out; - - ret = add_rtas_mem_range(mem_ranges); - if (ret) - goto out; - - ret = add_opal_mem_range(mem_ran
[PATCH v18 4/6] PowerPC/kexec: make the update_cpus_node() function public
Move the update_cpus_node() from kexec/{file_load_64.c => core_64.c} to allow other kexec components to use it. Later in the series, this function is used for in-kernel updates to the kdump image during CPU/memory hotplug or online/offline events for both kexec_load and kexec_file_load syscalls. No functional changes are intended. Signed-off-by: Sourabh Jain Acked-by: Hari Bathini Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- * No changes in v18 arch/powerpc/include/asm/kexec.h | 4 ++ arch/powerpc/kexec/core_64.c | 91 +++ arch/powerpc/kexec/file_load_64.c | 87 - 3 files changed, 95 insertions(+), 87 deletions(-) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index fdb90e24dc74..d9ff4d0e392d 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -185,6 +185,10 @@ static inline void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *)) #endif /* CONFIG_CRASH_DUMP */ +#if defined(CONFIG_KEXEC_FILE) || defined(CONFIG_CRASH_DUMP) +int update_cpus_node(void *fdt); +#endif + #ifdef CONFIG_PPC_BOOK3S_64 #include #endif diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c index 762e4d09aacf..85050be08a23 100644 --- a/arch/powerpc/kexec/core_64.c +++ b/arch/powerpc/kexec/core_64.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include @@ -30,6 +31,7 @@ #include #include #include +#include int machine_kexec_prepare(struct kimage *image) { @@ -419,3 +421,92 @@ static int __init export_htab_values(void) } late_initcall(export_htab_values); #endif /* CONFIG_PPC_64S_HASH_MMU */ + +#if 
defined(CONFIG_KEXEC_FILE) || defined(CONFIG_CRASH_DUMP) +/** + * add_node_props - Reads node properties from device node structure and add + * them to fdt. + * @fdt:Flattened device tree of the kernel + * @node_offset:offset of the node to add a property at + * @dn: device node pointer + * + * Returns 0 on success, negative errno on error. + */ +static int add_node_props(void *fdt, int node_offset, const struct device_node *dn) +{ + int ret = 0; + struct property *pp; + + if (!dn) + return -EINVAL; + + for_each_property_of_node(dn, pp) { + ret = fdt_setprop(fdt, node_offset, pp->name, pp->value, pp->length); + if (ret < 0) { + pr_err("Unable to add %s property: %s\n", pp->name, fdt_strerror(ret)); + return ret; + } + } + return ret; +} + +/** + * update_cpus_node - Update cpus node of flattened device tree using of_root + *device node. + * @fdt: Flattened device tree of the kernel. + * + * Returns 0 on success, negative errno on error. + */ +int update_cpus_node(void *fdt) +{ + struct device_node *cpus_node, *dn; + int cpus_offset, cpus_subnode_offset, ret = 0; + + cpus_offset = fdt_path_offset(fdt, "/cpus"); + if (cpus_offset < 0 && cpus_offset != -FDT_ERR_NOTFOUND) { + pr_err("Malformed device tree: error reading /cpus node: %s\n", + fdt_strerror(cpus_offset)); + return cpus_offset; + } + + if (cpus_offset > 0) { + ret = fdt_del_node(fdt, cpus_offset); + if (ret < 0) { + pr_err("Error deleting /cpus node: %s\n", fdt_strerror(ret)); + return -EINVAL; + } + } + + /* Add cpus node to fdt */ + cpus_offset = fdt_add_subnode(fdt, fdt_path_offset(fdt, "/"), "cpus"); + if (cpus_offset < 0) { + pr_err("Error creating /cpus node: %s\n", fdt_strerror(cpus_offset)); + return -EINVAL; + } + + /* Add cpus node properties */ + cpus_node = of_find_node_by_path("/cpus"); + ret = add_node_props(fdt, cpus_offset, cpus_node); + of_node_put(cpus_node); + if (ret < 0) + return ret; + + /* Loop through all subnodes of cpus and add them to fdt */ + for_each_node_by_type(dn, "cpu") { + 
cpus_subnode_offset = fdt_add_subnode(fdt, cpus_offset, dn->full_name); + if (cpus_subnode_offset < 0) { + pr_err("Unable to add %s subnode: %s\n", dn->full_name,
[PATCH v18 2/6] crash: add a new kexec flag for hotplug support
Commit a72bbec70da2 ("crash: hotplug support for kexec_load()") introduced a new kexec flag, `KEXEC_UPDATE_ELFCOREHDR`. The kexec tool uses this flag to indicate to the kernel that it is safe to modify the elfcorehdr of the kdump image loaded using the kexec_load system call.

However, architectures may need to update kexec segments other than elfcorehdr, for example the FDT (Flattened Device Tree) on PowerPC. Introducing a new kexec flag for every new kexec segment may not be a good solution. Hence, a generic kexec flag bit, `KEXEC_CRASH_HOTPLUG_SUPPORT`, is introduced to share the CPU/Memory hotplug support intent between the kexec tool and the kernel for the kexec_load system call.

Now we have two kexec flags that enable crash hotplug support for the kexec_load system call. The first is KEXEC_UPDATE_ELFCOREHDR (only used on x86), and the second is KEXEC_CRASH_HOTPLUG_SUPPORT (for all architectures).

To simplify the process of finding and reporting crash hotplug support, the following changes are introduced.

1. Define an arch-specific function to process the kexec flags and determine crash hotplug support

2. Rename the @update_elfcorehdr member of struct kimage to @hotplug_support and populate it for both kexec_load and kexec_file_load syscalls, because an architecture can update more than one kexec segment

3. Let the generic function crash_check_hotplug_support report hotplug support for the loaded kdump image based on the value of @hotplug_support

To bring the x86 crash hotplug support in line with the above points, the following changes have been made:
- Introduce the arch_crash_hotplug_support function to process kexec flags and determine crash hotplug support
- Remove the arch_crash_hotplug_[cpu|memory]_support functions

Signed-off-by: Sourabh Jain
Acked-by: Baoquan He
Acked-by: Hari Bathini
Cc: Akhil Raj
Cc: Andrew Morton
Cc: Aneesh Kumar K.V
Cc: Borislav Petkov (AMD)
Cc: Boris Ostrovsky
Cc: Christophe Leroy
Cc: Dave Hansen
Cc: Dave Young
Cc: David Hildenbrand
Cc: Eric DeVolder
Cc: Greg Kroah-Hartman
Cc: Laurent Dufour
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Mimi Zohar
Cc: Naveen N Rao
Cc: Oscar Salvador
Cc: Thomas Gleixner
Cc: Valentin Schneider
Cc: Vivek Goyal
Cc: ke...@lists.infradead.org
Cc: x...@kernel.org
---
Changes in v18:
* Describe x86 changes in commit message
* Update comment of crash_check_hotplug_support()

 arch/x86/include/asm/kexec.h | 11 ++-
 arch/x86/kernel/crash.c      | 28 +---
 drivers/base/cpu.c           |  2 +-
 drivers/base/memory.c        |  2 +-
 include/linux/crash_core.h   | 13 ++---
 include/linux/kexec.h        | 11 +++
 include/uapi/linux/kexec.h   |  1 +
 kernel/crash_core.c          | 15 ++-
 kernel/kexec.c               |  4 ++--
 kernel/kexec_file.c          |  5 +
 10 files changed, 48 insertions(+), 44 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index cb1320ebbc23..ae5482a2f0ca 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -210,15 +210,8 @@ extern void kdump_nmi_shootdown_cpus(void);
 void arch_crash_handle_hotplug_event(struct kimage *image, void *arg);
 #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
 
-#ifdef CONFIG_HOTPLUG_CPU
-int arch_crash_hotplug_cpu_support(void);
-#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
-#endif
-
-#ifdef CONFIG_MEMORY_HOTPLUG
-int arch_crash_hotplug_memory_support(void);
-#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
-#endif
+int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags);
+#define arch_crash_hotplug_support arch_crash_hotplug_support
 
 unsigned int arch_crash_get_elfcorehdr_size(void);
 #define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 2a682fe86352..f06501445cd9 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -402,20 +402,26 @@ int crash_load_segments(struct kimage *image)
 #undef pr_fmt
 #define pr_fmt(fmt) "crash hp: " fmt
-/* These functions provide the value for the sysfs crash_hotplug nodes */
-#ifdef CONFIG_HOTPLUG_CPU
-int arch_crash_hotplug_cpu_support(void)
+int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags)
 {
-	return crash_check_update_elfcorehdr();
-}
-#endif
-#ifdef CONFIG_MEMORY_HOTPLUG
-int arch_crash_hotplug_memory_support(void)
-{
-	return crash_check_update_elfcorehdr();
-}
+#ifdef CONFIG_KEXEC_FILE
+	if (image->file_mode)
+		return 1;
 #endif
+	/*
+	 * Initially, crash hotplug support for kexec_load was added
+	 * with the KEXEC_UPDATE_ELFCOREHDR flag. Later, this
+	 * functionality was expanded to accommodate multiple kexec
+	 * segment updates, leading to the introduction of the
+	 * KEXEC_CRASH_HOT
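For readers following the flag handling described in the commit message above, here is a minimal userspace sketch of the decision it describes. It is not the code from the patch (the hunk above is cut off before the return statement). The `sketch_` names, the one-field image struct, and the numeric flag values are assumptions made for illustration; the real definitions live in include/uapi/linux/kexec.h and include/linux/kexec.h.

```c
#include <stdbool.h>

/* Placeholder flag values for illustration only (assumption). */
#define SKETCH_KEXEC_UPDATE_ELFCOREHDR     0x4UL
#define SKETCH_KEXEC_CRASH_HOTPLUG_SUPPORT 0x8UL

/* Hypothetical stand-in for the single struct kimage field used here. */
struct sketch_kimage {
	bool file_mode;	/* true when the image came from kexec_file_load */
};

/*
 * Returns 1 when the loaded kdump image may be updated on hotplug
 * events, 0 otherwise.
 */
static int sketch_crash_hotplug_support(const struct sketch_kimage *image,
					unsigned long kexec_flags)
{
	/* kexec_file_load: the kernel prepared the image itself. */
	if (image->file_mode)
		return 1;

	/* kexec_load: userspace has to opt in through either flag. */
	return (kexec_flags & (SKETCH_KEXEC_UPDATE_ELFCOREHDR |
			       SKETCH_KEXEC_CRASH_HOTPLUG_SUPPORT)) != 0;
}
```

The point of the single helper is that the generic code only has to store its result in @hotplug_support once at load time, instead of re-deriving it from file_mode and per-segment flags on every hotplug event.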
[PATCH v18 0/6] powerpc/crash: Kernel handling of CPU and memory hotplug
ystem call
- Change in the way this feature is advertised to userspace for both kexec_load syscall
- Rebase to v6.6-rc7

v11:
- Rebase to v6.4-rc6
- The patch that introduced CONFIG_CRASH_HOTPLUG for PowerPC has been removed. The config is now part of common configuration: https://lore.kernel.org/all/87ilbpflsk.fsf@mail.lhotse/

v10:
- Drop the patch that adds fdt_index attribute to struct kimage_arch. Find the fdt segment index when needed.
- Added more details to commit messages.
- Rebased onto 6.3.0-rc5

v9:
- Removed patch to prepare elfcorehdr crash notes for possible CPUs. The patch is moved to the generic patch series that introduces generic infrastructure for in-kernel crash update.
- Removed patch to pass the hotplug action type to the arch crash hotplug handler function. The generic patch series has introduced the hotplug action type in kimage struct.
- Add detailed commit messages for better understanding.

v8:
- Restrict fdt_index initialization to machine_kexec_post_load so that it works for both kexec_load and kexec_file_load. [3/8] Laurent Dufour
- Updated the logic to find the number of offline cores. [6/8]
- Changed the logic to find the elfcore program header to accommodate future memory ranges due to memory hotplug events. [8/8]

v7:
- Added a new config to configure this feature
- Pass hotplug action type to arch-specific handler

v6:
- Added crash memory hotplug support

v5:
- Replace CONFIG_CRASH_HOTPLUG with CONFIG_HOTPLUG_CPU.
- Move fdt segment identification for kexec_load case to load path instead of crash hotplug handler
- Keep new attribute defined under kimage_arch to track FDT segment under CONFIG_HOTPLUG_CPU config.

v4:
- Update the logic to find the additional space needed for hotadd CPUs post kexec load. Refer to the "[RFC v4 PATCH 4/5] powerpc/crash hp: add crash hotplug support for kexec_file_load" patch to know more about the change.
- Fix a couple of typos.
- Replace pr_err with pr_info_once to warn the user about memory hotplug support.
- In the crash hotplug handler, exit the for loop if the FDT segment is found.

v3:
- Move fdt_index and fdt_index_vaild variables to kimage_arch struct.
- Rebase patches on top of https://lore.kernel.org/lkml/20220303162725.49640-1-eric.devol...@oracle.com/
- Fixed warning reported by checkpatch script

v2:
- Use generic hotplug handler introduced by https://lore.kernel.org/lkml/20220209195706.51522-1-eric.devol...@oracle.com/ a significant change from v1.

Cc: Akhil Raj
Cc: Andrew Morton
Cc: Aneesh Kumar K.V
Cc: Baoquan He
Cc: Borislav Petkov (AMD)
Cc: Boris Ostrovsky
Cc: Christophe Leroy
Cc: Dave Hansen
Cc: Dave Young
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: Hari Bathini
Cc: Laurent Dufour
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Mimi Zohar
Cc: Naveen N Rao
Cc: Oscar Salvador
Cc: Thomas Gleixner
Cc: Valentin Schneider
Cc: Vivek Goyal
Cc: ke...@lists.infradead.org
Cc: x...@kernel.org

Sourabh Jain (6):
  crash: forward memory_notify arg to arch crash hotplug handler
  crash: add a new kexec flag for hotplug support
  powerpc/kexec: move *_memory_ranges functions to ranges.c
  PowerPC/kexec: make the update_cpus_node() function public
  powerpc/crash: add crash CPU hotplug support
  powerpc/crash: add crash memory hotplug support

 arch/powerpc/Kconfig                    |   4 +
 arch/powerpc/include/asm/kexec.h        |  15 ++
 arch/powerpc/include/asm/kexec_ranges.h |  20 +-
 arch/powerpc/kexec/Makefile             |   4 +-
 arch/powerpc/kexec/core_64.c            |  91 +++
 arch/powerpc/kexec/crash.c              | 196 +++
 arch/powerpc/kexec/elf_64.c             |   3 +-
 arch/powerpc/kexec/file_load_64.c       | 314 +++-
 arch/powerpc/kexec/ranges.c             | 312 ++-
 arch/x86/include/asm/kexec.h            |  13 +-
 arch/x86/kernel/crash.c                 |  32 ++-
 drivers/base/cpu.c                      |   2 +-
 drivers/base/memory.c                   |   2 +-
 include/linux/crash_core.h              |  15 +-
 include/linux/kexec.h                   |  11 +-
 include/uapi/linux/kexec.h              |   1 +
 kernel/crash_core.c                     |  29 +--
 kernel/kexec.c                          |   4 +-
 kernel/kexec_file.c                     |   5 +
 19 files changed, 714 insertions(+), 359 deletions(-)

--
2.43.0
[PATCH v18 1/6] crash: forward memory_notify arg to arch crash hotplug handler
On memory hotplug or online/offline events, the crash memory hotplug notifier `crash_memhp_notifier()` receives a `memory_notify` object but doesn't forward that object to the generic and architecture-specific crash hotplug handlers.

The `memory_notify` object contains the starting PFN (Page Frame Number) and the number of pages in the hot-removed memory. This information is necessary for architectures like PowerPC to update/recreate the kdump image, specifically `elfcorehdr`.

So update the function signatures of `crash_handle_hotplug_event()` and `arch_crash_handle_hotplug_event()` to accept the `memory_notify` object as an argument from the crash memory hotplug notifier.

Since no such object is available in the case of a CPU hotplug event, the crash CPU hotplug notifier `crash_cpuhp_online()` passes NULL to the crash hotplug handler.

Signed-off-by: Sourabh Jain
Acked-by: Baoquan He
Acked-by: Hari Bathini
Cc: Akhil Raj
Cc: Andrew Morton
Cc: Aneesh Kumar K.V
Cc: Borislav Petkov (AMD)
Cc: Boris Ostrovsky
Cc: Christophe Leroy
Cc: Dave Hansen
Cc: Dave Young
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: Laurent Dufour
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Mimi Zohar
Cc: Naveen N Rao
Cc: Oscar Salvador
Cc: Thomas Gleixner
Cc: Valentin Schneider
Cc: Vivek Goyal
Cc: ke...@lists.infradead.org
Cc: x...@kernel.org
---
* No changes in v18

 arch/x86/include/asm/kexec.h |  2 +-
 arch/x86/kernel/crash.c      |  4 +++-
 include/linux/crash_core.h   |  2 +-
 kernel/crash_core.c          | 14 +++---
 4 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 91ca9a9ee3a2..cb1320ebbc23 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -207,7 +207,7 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image);
 extern void kdump_nmi_shootdown_cpus(void);
 
 #ifdef CONFIG_CRASH_HOTPLUG
-void arch_crash_handle_hotplug_event(struct kimage *image);
+void arch_crash_handle_hotplug_event(struct kimage *image, void *arg);
 #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
 
 #ifdef CONFIG_HOTPLUG_CPU
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index e74d0c4286c1..2a682fe86352 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -432,10 +432,12 @@ unsigned int arch_crash_get_elfcorehdr_size(void)
 /**
  * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
  * @image: a pointer to kexec_crash_image
+ * @arg: struct memory_notify handler for memory hotplug case and
+ * NULL for CPU hotplug case.
  *
  * Prepare the new elfcorehdr and replace the existing elfcorehdr.
  */
-void arch_crash_handle_hotplug_event(struct kimage *image)
+void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
 {
 	void *elfbuf = NULL, *old_elfcorehdr;
 	unsigned long nr_mem_ranges;
diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index d33352c2e386..647e928efee8 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -37,7 +37,7 @@ static inline void arch_kexec_unprotect_crashkres(void) { }
 
 
 #ifndef arch_crash_handle_hotplug_event
-static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
+static inline void arch_crash_handle_hotplug_event(struct kimage *image, void *arg) { }
 #endif
 
 int crash_check_update_elfcorehdr(void);
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 78b5dc7cee3a..70fa8111a9d6 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -534,7 +534,7 @@ int crash_check_update_elfcorehdr(void)
  * list of segments it checks (since the elfcorehdr changes and thus
  * would require an update to purgatory itself to update the digest).
  */
-static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
+static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu, void *arg)
 {
 	struct kimage *image;
 
@@ -596,7 +596,7 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
 	image->hp_action = hp_action;
 
 	/* Now invoke arch-specific update handler */
-	arch_crash_handle_hotplug_event(image);
+	arch_crash_handle_hotplug_event(image, arg);
 
 	/* No longer handling a hotplug event */
 	image->hp_action = KEXEC_CRASH_HP_NONE;
@@ -612,17 +612,17 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
 	crash_hotplug_unlock();
 }
 
-static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v)
+static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *arg)
 {
 	switch (val) {
 	case MEM_ONLINE:
 		crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_MEMORY,
-			KEXEC_CRASH_HP_INVALID_CPU);
+			KEXEC_CRASH_HP_INVALID_CPU, arg);
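As a rough illustration of why the opaque argument is useful to an architecture handler, the sketch below decodes it in userspace C. It assumes a hypothetical struct that mirrors only the two `memory_notify` fields named in the commit message (starting PFN and page count), and an assumed page shift of 16 (64 KiB pages); every `sketch_`/`SKETCH_` name is invented for this example and is not part of the patch.

```c
#include <stddef.h>

/* Hypothetical mirror of the two memory_notify fields discussed above. */
struct sketch_memory_notify {
	unsigned long start_pfn;	/* first page frame of the range */
	unsigned long nr_pages;		/* number of pages in the range */
};

#define SKETCH_PAGE_SHIFT 16	/* assumption: 64 KiB pages */

/* Decoded form of the handler argument. */
struct sketch_hp_range {
	int is_memory_event;		/* 0 when arg was NULL (CPU hotplug) */
	unsigned long long base;	/* start address in bytes */
	unsigned long long size;	/* length in bytes */
};

static struct sketch_hp_range sketch_decode_hotplug_arg(void *arg)
{
	struct sketch_hp_range r = { 0, 0, 0 };
	const struct sketch_memory_notify *mn = arg;

	/* The CPU hotplug notifier passes NULL: nothing to decode. */
	if (!mn)
		return r;

	r.is_memory_event = 1;
	r.base = (unsigned long long)mn->start_pfn << SKETCH_PAGE_SHIFT;
	r.size = (unsigned long long)mn->nr_pages << SKETCH_PAGE_SHIFT;
	return r;
}
```

Keeping the parameter as `void *` lets the same handler signature serve both notifiers; the handler distinguishes the two cases by the NULL check (the real handler can also consult image->hp_action).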
Re: [PATCH v8 1/3] powerpc: make fadump resilient with memory add/remove events
Hello Hari,

On 11/03/24 14:08, Hari Bathini wrote:
On 17/02/24 12:50 pm, Sourabh Jain wrote:

Due to changes in memory resources caused by either memory hotplug or online/offline events, the elfcorehdr, which describes the CPUs and memory of the crashed kernel to the kernel that collects the dump (known as second/fadump kernel), becomes outdated. Consequently, attempting dump collection with an outdated elfcorehdr can lead to failed or inaccurate dump collection.

Memory hotplug or online/offline events are referred to as memory add/remove events in the rest of the commit message.

The current solution to address the aforementioned issue is as follows: Monitor memory add/remove events in userspace using udev rules, and re-register fadump whenever there are changes in memory resources. This leads to the creation of a new elfcorehdr with updated system memory information.

There are several notable issues associated with re-registering fadump for every memory add/remove event.

1. Bulk memory add/remove events with udev-based fadump re-registration can lead to race conditions and, more importantly, create a wide window during which fadump is inactive until all memory add/remove events are settled.

2. Re-registering fadump for every memory add/remove event is inefficient.

3. The memory for elfcorehdr is allocated based on the memblock regions available during early boot and remains fixed thereafter. However, if elfcorehdr is later recreated with additional memblock regions, its size will increase, potentially leading to memory corruption.

Address the aforementioned challenges by shifting the creation of elfcorehdr from the first kernel (also referred to as the crashed kernel), where it was created and frequently recreated for every memory add/remove event, to the fadump kernel. As a result, the elfcorehdr only needs to be created once, thus eliminating the necessity to re-register fadump during memory add/remove events.

At present, the first kernel is responsible for preparing the fadump header and storing it in the fadump reserved area. The fadump header includes the start address of the elfcorehdr, crashing CPU details, and other relevant information. In the event of a crash in the first kernel, the second/fadump kernel boots and accesses the fadump header prepared by the first kernel. It then performs the following steps in a platform-specific function [rtas|opal]_fadump_process:

1. Sanity check for fadump header
2. Update CPU notes in elfcorehdr

Along with the above, update the setup_fadump()/fadump.c to create elfcorehdr and set its address to the global variable elfcorehdr_addr for the vmcore module to process it in the second/fadump kernel.

The section below outlines the information required to create the elfcorehdr and the changes made to make it available to the fadump kernel if it's not already.

To create elfcorehdr, the following crashed kernel information is required: CPU notes, vmcoreinfo, and memory ranges.

At present, the CPU notes are already prepared in the fadump kernel, so no changes are needed in that regard. The fadump kernel has access to all crashed kernel memory regions, including boot memory regions that are relocated by firmware to fadump reserved areas, so no changes are needed for that either. However, it is necessary to add new members to the fadump header, i.e., the 'fadump_crash_info_header' structure, in order to pass the crashed kernel's vmcoreinfo address and its size to the fadump kernel.

In addition to the vmcoreinfo address and size, a few other attributes are also added to the fadump_crash_info_header structure.

1. version: It stores the fadump header version, which is currently set to 1. This provides flexibility to update the fadump crash info header in the future without changing the magic number. For each change in the fadump header, the version will be increased. This will help the updated kernel determine how to handle kernel dumps from older kernels. The magic number remains relevant for checking fadump header corruption.

2. pt_regs_sz/cpu_mask_sz: Store the sizes of the pt_regs and cpu_mask structures of the first kernel. These attributes are used to prevent dump processing if the sizes of the pt_regs or cpu_mask structures differ between the first and fadump kernels.

Note: if either the first/crashed kernel or the second/fadump kernel does not have the changes introduced here, then the kernel fails to collect the dump and prints a relevant error message on the console.

Signed-off-by: Sourabh Jain
Cc: Aditya Gupta
Cc: Aneesh Kumar K.V
Cc: Hari Bathini
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Naveen N Rao
---
 arch/powerpc/include/asm/fadump-internal.h   |  31 +-
 arch/powerpc/kernel/fadump.c                 | 339 +++
 arch/powerpc/platforms/powernv/opal-fadump.c |  22 +-
 arch/powerpc/platforms/pseries/rtas-fadump.c |  30 +-
 4 files changed, 232 insertions(+), 190 deletions(-)

diff --git a/arch
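The header checks described in the commit message above (magic number for corruption, version for format changes, pt_regs/cpu_mask sizes for compatibility) could be sketched roughly as follows. This is only an illustration: the struct layout, the magic value, the `sketch_` names and the policy of rejecting any version other than 1 are assumptions, not the contents of the actual 'fadump_crash_info_header' or of the patch.

```c
#include <stdint.h>

#define SKETCH_FADUMP_MAGIC   0x46414455ULL	/* placeholder value */
#define SKETCH_FADUMP_VERSION 1U

/* Hypothetical subset of the header fields named in the commit message. */
struct sketch_fadump_hdr {
	uint64_t magic_number;
	uint32_t version;
	uint64_t vmcoreinfo_raddr;	/* crashed kernel's vmcoreinfo address */
	uint64_t vmcoreinfo_size;
	uint32_t pt_regs_sz;		/* sizeof(struct pt_regs) in first kernel */
	uint32_t cpu_mask_sz;		/* sizeof(struct cpumask) in first kernel */
};

enum sketch_hdr_status {
	SKETCH_HDR_OK,
	SKETCH_HDR_BAD_MAGIC,		/* header corrupted */
	SKETCH_HDR_BAD_VERSION,		/* format this kernel does not know */
	SKETCH_HDR_SIZE_MISMATCH	/* structure sizes differ between kernels */
};

/* Validate a header written by the first kernel from the fadump kernel. */
static enum sketch_hdr_status
sketch_check_hdr(const struct sketch_fadump_hdr *h,
		 uint32_t my_pt_regs_sz, uint32_t my_cpu_mask_sz)
{
	if (h->magic_number != SKETCH_FADUMP_MAGIC)
		return SKETCH_HDR_BAD_MAGIC;

	/* Assumption: only version 1 is understood by this sketch. */
	if (h->version != SKETCH_FADUMP_VERSION)
		return SKETCH_HDR_BAD_VERSION;

	if (h->pt_regs_sz != my_pt_regs_sz || h->cpu_mask_sz != my_cpu_mask_sz)
		return SKETCH_HDR_SIZE_MISMATCH;

	return SKETCH_HDR_OK;
}
```

Checking the magic number first keeps the corruption check independent of the version field, which matches the commit message's point that the version can change without touching the magic number.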
Re: [PATCH v17 6/6] powerpc/crash: add crash memory hotplug support
On 02/03/24 18:49, Hari Bathini wrote:
On 26/02/24 2:11 pm, Sourabh Jain wrote:

Extend the arch crash hotplug handler, as introduced by the patch titled ("powerpc: add crash CPU hotplug support"), to also support memory add/remove events.

The elfcorehdr describes the memory of the crashed kernel to the capture kernel; hence, it needs to be updated if memory resources change due to memory add/remove events. Therefore, arch_crash_handle_hotplug_event() is updated to recreate the elfcorehdr and replace the previous one with it on memory add/remove events.

The memblock list is used to prepare the elfcorehdr. In the case of memory hot remove, the memblock list is updated after the arch crash hotplug handler is triggered, as depicted in Figure 1. Thus, the hot-removed memory is explicitly removed from the crash memory ranges to ensure that the memory ranges added to elfcorehdr do not include the hot-removed memory.

      Memory remove
            |
            v
      Offline pages
            |
            v
 Initiate memory notify call <----> crash hotplug handler
 chain for MEM_OFFLINE event
            |
            v
   Update memblock list

         Figure 1

There are two system calls, `kexec_file_load` and `kexec_load`, used to load the kdump image. A few changes have been made to ensure that the kernel can safely update the elfcorehdr component of the kdump image for both system calls.

For the kexec_file_load syscall, the kdump image is prepared in the kernel. To support an increasing number of memory regions, the elfcorehdr is built with extra buffer space to ensure that it can accommodate additional memory ranges in the future.

For the kexec_load syscall, the elfcorehdr is updated only if the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag is passed to the kernel by the kexec tool. Passing this flag to the kernel indicates that the elfcorehdr is built to accommodate additional memory ranges and the elfcorehdr segment is not considered for SHA calculation, making it safe to update.

The changes related to this feature are kept under the CRASH_HOTPLUG config, and it is enabled by default.
Overall, the patchset looks good. I tried out the changes too. Acked-by: Hari Bathini Hello Hari, Thanks for trying out the change. - Sourabh Signed-off-by: Sourabh Jain Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- arch/powerpc/include/asm/kexec.h | 3 + arch/powerpc/include/asm/kexec_ranges.h | 1 + arch/powerpc/kexec/crash.c | 95 - arch/powerpc/kexec/file_load_64.c | 20 +- arch/powerpc/kexec/ranges.c | 85 ++ 5 files changed, 202 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index e75970351bcd..95a98b390d62 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -141,6 +141,9 @@ void arch_crash_handle_hotplug_event(struct kimage *image, void *arg); int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags); #define arch_crash_hotplug_support arch_crash_hotplug_support + +unsigned int arch_crash_get_elfcorehdr_size(void); +#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size #endif /* CONFIG_CRASH_HOTPLUG */ extern int crashing_cpu; diff --git a/arch/powerpc/include/asm/kexec_ranges.h b/arch/powerpc/include/asm/kexec_ranges.h index 8489e844b447..14055896cbcb 100644 --- a/arch/powerpc/include/asm/kexec_ranges.h +++ b/arch/powerpc/include/asm/kexec_ranges.h @@ -7,6 +7,7 @@ void sort_memory_ranges(struct crash_mem *mrngs, bool merge); struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges); int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size); +int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 
size); int get_exclude_memory_ranges(struct crash_mem **mem_ranges); int get_reserved_memory_ranges(struct crash_mem **mem_ranges); int get_crash_memory_ranges(struct crash_mem **mem_ranges); diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c index 8938a19af12f..21b193e938a3 100644 --- a/arch/powerpc/kexec/crash.c +++ b/arch/powerpc/kexec/crash.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include @@ -25,6 +26,7 @@ #include #include #include +#include /* * The primary CPU waits a while for all secondary CPUs to enter. This is to @@ -398,6 +400,94 @@ void default_machine_crash_shutdown(struct pt_regs *regs) #undef pr_fmt
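To make the "explicitly remove the hot-removed memory from the crash memory ranges" step more concrete, here is a small standalone sketch of range removal on a fixed-size, sorted, non-overlapping list. It is not the `remove_mem_range()` from the patch (its body is not shown in this excerpt); the fixed-size array, the inclusive end addresses and all `sketch_` names are assumptions chosen to keep the example self-contained.

```c
#include <stddef.h>

#define SKETCH_MAX_RANGES 8

struct sketch_range {
	unsigned long long start;
	unsigned long long end;		/* inclusive */
};

struct sketch_mem {
	size_t nr;
	struct sketch_range r[SKETCH_MAX_RANGES];
};

/*
 * Remove [base, base + size) from a sorted, non-overlapping range list.
 * A range that straddles the removed region is split in two.
 * Returns 0 on success, -1 if the result would not fit.
 */
static int sketch_remove_range(struct sketch_mem *m,
			       unsigned long long base,
			       unsigned long long size)
{
	struct sketch_mem out = { 0 };
	unsigned long long last;
	size_t i;

	if (size == 0)
		return 0;
	last = base + size - 1;

	for (i = 0; i < m->nr; i++) {
		struct sketch_range cur = m->r[i];

		/* No overlap with the removed region: keep unchanged. */
		if (cur.end < base || cur.start > last) {
			if (out.nr == SKETCH_MAX_RANGES)
				return -1;
			out.r[out.nr++] = cur;
			continue;
		}
		/* Keep the part below the removed region, if any. */
		if (cur.start < base) {
			if (out.nr == SKETCH_MAX_RANGES)
				return -1;
			out.r[out.nr].start = cur.start;
			out.r[out.nr].end = base - 1;
			out.nr++;
		}
		/* Keep the part above the removed region, if any. */
		if (cur.end > last) {
			if (out.nr == SKETCH_MAX_RANGES)
				return -1;
			out.r[out.nr].start = last + 1;
			out.r[out.nr].end = cur.end;
			out.nr++;
		}
	}

	*m = out;
	return 0;
}
```

Note that removing memory from the middle of a range increases the number of ranges by one, which is one reason the commit message stresses building the elfcorehdr with spare room for additional memory ranges.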
Re: [PATCH v17 2/6] crash: add a new kexec flag for hotplug support
Hello Hari,

On 02/03/24 18:47, Hari Bathini wrote:
On 26/02/24 2:11 pm, Sourabh Jain wrote:

Commit a72bbec70da2 ("crash: hotplug support for kexec_load()") introduced a new kexec flag, `KEXEC_UPDATE_ELFCOREHDR`. The kexec tool uses this flag to indicate to the kernel that it is safe to modify the elfcorehdr of the kdump image loaded using the kexec_load system call.

However, architectures may need to update kexec segments other than elfcorehdr, for example the FDT (Flattened Device Tree) on PowerPC. Introducing a new kexec flag for every new kexec segment may not be a good solution. Hence, a generic kexec flag bit, `KEXEC_CRASH_HOTPLUG_SUPPORT`, is introduced to share the CPU/Memory hotplug support intent between the kexec tool and the kernel for the kexec_load system call.

Now, if the kexec tool sends the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag to the kernel, it indicates to the kernel that all the required kexec segments are skipped from SHA calculation and it is safe to update the kdump image loaded using the kexec_load syscall.

While loading the kdump image using the kexec_load syscall, the @update_elfcorehdr member of struct kimage is set if the kexec tool sends the KEXEC_UPDATE_ELFCOREHDR kexec flag. This member is later used to determine whether it is safe to update elfcorehdr on hotplug events. However, with the introduction of the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag, the kexec tool could mark all the required kexec segments on an architecture as safe to update. So rename @update_elfcorehdr to @hotplug_support. If @hotplug_support is set, the kernel can safely update all the required kexec segments of the kdump image during CPU/Memory hotplug events.

Introduce an architecture-specific function to process kexec flags for determining hotplug support. Set the @hotplug_support member of struct kimage for both kexec_load and kexec_file_load system calls.
This simplifies kernel checks to identify hotplug support for the currently loaded kdump image by just examining the value of @hotplug_support. Couple of minor nits. See comments below. Otherwise, looks good to me. Acked-by: Hari Bathini Thank you! Signed-off-by: Sourabh Jain Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Eric DeVolder Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- arch/x86/include/asm/kexec.h | 11 ++- arch/x86/kernel/crash.c | 28 +--- drivers/base/cpu.c | 2 +- drivers/base/memory.c | 2 +- include/linux/crash_core.h | 13 ++--- include/linux/kexec.h | 11 +++ include/uapi/linux/kexec.h | 1 + kernel/crash_core.c | 11 --- kernel/kexec.c | 4 ++-- kernel/kexec_file.c | 5 + 10 files changed, 46 insertions(+), 42 deletions(-) diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index cb1320ebbc23..ae5482a2f0ca 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -210,15 +210,8 @@ extern void kdump_nmi_shootdown_cpus(void); void arch_crash_handle_hotplug_event(struct kimage *image, void *arg); #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event -#ifdef CONFIG_HOTPLUG_CPU -int arch_crash_hotplug_cpu_support(void); -#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support -#endif - -#ifdef CONFIG_MEMORY_HOTPLUG -int arch_crash_hotplug_memory_support(void); -#define crash_hotplug_memory_support arch_crash_hotplug_memory_support -#endif +int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags); +#define arch_crash_hotplug_support arch_crash_hotplug_support unsigned int arch_crash_get_elfcorehdr_size(void); 
#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index 2a682fe86352..f06501445cd9 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -402,20 +402,26 @@ int crash_load_segments(struct kimage *image) #undef pr_fmt #define pr_fmt(fmt) "crash hp: " fmt -/* These functions provide the value for the sysfs crash_hotplug nodes */ -#ifdef CONFIG_HOTPLUG_CPU -int arch_crash_hotplug_cpu_support(void) +int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags) { - return crash_check_update_elfcorehdr(); -} -#endif -#ifdef CONFIG_MEMORY_HOTPLUG -int arch_crash_hotplug_memory_support(void) -{ - return crash_check_update_elfcorehdr(); -} +#ifdef CONFIG_KEXEC_FILE + if (image->file_mode) + return 1; #endif + /* +
Re: [PATCH v17 0/6] powerpc/crash: Kernel handling of CPU and memory hotplug
Hello Baoquan,

On 29/02/24 19:21, Baoquan He wrote:
Hi Sourabh,

On 02/26/24 at 02:11pm, Sourabh Jain wrote:
Commit 247262756121 ("crash: add generic infrastructure for crash hotplug support") added a generic infrastructure that allows architectures to selectively update the kdump image component during CPU or memory add/remove events within the kernel itself.

This patch series adds a crash hotplug handler for PowerPC and enables support for updating the kdump image on CPU/Memory add/remove events.

Among the 5 patches in this series, the first two patches make changes to the generic crash hotplug handler to assist PowerPC in adding support for this feature. The last three patches add support for this feature.

The whole series looks good to me. I have acked patches 1 and 2. I will leave the three ppc patches to the ppc experts to review and approve. Thanks a lot for your great work.

Thanks for your feedback. I will soon send v18 to fix the two minor documentation issues, and I look forward to feedback from the PPC maintainers on the rest of the series.

Appreciate your support!

- Sourabh
Re: [PATCH v17 2/6] crash: add a new kexec flag for hotplug support
Hello On 29/02/24 12:58, Baoquan He wrote: On 02/26/24 at 02:11pm, Sourabh Jain wrote: ..snip... diff --git a/kernel/crash_core.c b/kernel/crash_core.c index 70fa8111a9d6..630c4fd7ea39 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -496,7 +496,7 @@ static DEFINE_MUTEX(__crash_hotplug_lock); * It reflects the kernel's ability/permission to update the crash * elfcorehdr directly. ~ this should be updated too. */ -int crash_check_update_elfcorehdr(void) +int crash_check_hotplug_support(void) { int rc = 0; @@ -508,10 +508,7 @@ int crash_check_update_elfcorehdr(void) return 0; } if (kexec_crash_image) { - if (kexec_crash_image->file_mode) - rc = 1; - else - rc = kexec_crash_image->update_elfcorehdr; + rc = kexec_crash_image->hotplug_support; } /* Release lock now that update complete */ kexec_unlock(); @@ -552,8 +549,8 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu, image = kexec_crash_image; - /* Check that updating elfcorehdr is permitted */ - if (!(image->file_mode || image->update_elfcorehdr)) + /* Check that kexec segments update is permitted */ + if (!image->hotplug_support) goto out; if (hp_action == KEXEC_CRASH_HP_ADD_CPU || diff --git a/kernel/kexec.c b/kernel/kexec.c index bab542fc1463..a6b3f96bb50c 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -135,8 +135,8 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, image->preserve_context = 1; #ifdef CONFIG_CRASH_HOTPLUG - if (flags & KEXEC_UPDATE_ELFCOREHDR) - image->update_elfcorehdr = 1; + if ((flags & KEXEC_ON_CRASH) && arch_crash_hotplug_support(image, flags)) + image->hotplug_support = 1; #endif ret = machine_kexec_prepare(image); diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index 2d1db05fbf04..3d64290d24c9 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -376,6 +376,11 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd, if (ret) goto out; +#ifdef CONFIG_CRASH_HOTPLUG + if ((flags & 
KEXEC_FILE_ON_CRASH) && arch_crash_hotplug_support(image, flags)) + image->hotplug_support = 1; +#endif + ret = machine_kexec_prepare(image); if (ret) goto out; Other than the tiny part, the overall looks good to me. Acked-by: Baoquan He Thank you for the review and feedback. - Sourabh
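For anyone following the thread, the decision made in the hunks above can be modelled in user space roughly as follows. This is only a sketch under stated assumptions: every `demo_*` / `DEMO_*` name and the flag values are invented for illustration and are not the kernel's definitions, a single "on crash" bit stands in for both KEXEC_ON_CRASH and KEXEC_FILE_ON_CRASH, and the non-file branch of the arch hook is an assumption based on the commit message, not on the truncated x86 hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch only; not the uapi values. */
#define DEMO_ON_CRASH              0x1UL
#define DEMO_CRASH_HOTPLUG_SUPPORT 0x8UL

/* Minimal stand-in for the two struct kimage members discussed above. */
struct demo_kimage {
	bool file_mode;       /* image was prepared in-kernel (kexec_file_load) */
	bool hotplug_support; /* kexec segments may be updated on hotplug events */
};

/*
 * Stand-in for an architecture hook. Images built in-kernel are always
 * updatable; for images built by userspace, the sketch assumes the loader
 * must opt in via the generic hotplug flag.
 */
static int demo_arch_hotplug_support(const struct demo_kimage *image,
				     unsigned long flags)
{
	if (image->file_mode)
		return 1;
	return (flags & DEMO_CRASH_HOTPLUG_SUPPORT) != 0;
}

/* Load-time step: record the decision once in the image. */
static void demo_load(struct demo_kimage *image, unsigned long flags)
{
	image->hotplug_support = false;
	if ((flags & DEMO_ON_CRASH) && demo_arch_hotplug_support(image, flags))
		image->hotplug_support = true;
}

/* Hotplug-time step: a single member check replaces the old two-way test. */
static bool demo_may_update_segments(const struct demo_kimage *image)
{
	return image->hotplug_support;
}
```

The point of the design is that the flag interpretation happens once, at load time, so the hotplug path and the sysfs query only ever look at one boolean.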
Re: [PATCH v17 3/6] powerpc/kexec: move *_memory_ranges functions to ranges.c
On 29/02/24 13:41, Baoquan He wrote:
On 02/26/24 at 02:11pm, Sourabh Jain wrote:
Move the following functions from kexec/{file_load_64.c => ranges.c} and make them public so that components other KEXEC_FILE can also use these
^ 'than' missed?

Yes, I will update it.

Thanks,
Sourabh Jain

functions.
1. get_exclude_memory_ranges
2. get_reserved_memory_ranges
3. get_crash_memory_ranges
4. get_usable_memory_ranges

Later in the series, the get_crash_memory_ranges function is used for in-kernel updates to the kdump image during CPU/Memory hotplug or online/offline events for both the kexec_load and kexec_file_load syscalls.

Since the above functions are moved to ranges.c, some of the helper functions in ranges.c are no longer required to be public. Mark them as static and remove them from the kexec_ranges.h header file.

Finally, remove the CONFIG_KEXEC_FILE build dependency for ranges.c because it is required for other configs, such as CONFIG_CRASH_DUMP.

No functional changes are intended.

..snip
Re: [PATCH v17 2/6] crash: add a new kexec flag for hotplug support
Hello Baoquan, On 29/02/24 12:58, Baoquan He wrote: On 02/26/24 at 02:11pm, Sourabh Jain wrote: ..snip... diff --git a/kernel/crash_core.c b/kernel/crash_core.c index 70fa8111a9d6..630c4fd7ea39 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -496,7 +496,7 @@ static DEFINE_MUTEX(__crash_hotplug_lock); * It reflects the kernel's ability/permission to update the crash * elfcorehdr directly. ~ this should be updated too. Yes, it should. Thanks, Sourabh */ -int crash_check_update_elfcorehdr(void) +int crash_check_hotplug_support(void) { int rc = 0; @@ -508,10 +508,7 @@ int crash_check_update_elfcorehdr(void) return 0; } if (kexec_crash_image) { - if (kexec_crash_image->file_mode) - rc = 1; - else - rc = kexec_crash_image->update_elfcorehdr; + rc = kexec_crash_image->hotplug_support; } /* Release lock now that update complete */ kexec_unlock(); @@ -552,8 +549,8 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu, image = kexec_crash_image; - /* Check that updating elfcorehdr is permitted */ - if (!(image->file_mode || image->update_elfcorehdr)) + /* Check that kexec segments update is permitted */ + if (!image->hotplug_support) goto out; if (hp_action == KEXEC_CRASH_HP_ADD_CPU || diff --git a/kernel/kexec.c b/kernel/kexec.c index bab542fc1463..a6b3f96bb50c 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -135,8 +135,8 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, image->preserve_context = 1; #ifdef CONFIG_CRASH_HOTPLUG - if (flags & KEXEC_UPDATE_ELFCOREHDR) - image->update_elfcorehdr = 1; + if ((flags & KEXEC_ON_CRASH) && arch_crash_hotplug_support(image, flags)) + image->hotplug_support = 1; #endif ret = machine_kexec_prepare(image); diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index 2d1db05fbf04..3d64290d24c9 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -376,6 +376,11 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd, if (ret) goto out; +#ifdef 
CONFIG_CRASH_HOTPLUG + if ((flags & KEXEC_FILE_ON_CRASH) && arch_crash_hotplug_support(image, flags)) + image->hotplug_support = 1; +#endif + ret = machine_kexec_prepare(image); if (ret) goto out; Other than the tiny part, the overall looks good to me. Acked-by: Baoquan He
Re: [PATCH v17 2/6] crash: add a new kexec flag for hotplug support
On 29/02/24 11:26, Baoquan He wrote:
On 02/29/24 at 10:35am, Sourabh Jain wrote:
Hello Baoquan,

Do you have any comments or suggestions for this patch series, especially for this patch?

I have applied this series and am reviewing it; I will ack or add comments if there is any concern. Thanks.

Thanks, looking forward to your feedback!

- Sourabh

On 26/02/24 14:11, Sourabh Jain wrote:
Commit a72bbec70da2 ("crash: hotplug support for kexec_load()") introduced a new kexec flag, `KEXEC_UPDATE_ELFCOREHDR`. The kexec tool uses this flag to indicate to the kernel that it is safe to modify the elfcorehdr of the kdump image loaded using the kexec_load system call.

However, architectures may need to update kexec segments other than the elfcorehdr, for example the FDT (Flattened Device Tree) on PowerPC. Introducing a new kexec flag for every new kexec segment may not be a good solution. Hence, a generic kexec flag bit, `KEXEC_CRASH_HOTPLUG_SUPPORT`, is introduced to share the CPU/Memory hotplug support intent between the kexec tool and the kernel for the kexec_load system call.

Now, if the kexec tool sends the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag to the kernel, it indicates to the kernel that all the required kexec segments are skipped from SHA calculation and that it is safe to update the kdump image loaded using the kexec_load syscall.

While loading the kdump image using the kexec_load syscall, the @update_elfcorehdr member of struct kimage is set if the kexec tool sends the KEXEC_UPDATE_ELFCOREHDR kexec flag. This member is later used to determine whether it is safe to update the elfcorehdr on hotplug events. However, with the introduction of the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag, the kexec tool could mark all the required kexec segments on an architecture as safe to update. So rename @update_elfcorehdr to @hotplug_support. If @hotplug_support is set, the kernel can safely update all the required kexec segments of the kdump image during CPU/Memory hotplug events.

Introduce an architecture-specific function to process kexec flags for determining hotplug support. Set the @hotplug_support member of struct kimage for both kexec_load and kexec_file_load system calls. This simplifies kernel checks to identify hotplug support for the currently loaded kdump image by just examining the value of @hotplug_support. Signed-off-by: Sourabh Jain Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Eric DeVolder Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- arch/x86/include/asm/kexec.h | 11 ++- arch/x86/kernel/crash.c | 28 +--- drivers/base/cpu.c | 2 +- drivers/base/memory.c| 2 +- include/linux/crash_core.h | 13 ++--- include/linux/kexec.h| 11 +++ include/uapi/linux/kexec.h | 1 + kernel/crash_core.c | 11 --- kernel/kexec.c | 4 ++-- kernel/kexec_file.c | 5 + 10 files changed, 46 insertions(+), 42 deletions(-) diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index cb1320ebbc23..ae5482a2f0ca 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -210,15 +210,8 @@ extern void kdump_nmi_shootdown_cpus(void); void arch_crash_handle_hotplug_event(struct kimage *image, void *arg); #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event -#ifdef CONFIG_HOTPLUG_CPU -int arch_crash_hotplug_cpu_support(void); -#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support -#endif - -#ifdef CONFIG_MEMORY_HOTPLUG -int arch_crash_hotplug_memory_support(void); -#define crash_hotplug_memory_support arch_crash_hotplug_memory_support -#endif +int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags); +#define 
arch_crash_hotplug_support arch_crash_hotplug_support unsigned int arch_crash_get_elfcorehdr_size(void); #define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index 2a682fe86352..f06501445cd9 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -402,20 +402,26 @@ int crash_load_segments(struct kimage *image) #undef pr_fmt #define pr_fmt(fmt) "crash hp: " fmt -/* These functions provide the value for the sysfs crash_hotplug nodes */ -#ifdef CONFIG_HOTPLUG_CPU -int arch_crash_hotplug_cpu_support(void) +int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags) { - return crash_check_update_elfcorehdr(); -} -#endif -#ifdef CONFIG_MEMORY_HOTPLUG -int arch_crash_ho
Re: [PATCH v8 1/3] powerpc: make fadump resilient with memory add/remove events
Hello Michael and Aneesh,

Please let me know if you have any comments or suggestions for this patch series.

Thanks,
Sourabh

On 17/02/24 12:50, Sourabh Jain wrote:
Due to changes in memory resources caused by either memory hotplug or online/offline events, the elfcorehdr, which describes the CPUs and memory of the crashed kernel to the kernel that collects the dump (known as the second/fadump kernel), becomes outdated. Consequently, attempting dump collection with an outdated elfcorehdr can lead to failed or inaccurate dump collection.

Memory hotplug or online/offline events are referred to as memory add/remove events in the rest of the commit message.

The current solution to address the aforementioned issue is as follows: monitor memory add/remove events in userspace using udev rules, and re-register fadump whenever there are changes in memory resources. This leads to the creation of a new elfcorehdr with updated system memory information.

There are several notable issues associated with re-registering fadump for every memory add/remove event.

1. Bulk memory add/remove events with udev-based fadump re-registration can lead to race conditions and, more importantly, create a wide window during which fadump is inactive until all memory add/remove events are settled.
2. Re-registering fadump for every memory add/remove event is inefficient.
3. The memory for elfcorehdr is allocated based on the memblock regions available during early boot and remains fixed thereafter. However, if elfcorehdr is later recreated with additional memblock regions, its size will increase, potentially leading to memory corruption.

Address the aforementioned challenges by shifting the creation of elfcorehdr from the first kernel (also referred to as the crashed kernel), where it was created and frequently recreated for every memory add/remove event, to the fadump kernel. As a result, the elfcorehdr only needs to be created once, thus eliminating the necessity to re-register fadump during memory add/remove events.

At present, the first kernel is responsible for preparing the fadump header and storing it in the fadump reserved area. The fadump header includes the start address of the elfcorehdr, crashing CPU details, and other relevant information. In the event of a crash in the first kernel, the second/fadump kernel boots and accesses the fadump header prepared by the first kernel. It then performs the following steps in a platform-specific function [rtas|opal]_fadump_process:

1. Sanity check for fadump header
2. Update CPU notes in elfcorehdr

Along with the above, update setup_fadump() in fadump.c to create the elfcorehdr and set its address in the global variable elfcorehdr_addr for the vmcore module to process it in the second/fadump kernel.

The section below outlines the information required to create the elfcorehdr and the changes made to make it available to the fadump kernel if it is not already.

To create the elfcorehdr, the following crashed kernel information is required: CPU notes, vmcoreinfo, and memory ranges.

At present, the CPU notes are already prepared in the fadump kernel, so no changes are needed in that regard. The fadump kernel has access to all crashed kernel memory regions, including boot memory regions that are relocated by firmware to fadump reserved areas, so no changes are needed for that either. However, it is necessary to add new members to the fadump header, i.e., the 'fadump_crash_info_header' structure, in order to pass the crashed kernel's vmcoreinfo address and its size to the fadump kernel.

In addition to the vmcoreinfo address and size, a few other attributes are also added to the fadump_crash_info_header structure.

1. version: It stores the fadump header version, which is currently set to 1. This provides flexibility to update the fadump crash info header in the future without changing the magic number. For each change in the fadump header, the version will be increased. This will help the updated kernel determine how to handle kernel dumps from older kernels. The magic number remains relevant for checking fadump header corruption.
2. pt_regs_sz/cpu_mask_sz: Store the size of the pt_regs and cpu_mask structures of the first kernel. These attributes are used to prevent dump processing if the sizes of the pt_regs or cpu_mask structures differ between the first and fadump kernels.

Note: if either the first/crashed kernel or the second/fadump kernel does not have the changes introduced here, then the kernel fails to collect the dump and prints a relevant error message on the console.

Signed-off-by: Sourabh Jain
Cc: Aditya Gupta
Cc: Aneesh Kumar K.V
Cc: Hari Bathini
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Naveen N Rao
---
 arch/powerpc/include/asm/fadump-internal.h | 31 +-
 arch/powerpc/kernel/fadump.c | 339 +++
 arch/powerpc/platforms/powernv/opal-fadump.c | 22 +-
 arch/powerpc/platforms/pseries/rtas-fadump.c | 30 +-
 4
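The header checks described in the commit message above (magic number for corruption, version for compatibility, structure sizes for layout agreement) can be sketched in user space as below. This is an illustration only: the struct layout, the field names other than those mentioned in the message, the magic value, and the policy of rejecting any version newer than the one this side knows are all assumptions, not the actual fadump code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values for this sketch; the real magic number is not shown in the thread. */
#define DEMO_FADUMP_MAGIC   0xFADFADFADFADFADFULL
#define DEMO_FADUMP_VERSION 1u

/* Simplified, hypothetical stand-in for the crash info header. */
struct demo_crash_info_header {
	uint64_t magic_number;
	uint32_t version;
	uint32_t pt_regs_sz;   /* sizeof(struct pt_regs) in the first kernel */
	uint32_t cpu_mask_sz;  /* sizeof(struct cpumask) in the first kernel */
	uint64_t vmcoreinfo_addr;
	uint64_t vmcoreinfo_size;
};

/*
 * Decide whether the second kernel may process a header written by the
 * first kernel. local_* are the sizes as seen by the second kernel.
 */
static bool demo_header_usable(const struct demo_crash_info_header *h,
			       uint32_t local_pt_regs_sz,
			       uint32_t local_cpu_mask_sz)
{
	/* A wrong magic number means the header is corrupted or absent. */
	if (h->magic_number != DEMO_FADUMP_MAGIC)
		return false;

	/* Assumed policy: version 0 is invalid, newer versions are unknown. */
	if (h->version == 0 || h->version > DEMO_FADUMP_VERSION)
		return false;

	/* Differing structure sizes would make CPU data unreadable. */
	if (h->pt_regs_sz != local_pt_regs_sz ||
	    h->cpu_mask_sz != local_cpu_mask_sz)
		return false;

	return true;
}
```

Keeping the magic number and the version as separate fields lets the corruption check stay stable while the layout evolves, which is the flexibility the commit message is after.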
Re: [PATCH v17 2/6] crash: add a new kexec flag for hotplug support
Hello Baoquan,

Do you have any comments or suggestions for this patch series, especially for this patch?

Thanks,
Sourabh

On 26/02/24 14:11, Sourabh Jain wrote:
Commit a72bbec70da2 ("crash: hotplug support for kexec_load()") introduced a new kexec flag, `KEXEC_UPDATE_ELFCOREHDR`. The kexec tool uses this flag to indicate to the kernel that it is safe to modify the elfcorehdr of the kdump image loaded using the kexec_load system call.

However, architectures may need to update kexec segments other than the elfcorehdr, for example the FDT (Flattened Device Tree) on PowerPC. Introducing a new kexec flag for every new kexec segment may not be a good solution. Hence, a generic kexec flag bit, `KEXEC_CRASH_HOTPLUG_SUPPORT`, is introduced to share the CPU/Memory hotplug support intent between the kexec tool and the kernel for the kexec_load system call.

Now, if the kexec tool sends the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag to the kernel, it indicates to the kernel that all the required kexec segments are skipped from SHA calculation and that it is safe to update the kdump image loaded using the kexec_load syscall.

While loading the kdump image using the kexec_load syscall, the @update_elfcorehdr member of struct kimage is set if the kexec tool sends the KEXEC_UPDATE_ELFCOREHDR kexec flag. This member is later used to determine whether it is safe to update the elfcorehdr on hotplug events. However, with the introduction of the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag, the kexec tool could mark all the required kexec segments on an architecture as safe to update. So rename @update_elfcorehdr to @hotplug_support. If @hotplug_support is set, the kernel can safely update all the required kexec segments of the kdump image during CPU/Memory hotplug events.

Introduce an architecture-specific function to process kexec flags for determining hotplug support. Set the @hotplug_support member of struct kimage for both kexec_load and kexec_file_load system calls.
This simplifies kernel checks to identify hotplug support for the currently loaded kdump image by just examining the value of @hotplug_support. Signed-off-by: Sourabh Jain Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Eric DeVolder Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- arch/x86/include/asm/kexec.h | 11 ++- arch/x86/kernel/crash.c | 28 +--- drivers/base/cpu.c | 2 +- drivers/base/memory.c| 2 +- include/linux/crash_core.h | 13 ++--- include/linux/kexec.h| 11 +++ include/uapi/linux/kexec.h | 1 + kernel/crash_core.c | 11 --- kernel/kexec.c | 4 ++-- kernel/kexec_file.c | 5 + 10 files changed, 46 insertions(+), 42 deletions(-) diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index cb1320ebbc23..ae5482a2f0ca 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -210,15 +210,8 @@ extern void kdump_nmi_shootdown_cpus(void); void arch_crash_handle_hotplug_event(struct kimage *image, void *arg); #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event -#ifdef CONFIG_HOTPLUG_CPU -int arch_crash_hotplug_cpu_support(void); -#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support -#endif - -#ifdef CONFIG_MEMORY_HOTPLUG -int arch_crash_hotplug_memory_support(void); -#define crash_hotplug_memory_support arch_crash_hotplug_memory_support -#endif +int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags); +#define arch_crash_hotplug_support arch_crash_hotplug_support unsigned int arch_crash_get_elfcorehdr_size(void); #define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size diff --git a/arch/x86/kernel/crash.c 
b/arch/x86/kernel/crash.c index 2a682fe86352..f06501445cd9 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -402,20 +402,26 @@ int crash_load_segments(struct kimage *image) #undef pr_fmt #define pr_fmt(fmt) "crash hp: " fmt -/* These functions provide the value for the sysfs crash_hotplug nodes */ -#ifdef CONFIG_HOTPLUG_CPU -int arch_crash_hotplug_cpu_support(void) +int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags) { - return crash_check_update_elfcorehdr(); -} -#endif -#ifdef CONFIG_MEMORY_HOTPLUG -int arch_crash_hotplug_memory_support(void) -{ - return crash_check_update_elfcorehdr(); -} +#ifdef CONFIG_KEXEC_FILE + if (image->file_mode) + return 1; #endif + /* +* Initially
[PATCH v17 6/6] powerpc/crash: add crash memory hotplug support
Extend the arch crash hotplug handler, as introduced by the patch titled ("powerpc: add crash CPU hotplug support"), to also support memory add/remove events.

The elfcorehdr describes the memory of the crashed kernel to the capture kernel; hence, it needs to be updated if memory resources change due to memory add/remove events. Therefore, arch_crash_handle_hotplug_event() is updated to recreate the elfcorehdr and replace the previous one with it on memory add/remove events.

The memblock list is used to prepare the elfcorehdr. In the case of memory hot remove, the memblock list is updated after the arch crash hotplug handler is triggered, as depicted in Figure 1. Thus, the hot-removed memory is explicitly removed from the crash memory ranges to ensure that the memory ranges added to elfcorehdr do not include the hot-removed memory.

    Memory remove
          |
          v
    Offline pages
          |
          v
    Initiate memory notify call <----> crash hotplug handler
    chain for MEM_OFFLINE event
          |
          v
    Update memblock list

          Figure 1

There are two system calls, `kexec_file_load` and `kexec_load`, used to load the kdump image. A few changes have been made to ensure that the kernel can safely update the elfcorehdr component of the kdump image for both system calls.

For the kexec_file_load syscall, the kdump image is prepared in the kernel. To support an increasing number of memory regions, the elfcorehdr is built with extra buffer space to ensure that it can accommodate additional memory ranges in the future.

For the kexec_load syscall, the elfcorehdr is updated only if the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag is passed to the kernel by the kexec tool. Passing this flag to the kernel indicates that the elfcorehdr is built to accommodate additional memory ranges and that the elfcorehdr segment is not considered for SHA calculation, making it safe to update.

The changes related to this feature are kept under the CRASH_HOTPLUG config, and it is enabled by default.
Signed-off-by: Sourabh Jain Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- arch/powerpc/include/asm/kexec.h| 3 + arch/powerpc/include/asm/kexec_ranges.h | 1 + arch/powerpc/kexec/crash.c | 95 - arch/powerpc/kexec/file_load_64.c | 20 +- arch/powerpc/kexec/ranges.c | 85 ++ 5 files changed, 202 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index e75970351bcd..95a98b390d62 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -141,6 +141,9 @@ void arch_crash_handle_hotplug_event(struct kimage *image, void *arg); int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags); #define arch_crash_hotplug_support arch_crash_hotplug_support + +unsigned int arch_crash_get_elfcorehdr_size(void); +#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size #endif /* CONFIG_CRASH_HOTPLUG */ extern int crashing_cpu; diff --git a/arch/powerpc/include/asm/kexec_ranges.h b/arch/powerpc/include/asm/kexec_ranges.h index 8489e844b447..14055896cbcb 100644 --- a/arch/powerpc/include/asm/kexec_ranges.h +++ b/arch/powerpc/include/asm/kexec_ranges.h @@ -7,6 +7,7 @@ void sort_memory_ranges(struct crash_mem *mrngs, bool merge); struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges); int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size); +int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size); int get_exclude_memory_ranges(struct crash_mem **mem_ranges); int get_reserved_memory_ranges(struct crash_mem **mem_ranges); int 
get_crash_memory_ranges(struct crash_mem **mem_ranges); diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c index 8938a19af12f..21b193e938a3 100644 --- a/arch/powerpc/kexec/crash.c +++ b/arch/powerpc/kexec/crash.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include @@ -25,6 +26,7 @@ #include #include #include +#include /* * The primary CPU waits a while for all secondary CPUs to enter. This is to @@ -398,6 +400,94 @@ void default_machine_crash_shutdown(struct pt_regs *regs) #undef pr_fmt #define pr_fmt(fmt) "crash hp: " fmt +/* + * Advertise preferred elfcorehdr size to userspace via + * /sys/kernel/crash_elfcorehdr_size sysfs interface. + */ +unsigned int arch_crash_get_elfcorehdr_size(void) +{ + unsigned long phdr_cnt; + + /* A progra
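The hot-remove handling described in the commit message above (drop the range being offlined from the crash memory ranges before building the elfcorehdr, because memblock still lists it) can be illustrated with a small user-space sketch. It is not the kernel's remove_mem_range(): the fixed-size table, the `demo_*` names, and the inclusive-bounds convention are assumptions made for the example, and address overflow of base + size is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_RANGES 8

/* One memory range with inclusive bounds (hypothetical layout). */
struct demo_range {
	uint64_t start;
	uint64_t end;
};

/* Fixed-capacity range list standing in for a dynamically sized one. */
struct demo_mem {
	size_t nr;
	struct demo_range r[DEMO_MAX_RANGES];
};

/* Append one range; returns -1 if the table is full. */
static int demo_push(struct demo_mem *m, uint64_t start, uint64_t end)
{
	if (m->nr == DEMO_MAX_RANGES)
		return -1;
	m->r[m->nr].start = start;
	m->r[m->nr].end = end;
	m->nr++;
	return 0;
}

/*
 * Remove [base, base + size) from the list. A range that only partly
 * overlaps is trimmed; a range that contains the hole is split in two.
 * On failure the input list is left untouched.
 */
static int demo_remove_range(struct demo_mem *m, uint64_t base, uint64_t size)
{
	struct demo_mem out = { 0 };
	uint64_t lo = base, hi;
	size_t i;

	if (size == 0)
		return 0;
	hi = base + size - 1;

	for (i = 0; i < m->nr; i++) {
		struct demo_range cur = m->r[i];

		/* No overlap: keep the range unchanged. */
		if (cur.end < lo || cur.start > hi) {
			if (demo_push(&out, cur.start, cur.end))
				return -1;
			continue;
		}
		/* Keep the part below the removed block, if any. */
		if (cur.start < lo && demo_push(&out, cur.start, lo - 1))
			return -1;
		/* Keep the part above the removed block, if any. */
		if (cur.end > hi && demo_push(&out, hi + 1, cur.end))
			return -1;
	}

	*m = out;
	return 0;
}
```

Because a removal from the middle of a range produces two ranges, the list can grow on a remove event, which is one reason the elfcorehdr buffer needs headroom.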
[PATCH v17 5/6] powerpc/crash: add crash CPU hotplug support
Due to CPU/Memory hotplug or online/offline events, the elfcorehdr (which describes the CPUs and memory of the crashed kernel) and the FDT (Flattened Device Tree) of the kdump image become outdated. Consequently, attempting dump collection with an outdated elfcorehdr or FDT can lead to failed or inaccurate dump collection.

Going forward, CPU/Memory hotplug or online/offline events are referred to as CPU/Memory add/remove events.

The current solution to address the above issue involves monitoring the CPU/Memory add/remove events in userspace using udev rules; whenever there are changes in CPU and memory resources, the entire kdump image is loaded again. The kdump image includes the kernel, initrd, elfcorehdr, FDT, and purgatory. Given that only the elfcorehdr and FDT get outdated due to CPU/Memory add/remove events, reloading the entire kdump image is inefficient. More importantly, kdump remains inactive for a substantial amount of time until the kdump reload completes.

To address the aforementioned issue, commit 247262756121 ("crash: add generic infrastructure for crash hotplug support") added a generic infrastructure that allows architectures to selectively update the kdump image component during CPU or memory add/remove events within the kernel itself.

In the event of a CPU or memory add/remove event, the generic crash hotplug event handler, `crash_handle_hotplug_event()`, is triggered. It then acquires the necessary locks to update the kdump image and invokes the architecture-specific crash hotplug handler, `arch_crash_handle_hotplug_event()`, to update the required kdump image components.

This patch adds a crash hotplug handler for PowerPC and enables support for updating the kdump image on CPU add/remove events. Support for memory add/remove events is added in a subsequent patch with the title "powerpc: add crash memory hotplug support".

As mentioned earlier, only the elfcorehdr and FDT kdump image components need to be updated in the event of CPU or memory add/remove events. However, on the PowerPC architecture the crash hotplug handler only updates the FDT to enable crash hotplug support for CPU add/remove events. Here's why.

The elfcorehdr on PowerPC is built with possible CPUs, and thus it does not need an update on CPU add/remove events. On the other hand, the FDT needs to be updated on CPU add events to include the newly added CPU. If the FDT is not updated and the kernel crashes on a newly added CPU, the kdump kernel will fail to boot due to the unavailability of the crashing CPU in the FDT. During early boot, the boot CPU is expected to be a part of the FDT; otherwise, the kernel will raise a BUG and fail to boot. For more information, refer to commit 36ae37e3436b0 ("powerpc: Make boot_cpuid common between 32 and 64-bit"). Since it is okay to have an offline CPU in the kdump FDT, no action is taken in case of CPU removal.

There are two system calls, `kexec_file_load` and `kexec_load`, used to load the kdump image. A few changes have been made to ensure that the kernel can safely update the FDT of a kdump image loaded using either system call.

For the kexec_file_load syscall, the kdump image is prepared in the kernel. So, to support an increasing number of CPUs, the FDT is constructed with extra buffer space to ensure it can accommodate the possible number of CPU nodes. Additionally, a call to fdt_pack (which trims the unused space once the FDT is prepared) is avoided if this feature is enabled.

For the kexec_load syscall, the FDT is updated only if the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag is passed to the kernel by userspace (kexec tools). When userspace passes this flag to the kernel, it indicates that the FDT is built to accommodate possible CPUs and that the FDT segment is excluded from SHA calculation, making it safe to update.

The changes related to this feature are kept under the CRASH_HOTPLUG config, and it is enabled by default.
Signed-off-by: Sourabh Jain Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- arch/powerpc/Kconfig | 4 ++ arch/powerpc/include/asm/kexec.h | 8 +++ arch/powerpc/kexec/crash.c| 103 ++ arch/powerpc/kexec/elf_64.c | 3 +- arch/powerpc/kexec/file_load_64.c | 17 + 5 files changed, 134 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index e377deefa2dc..16d2b20574c4 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -686,6 +686,10 @@ config ARCH_SELECTS_CRASH_DUMP depends on CRASH_DUMP select RELOCATABLE if PPC6
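The per-event policy from the commit message above (rebuild the FDT on CPU add, do nothing on CPU remove, and touch nothing unless the image was marked updatable) can be summarised with a small user-space sketch. The enum, struct, and function names are hypothetical, and the counter merely stands in for the real FDT update.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event codes for this sketch. */
enum demo_hp_action {
	DEMO_HP_ADD_CPU,
	DEMO_HP_REMOVE_CPU,
};

/* Minimal stand-in for a loaded kdump image. */
struct demo_kdump_image {
	bool hotplug_support;      /* set at load time, see the kexec flag patch */
	unsigned int fdt_rebuilds; /* counts how often the FDT was refreshed */
};

/*
 * Returns true if the image was modified. The elfcorehdr is never touched
 * here because it already lists all possible CPUs.
 */
static bool demo_handle_cpu_event(struct demo_kdump_image *img,
				  enum demo_hp_action action)
{
	if (!img->hotplug_support)
		return false;

	switch (action) {
	case DEMO_HP_ADD_CPU:
		/* The new CPU may be the one that crashes, so it must be in the FDT. */
		img->fdt_rebuilds++;
		return true;
	case DEMO_HP_REMOVE_CPU:
		/* An offline CPU left in the kdump FDT is harmless. */
		return false;
	}
	return false;
}
```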
[PATCH v17 4/6] PowerPC/kexec: make the update_cpus_node() function public
Move the update_cpus_node() from kexec/{file_load_64.c => core_64.c} to allow other kexec components to use it.

Later in the series, this function is used for in-kernel updates to the kdump image during CPU/memory hotplug or online/offline events for both kexec_load and kexec_file_load syscalls.

No functional changes are intended.

Signed-off-by: Sourabh Jain
Cc: Akhil Raj
Cc: Andrew Morton
Cc: Aneesh Kumar K.V
Cc: Baoquan He
Cc: Borislav Petkov (AMD)
Cc: Boris Ostrovsky
Cc: Christophe Leroy
Cc: Dave Hansen
Cc: Dave Young
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: Hari Bathini
Cc: Laurent Dufour
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Mimi Zohar
Cc: Naveen N Rao
Cc: Oscar Salvador
Cc: Thomas Gleixner
Cc: Valentin Schneider
Cc: Vivek Goyal
Cc: ke...@lists.infradead.org
Cc: x...@kernel.org
---
 arch/powerpc/include/asm/kexec.h | 4 ++
 arch/powerpc/kexec/core_64.c | 91 +++
 arch/powerpc/kexec/file_load_64.c | 87 -
 3 files changed, 95 insertions(+), 87 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index fdb90e24dc74..d9ff4d0e392d 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -185,6 +185,10 @@ static inline void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *))
 #endif /* CONFIG_CRASH_DUMP */
+#if defined(CONFIG_KEXEC_FILE) || defined(CONFIG_CRASH_DUMP)
+int update_cpus_node(void *fdt);
+#endif
+
 #ifdef CONFIG_PPC_BOOK3S_64
 #include
 #endif
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 762e4d09aacf..85050be08a23 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -17,6 +17,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -30,6 +31,7 @@
 #include
 #include
 #include
+#include
 int machine_kexec_prepare(struct kimage *image)
 {
@@ -419,3 +421,92 @@ static int __init export_htab_values(void)
 }
 late_initcall(export_htab_values);
 #endif /* CONFIG_PPC_64S_HASH_MMU */
+
+#if defined(CONFIG_KEXEC_FILE) || defined(CONFIG_CRASH_DUMP)
+/**
+ * add_node_props - Reads node properties from device node structure and add
+ *                  them to fdt.
+ * @fdt:            Flattened device tree of the kernel
+ * @node_offset:    offset of the node to add a property at
+ * @dn:             device node pointer
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+static int add_node_props(void *fdt, int node_offset, const struct device_node *dn)
+{
+	int ret = 0;
+	struct property *pp;
+
+	if (!dn)
+		return -EINVAL;
+
+	for_each_property_of_node(dn, pp) {
+		ret = fdt_setprop(fdt, node_offset, pp->name, pp->value, pp->length);
+		if (ret < 0) {
+			pr_err("Unable to add %s property: %s\n", pp->name, fdt_strerror(ret));
+			return ret;
+		}
+	}
+	return ret;
+}
+
+/**
+ * update_cpus_node - Update cpus node of flattened device tree using of_root
+ *                    device node.
+ * @fdt:              Flattened device tree of the kernel.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+int update_cpus_node(void *fdt)
+{
+	struct device_node *cpus_node, *dn;
+	int cpus_offset, cpus_subnode_offset, ret = 0;
+
+	cpus_offset = fdt_path_offset(fdt, "/cpus");
+	if (cpus_offset < 0 && cpus_offset != -FDT_ERR_NOTFOUND) {
+		pr_err("Malformed device tree: error reading /cpus node: %s\n",
+		       fdt_strerror(cpus_offset));
+		return cpus_offset;
+	}
+
+	if (cpus_offset > 0) {
+		ret = fdt_del_node(fdt, cpus_offset);
+		if (ret < 0) {
+			pr_err("Error deleting /cpus node: %s\n", fdt_strerror(ret));
+			return -EINVAL;
+		}
+	}
+
+	/* Add cpus node to fdt */
+	cpus_offset = fdt_add_subnode(fdt, fdt_path_offset(fdt, "/"), "cpus");
+	if (cpus_offset < 0) {
+		pr_err("Error creating /cpus node: %s\n", fdt_strerror(cpus_offset));
+		return -EINVAL;
+	}
+
+	/* Add cpus node properties */
+	cpus_node = of_find_node_by_path("/cpus");
+	ret = add_node_props(fdt, cpus_offset, cpus_node);
+	of_node_put(cpus_node);
+	if (ret < 0)
+		return ret;
+
+	/* Loop through all subnodes of cpus and add them to fdt */
+	for_each_node_by_type(dn, "cpu") {
+		cpus_subnode_offset = fdt_add_subnode(fdt, cpus_offset, dn->full_name);
+		if (cpus_subnode_offset < 0) {
+			pr_err("Unable to add %s subnode: %s\n", dn->full_name,
[PATCH v17 3/6] powerpc/kexec: move *_memory_ranges functions to ranges.c
Move the following functions from kexec/{file_load_64.c => ranges.c} and make them public so that components other than KEXEC_FILE can also use them:

1. get_exclude_memory_ranges
2. get_reserved_memory_ranges
3. get_crash_memory_ranges
4. get_usable_memory_ranges

Later in the series, the get_crash_memory_ranges function is used for in-kernel updates to the kdump image during CPU/memory hotplug or online/offline events for both kexec_load and kexec_file_load syscalls.

Since the above functions are moved to ranges.c, some of the helper functions in ranges.c no longer need to be public. Mark them as static and remove them from the kexec_ranges.h header file.

Finally, remove the CONFIG_KEXEC_FILE build dependency for ranges.c because it is required for other configs, such as CONFIG_CRASH_DUMP.

No functional changes are intended.

Signed-off-by: Sourabh Jain
Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org
---
 arch/powerpc/include/asm/kexec_ranges.h | 19 +-
 arch/powerpc/kexec/Makefile | 4 +-
 arch/powerpc/kexec/file_load_64.c | 190
 arch/powerpc/kexec/ranges.c | 227 +++-
 4 files changed, 224 insertions(+), 216 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec_ranges.h b/arch/powerpc/include/asm/kexec_ranges.h
index f83866a19e87..8489e844b447 100644
--- a/arch/powerpc/include/asm/kexec_ranges.h
+++ b/arch/powerpc/include/asm/kexec_ranges.h
@@ -7,19 +7,8 @@ void sort_memory_ranges(struct crash_mem *mrngs, bool merge);
 struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges);
 int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
-int
add_tce_mem_ranges(struct crash_mem **mem_ranges); -int add_initrd_mem_range(struct crash_mem **mem_ranges); -#ifdef CONFIG_PPC_64S_HASH_MMU -int add_htab_mem_range(struct crash_mem **mem_ranges); -#else -static inline int add_htab_mem_range(struct crash_mem **mem_ranges) -{ - return 0; -} -#endif -int add_kernel_mem_range(struct crash_mem **mem_ranges); -int add_rtas_mem_range(struct crash_mem **mem_ranges); -int add_opal_mem_range(struct crash_mem **mem_ranges); -int add_reserved_mem_ranges(struct crash_mem **mem_ranges); - +int get_exclude_memory_ranges(struct crash_mem **mem_ranges); +int get_reserved_memory_ranges(struct crash_mem **mem_ranges); +int get_crash_memory_ranges(struct crash_mem **mem_ranges); +int get_usable_memory_ranges(struct crash_mem **mem_ranges); #endif /* _ASM_POWERPC_KEXEC_RANGES_H */ diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile index 8e469c4da3f8..470eb0453e17 100644 --- a/arch/powerpc/kexec/Makefile +++ b/arch/powerpc/kexec/Makefile @@ -3,11 +3,11 @@ # Makefile for the linux kernel. # -obj-y += core.o core_$(BITS).o +obj-y += core.o core_$(BITS).o ranges.o obj-$(CONFIG_PPC32)+= relocate_32.o -obj-$(CONFIG_KEXEC_FILE) += file_load.o ranges.o file_load_$(BITS).o elf_$(BITS).o +obj-$(CONFIG_KEXEC_FILE) += file_load.o file_load_$(BITS).o elf_$(BITS).o obj-$(CONFIG_VMCORE_INFO) += vmcore_info.o obj-$(CONFIG_CRASH_DUMP) += crash.o diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c index 1bc65de6174f..6a01f62b8fcf 100644 --- a/arch/powerpc/kexec/file_load_64.c +++ b/arch/powerpc/kexec/file_load_64.c @@ -47,83 +47,6 @@ const struct kexec_file_ops * const kexec_file_loaders[] = { NULL }; -/** - * get_exclude_memory_ranges - Get exclude memory ranges. This list includes - * regions like opal/rtas, tce-table, initrd, - * kernel, htab which should be avoided while - * setting up kexec load segments. - * @mem_ranges:Range list to add the memory ranges to. 
- * - * Returns 0 on success, negative errno on error. - */ -static int get_exclude_memory_ranges(struct crash_mem **mem_ranges) -{ - int ret; - - ret = add_tce_mem_ranges(mem_ranges); - if (ret) - goto out; - - ret = add_initrd_mem_range(mem_ranges); - if (ret) - goto out; - - ret = add_htab_mem_range(mem_ranges); - if (ret) - goto out; - - ret = add_kernel_mem_range(mem_ranges); - if (ret) - goto out; - - ret = add_rtas_mem_range(mem_ranges); - if (ret) - goto out; - - ret = add_opal_mem_range(mem_ranges); - if (ret) - goto out; - -
[PATCH v17 2/6] crash: add a new kexec flag for hotplug support
Commit a72bbec70da2 ("crash: hotplug support for kexec_load()") introduced a new kexec flag, `KEXEC_UPDATE_ELFCOREHDR`. The kexec tool uses this flag to indicate to the kernel that it is safe to modify the elfcorehdr of the kdump image loaded using the kexec_load system call.

However, architectures may need to update kexec segments other than the elfcorehdr, for example the FDT (Flattened Device Tree) on PowerPC. Introducing a new kexec flag for every new kexec segment may not be a good solution. Hence, a generic kexec flag bit, `KEXEC_CRASH_HOTPLUG_SUPPORT`, is introduced to share the CPU/memory hotplug support intent between the kexec tool and the kernel for the kexec_load system call.

Now, if the kexec tool sends the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag to the kernel, it indicates that all the required kexec segments are skipped from the SHA calculation and that it is safe to update the kdump image loaded using the kexec_load syscall.

While loading the kdump image using the kexec_load syscall, the @update_elfcorehdr member of struct kimage is set if the kexec tool sends the KEXEC_UPDATE_ELFCOREHDR kexec flag. This member is later used to determine whether it is safe to update the elfcorehdr on hotplug events. However, with the introduction of the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag, the kexec tool can mark all the required kexec segments on an architecture as safe to update. So rename @update_elfcorehdr to @hotplug_support. If @hotplug_support is set, the kernel can safely update all the required kexec segments of the kdump image during CPU/memory hotplug events.

Introduce an architecture-specific function to process kexec flags for determining hotplug support. Set the @hotplug_support member of struct kimage for both the kexec_load and kexec_file_load system calls. This simplifies kernel checks for hotplug support of the currently loaded kdump image: they only need to examine the value of @hotplug_support.
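The flag handling described above can be illustrated with a small userspace sketch. This is not the kernel code from the patch: `struct sketch_kimage`, `sketch_hotplug_support()`, `sketch_load()` and the `SKETCH_*` flag values are hypothetical stand-ins, and the rule that either flag is sufficient for kexec_load is an assumption made for illustration.

```c
#include <stdbool.h>

/* Placeholder flag bits for this sketch only; the real definitions
 * live in include/uapi/linux/kexec.h and may differ. */
#define SKETCH_KEXEC_UPDATE_ELFCOREHDR     0x4UL
#define SKETCH_KEXEC_CRASH_HOTPLUG_SUPPORT 0x8UL

/* Hypothetical stand-in for the struct kimage members discussed above. */
struct sketch_kimage {
	bool file_mode;       /* image was loaded via kexec_file_load */
	bool hotplug_support; /* kdump segments may be updated on hotplug */
};

/* Arch-style hook: decide hotplug support from the load path and flags.
 * Assumption: a kexec_file_load image is always updatable, and for
 * kexec_load either the old or the new flag is accepted. */
bool sketch_hotplug_support(const struct sketch_kimage *image,
			    unsigned long kexec_flags)
{
	if (image->file_mode)
		return true;

	return (kexec_flags & (SKETCH_KEXEC_UPDATE_ELFCOREHDR |
			       SKETCH_KEXEC_CRASH_HOTPLUG_SUPPORT)) != 0;
}

/* Load path: record the decision once, so later checks only read
 * image->hotplug_support. */
void sketch_load(struct sketch_kimage *image, unsigned long kexec_flags)
{
	image->hotplug_support = sketch_hotplug_support(image, kexec_flags);
}
```

Recording the decision once at load time is what lets later hotplug handlers test a single member instead of re-deriving support from the flags.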
Signed-off-by: Sourabh Jain Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Eric DeVolder Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- arch/x86/include/asm/kexec.h | 11 ++- arch/x86/kernel/crash.c | 28 +--- drivers/base/cpu.c | 2 +- drivers/base/memory.c| 2 +- include/linux/crash_core.h | 13 ++--- include/linux/kexec.h| 11 +++ include/uapi/linux/kexec.h | 1 + kernel/crash_core.c | 11 --- kernel/kexec.c | 4 ++-- kernel/kexec_file.c | 5 + 10 files changed, 46 insertions(+), 42 deletions(-) diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index cb1320ebbc23..ae5482a2f0ca 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -210,15 +210,8 @@ extern void kdump_nmi_shootdown_cpus(void); void arch_crash_handle_hotplug_event(struct kimage *image, void *arg); #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event -#ifdef CONFIG_HOTPLUG_CPU -int arch_crash_hotplug_cpu_support(void); -#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support -#endif - -#ifdef CONFIG_MEMORY_HOTPLUG -int arch_crash_hotplug_memory_support(void); -#define crash_hotplug_memory_support arch_crash_hotplug_memory_support -#endif +int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags); +#define arch_crash_hotplug_support arch_crash_hotplug_support unsigned int arch_crash_get_elfcorehdr_size(void); #define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index 2a682fe86352..f06501445cd9 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -402,20 +402,26 @@ 
int crash_load_segments(struct kimage *image) #undef pr_fmt #define pr_fmt(fmt) "crash hp: " fmt -/* These functions provide the value for the sysfs crash_hotplug nodes */ -#ifdef CONFIG_HOTPLUG_CPU -int arch_crash_hotplug_cpu_support(void) +int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags) { - return crash_check_update_elfcorehdr(); -} -#endif -#ifdef CONFIG_MEMORY_HOTPLUG -int arch_crash_hotplug_memory_support(void) -{ - return crash_check_update_elfcorehdr(); -} +#ifdef CONFIG_KEXEC_FILE + if (image->file_mode) + return 1; #endif + /* +* Initially, crash hotplug support for kexec_load was added +* with the KEXEC_UPDATE_ELFCOREHDR flag. Later, this +* functionality was expanded to accommodate multiple kexec +* s
[PATCH v17 0/6] powerpc/crash: Kernel handling of CPU and memory hotplug
The patch that introduced CONFIG_CRASH_HOTPLUG for PowerPC has been removed. The config is now part of the common configuration: https://lore.kernel.org/all/87ilbpflsk.fsf@mail.lhotse/

v10:
- Drop the patch that adds the fdt_index attribute to struct kimage_arch. Find the fdt segment index when needed.
- Added more details to the commit messages.
- Rebased onto 6.3.0-rc5

v9:
- Removed the patch to prepare elfcorehdr crash notes for possible CPUs. The patch is moved to the generic patch series that introduces generic infrastructure for in-kernel crash updates.
- Removed the patch to pass the hotplug action type to the arch crash hotplug handler function. The generic patch series has introduced the hotplug action type in the kimage struct.
- Add detailed commit messages for better understanding.

v8:
- Restrict fdt_index initialization to machine_kexec_post_load so that it works for both kexec_load and kexec_file_load. [3/8] Laurent Dufour
- Updated the logic to find the number of offline cores. [6/8]
- Changed the logic to find the elfcore program header to accommodate future memory ranges due to memory hotplug events. [8/8]

v7:
- Added a new config to configure this feature
- Pass the hotplug action type to the arch-specific handler

v6:
- Added crash memory hotplug support

v5:
- Replace CONFIG_CRASH_HOTPLUG with CONFIG_HOTPLUG_CPU.
- Move fdt segment identification for the kexec_load case to the load path instead of the crash hotplug handler
- Keep the new attribute defined under kimage_arch to track the FDT segment under the CONFIG_HOTPLUG_CPU config.

v4:
- Update the logic to find the additional space needed for hot-added CPUs post kexec load. Refer to the "[RFC v4 PATCH 4/5] powerpc/crash hp: add crash hotplug support for kexec_file_load" patch for details.
- Fix a couple of typos.
- Replace pr_err with pr_info_once to warn the user about memory hotplug support.
- In the crash hotplug handler, exit the for loop once the FDT segment is found.

v3:
- Move fdt_index and fdt_index_vaild variables to kimage_arch struct.
- Rebase patches on top of https://lore.kernel.org/lkml/20220303162725.49640-1-eric.devol...@oracle.com/
- Fixed warnings reported by the checkpatch script

v2:
- Use the generic hotplug handler introduced by https://lore.kernel.org/lkml/20220209195706.51522-1-eric.devol...@oracle.com/, a significant change from v1.

Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org

Sourabh Jain (6):
  crash: forward memory_notify arg to arch crash hotplug handler
  crash: add a new kexec flag for hotplug support
  powerpc/kexec: move *_memory_ranges functions to ranges.c
  PowerPC/kexec: make the update_cpus_node() function public
  powerpc/crash: add crash CPU hotplug support
  powerpc/crash: add crash memory hotplug support

 arch/powerpc/Kconfig | 4 +
 arch/powerpc/include/asm/kexec.h | 15 ++
 arch/powerpc/include/asm/kexec_ranges.h | 20 +-
 arch/powerpc/kexec/Makefile | 4 +-
 arch/powerpc/kexec/core_64.c | 91 +++
 arch/powerpc/kexec/crash.c | 196 +++
 arch/powerpc/kexec/elf_64.c | 3 +-
 arch/powerpc/kexec/file_load_64.c | 314 +++-
 arch/powerpc/kexec/ranges.c | 312 ++-
 arch/x86/include/asm/kexec.h | 13 +-
 arch/x86/kernel/crash.c | 32 ++-
 drivers/base/cpu.c | 2 +-
 drivers/base/memory.c | 2 +-
 include/linux/crash_core.h | 15 +-
 include/linux/kexec.h | 11 +-
 include/uapi/linux/kexec.h | 1 +
 kernel/crash_core.c | 25 +-
 kernel/kexec.c | 4 +-
 kernel/kexec_file.c | 5 +
 19 files changed, 712 insertions(+), 357 deletions(-)

--
2.43.0
[PATCH v17 1/6] crash: forward memory_notify arg to arch crash hotplug handler
In the event of memory hotplug or online/offline events, the crash memory hotplug notifier `crash_memhp_notifier()` receives a `memory_notify` object but doesn't forward that object to the generic and architecture-specific crash hotplug handler. The `memory_notify` object contains the starting PFN (Page Frame Number) and the number of pages in the hot-removed memory. This information is necessary for architectures like PowerPC to update/recreate the kdump image, specifically `elfcorehdr`. So update the function signature of `crash_handle_hotplug_event()` and `arch_crash_handle_hotplug_event()` to accept the `memory_notify` object as an argument from crash memory hotplug notifier. Since no such object is available in the case of CPU hotplug event, the crash CPU hotplug notifier `crash_cpuhp_online()` passes NULL to the crash hotplug handler. Signed-off-by: Sourabh Jain Acked-by: Baoquan He Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- arch/x86/include/asm/kexec.h | 2 +- arch/x86/kernel/crash.c | 4 +++- include/linux/crash_core.h | 2 +- kernel/crash_core.c | 14 +++--- 4 files changed, 12 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index 91ca9a9ee3a2..cb1320ebbc23 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -207,7 +207,7 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image); extern void kdump_nmi_shootdown_cpus(void); #ifdef CONFIG_CRASH_HOTPLUG -void arch_crash_handle_hotplug_event(struct kimage *image); +void arch_crash_handle_hotplug_event(struct kimage *image, void *arg); 
#define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event #ifdef CONFIG_HOTPLUG_CPU diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index e74d0c4286c1..2a682fe86352 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -432,10 +432,12 @@ unsigned int arch_crash_get_elfcorehdr_size(void) /** * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes * @image: a pointer to kexec_crash_image + * @arg: struct memory_notify handler for memory hotplug case and + * NULL for CPU hotplug case. * * Prepare the new elfcorehdr and replace the existing elfcorehdr. */ -void arch_crash_handle_hotplug_event(struct kimage *image) +void arch_crash_handle_hotplug_event(struct kimage *image, void *arg) { void *elfbuf = NULL, *old_elfcorehdr; unsigned long nr_mem_ranges; diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h index d33352c2e386..647e928efee8 100644 --- a/include/linux/crash_core.h +++ b/include/linux/crash_core.h @@ -37,7 +37,7 @@ static inline void arch_kexec_unprotect_crashkres(void) { } #ifndef arch_crash_handle_hotplug_event -static inline void arch_crash_handle_hotplug_event(struct kimage *image) { } +static inline void arch_crash_handle_hotplug_event(struct kimage *image, void *arg) { } #endif int crash_check_update_elfcorehdr(void); diff --git a/kernel/crash_core.c b/kernel/crash_core.c index 78b5dc7cee3a..70fa8111a9d6 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -534,7 +534,7 @@ int crash_check_update_elfcorehdr(void) * list of segments it checks (since the elfcorehdr changes and thus * would require an update to purgatory itself to update the digest). 
*/ -static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu) +static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu, void *arg) { struct kimage *image; @@ -596,7 +596,7 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu) image->hp_action = hp_action; /* Now invoke arch-specific update handler */ - arch_crash_handle_hotplug_event(image); + arch_crash_handle_hotplug_event(image, arg); /* No longer handling a hotplug event */ image->hp_action = KEXEC_CRASH_HP_NONE; @@ -612,17 +612,17 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu) crash_hotplug_unlock(); } -static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v) +static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *arg) { switch (val) { case MEM_ONLINE: crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_MEMORY, - KEXEC_CRASH_HP_INVALID_CPU); + KEXEC_CRASH_HP_INVALID_CPU, arg); break;
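
As a rough userspace illustration of why the handler now receives `arg`: the commit message for this patch says the `memory_notify` object carries the starting PFN and page count, and that CPU hotplug passes NULL. The sketch below shows how an arch handler might turn that into a byte range. `struct sketch_memory_notify`, `struct sketch_range`, `sketch_hotplug_range()` and `SKETCH_PAGE_SHIFT` are hypothetical names for this example; only the two fields named in the commit message are modelled, and the 64 KiB page size is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for the sketch (64 KiB pages); illustration only. */
#define SKETCH_PAGE_SHIFT 16

/* Hypothetical stand-in carrying only the two fields the commit
 * message mentions: starting PFN and number of pages. */
struct sketch_memory_notify {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/* Byte range affected by a memory hotplug event. */
struct sketch_range {
	uint64_t base;
	uint64_t size;
};

/* Returns 1 and fills @out for a memory event; returns 0 when @arg is
 * NULL, which is how a CPU hotplug event reaches the handler. */
int sketch_hotplug_range(const void *arg, struct sketch_range *out)
{
	const struct sketch_memory_notify *mn = arg;

	if (!mn)
		return 0;

	out->base = (uint64_t)mn->start_pfn << SKETCH_PAGE_SHIFT;
	out->size = (uint64_t)mn->nr_pages << SKETCH_PAGE_SHIFT;
	return 1;
}
```

The NULL check mirrors the convention in the patch: the same handler signature serves both notifiers, and the argument alone tells the two event types apart.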
Re: [PATCH linux-next 3/3] powerpc/kdump: Split KEXEC_CORE and CRASH_DUMP dependency
Hello Hari, Build failure detected. On 13/02/24 17:01, Hari Bathini wrote: Remove CONFIG_CRASH_DUMP dependency on CONFIG_KEXEC. CONFIG_KEXEC_CORE was used at places where CONFIG_CRASH_DUMP or CONFIG_CRASH_RESERVE was appropriate. Replace with appropriate #ifdefs to support CONFIG_KEXEC and !CONFIG_CRASH_DUMP configuration option. Also, make CONFIG_FA_DUMP dependent on CONFIG_CRASH_DUMP to avoid unmet dependencies for FA_DUMP with !CONFIG_KEXEC_CORE configuration option. Signed-off-by: Hari Bathini --- arch/powerpc/Kconfig | 9 +-- arch/powerpc/include/asm/kexec.h | 98 +++--- arch/powerpc/kernel/prom.c | 2 +- arch/powerpc/kernel/setup-common.c | 2 +- arch/powerpc/kernel/smp.c | 4 +- arch/powerpc/kexec/Makefile| 3 +- arch/powerpc/kexec/core.c | 4 ++ 7 files changed, 60 insertions(+), 62 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 5cf8ad8d7e8e..e377deefa2dc 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -607,11 +607,6 @@ config PPC64_SUPPORTS_MEMORY_FAILURE config ARCH_SUPPORTS_KEXEC def_bool PPC_BOOK3S || PPC_E500 || (44x && !SMP) -config ARCH_SELECTS_KEXEC - def_bool y - depends on KEXEC - select CRASH_DUMP - config ARCH_SUPPORTS_KEXEC_FILE def_bool PPC64 @@ -622,7 +617,6 @@ config ARCH_SELECTS_KEXEC_FILE def_bool y depends on KEXEC_FILE select KEXEC_ELF - select CRASH_DUMP select HAVE_IMA_KEXEC if IMA config PPC64_BIG_ENDIAN_ELF_ABI_V2 @@ -694,8 +688,7 @@ config ARCH_SELECTS_CRASH_DUMP config FA_DUMP bool "Firmware-assisted dump" - depends on PPC64 && (PPC_RTAS || PPC_POWERNV) - select CRASH_DUMP + depends on CRASH_DUMP && PPC64 && (PPC_RTAS || PPC_POWERNV) help A robust mechanism to get reliable kernel crash dump with assistance from firmware. 
This approach does not use kexec, diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index e1b43aa12175..fdb90e24dc74 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -55,59 +55,18 @@ typedef void (*crash_shutdown_t)(void); #ifdef CONFIG_KEXEC_CORE - -/* - * This function is responsible for capturing register states if coming - * via panic or invoking dump using sysrq-trigger. - */ -static inline void crash_setup_regs(struct pt_regs *newregs, - struct pt_regs *oldregs) -{ - if (oldregs) - memcpy(newregs, oldregs, sizeof(*newregs)); - else - ppc_save_regs(newregs); -} +struct kimage; +struct pt_regs; extern void kexec_smp_wait(void); /* get and clear naca physid, wait for master to copy new code to 0 */ -extern int crashing_cpu; -extern void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *)); -extern void crash_ipi_callback(struct pt_regs *); -extern int crash_wake_offline; - -struct kimage; -struct pt_regs; extern void default_machine_kexec(struct kimage *image); -extern void default_machine_crash_shutdown(struct pt_regs *regs); -extern int crash_shutdown_register(crash_shutdown_t handler); -extern int crash_shutdown_unregister(crash_shutdown_t handler); - -extern void crash_kexec_prepare(void); -extern void crash_kexec_secondary(struct pt_regs *regs); -int __init overlaps_crashkernel(unsigned long start, unsigned long size); -extern void reserve_crashkernel(void); extern void machine_kexec_mask_interrupts(void); -static inline bool kdump_in_progress(void) -{ - return crashing_cpu >= 0; -} - void relocate_new_kernel(unsigned long indirection_page, unsigned long reboot_code_buffer, unsigned long start_address) __noreturn; - void kexec_copy_flush(struct kimage *image); -#if defined(CONFIG_CRASH_DUMP) -bool is_kdump_kernel(void); -#define is_kdump_kernelis_kdump_kernel -#if defined(CONFIG_PPC_RTAS) -void crash_free_reserved_phys_range(unsigned long begin, unsigned long end); -#define 
crash_free_reserved_phys_range crash_free_reserved_phys_range -#endif /* CONFIG_PPC_RTAS */ -#endif /* CONFIG_CRASH_DUMP */ - #ifdef CONFIG_KEXEC_FILE extern const struct kexec_file_ops kexec_elf64_ops; @@ -152,15 +111,56 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt, #endif /* CONFIG_KEXEC_FILE */ -#else /* !CONFIG_KEXEC_CORE */ -static inline void crash_kexec_secondary(struct pt_regs *regs) { } +#endif /* CONFIG_KEXEC_CORE */ + +#ifdef CONFIG_CRASH_RESERVE +int __init overlaps_crashkernel(unsigned long start, unsigned long size); +extern void reserve_crashkernel(void); +#else +static inline void reserve_crashkernel(void) {} +static inline int overlaps_crashkernel(unsigned long start, unsigned long size) { return 0; } +#endif -static inline int overlaps_crashkernel(unsigned
Re: [PATCH v16 2/5] crash: add a new kexec flag for hotplug support
On 22/02/24 09:28, Baoquan He wrote:

On 02/22/24 at 09:01am, Sourabh Jain wrote:

Hello Baoquan,

There are a lot of code movements introduced by your patch series, 'Split crash out from kexec and clean up related config items.' https://lore.kernel.org/all/20240221125752.36fbfe9c307496313198b...@linux-foundation.org/

Do you want me to rebase this patch series on top of the above patch series?

Yes, appreciate that, that would be very helpful. Rebasing this to latest next/master. And I saw Hari's patch series, basically it's fine to me. It will be great if this patch can sit on that patchset.

Sure, let me rebase it and send v17.

Thanks,
Sourabh

On 17/02/24 13:44, Sourabh Jain wrote:

Commit a72bbec70da2 ("crash: hotplug support for kexec_load()") introduced a new kexec flag, `KEXEC_UPDATE_ELFCOREHDR`. The kexec tool uses this flag to indicate to the kernel that it is safe to modify the elfcorehdr of the kdump image loaded using the kexec_load system call.

However, architectures may need to update kexec segments other than the elfcorehdr, for example the FDT (Flattened Device Tree) on PowerPC. Introducing a new kexec flag for every new kexec segment may not be a good solution. Hence, a generic kexec flag bit, `KEXEC_CRASH_HOTPLUG_SUPPORT`, is introduced to share the CPU/memory hotplug support intent between the kexec tool and the kernel for the kexec_load system call.

Now, if the kexec tool sends the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag to the kernel, it indicates that all the required kexec segments are skipped from the SHA calculation and that it is safe to update the kdump image loaded using the kexec_load syscall.

While loading the kdump image using the kexec_load syscall, the @update_elfcorehdr member of struct kimage is set if the kexec tool sends the KEXEC_UPDATE_ELFCOREHDR kexec flag. This member is later used to determine whether it is safe to update the elfcorehdr on hotplug events.
However, with the introduction of the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag, the kexec tool could mark all the required kexec segments on an architecture as safe to update. So rename the @update_elfcorehdr to @hotplug_support. If @hotplug_support is set, the kernel can safely update all the required kexec segments of the kdump image during CPU/Memory hotplug events. Introduce an architecture-specific function to process kexec flags for determining hotplug support. Set the @hotplug_support member of struct kimage for both kexec_load and kexec_file_load system calls. This simplifies kernel checks to identify hotplug support for the currently loaded kdump image by just examining the value of @hotplug_support. Signed-off-by: Sourabh Jain Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Eric DeVolder Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- arch/x86/include/asm/kexec.h | 11 ++- arch/x86/kernel/crash.c | 28 +--- drivers/base/cpu.c | 2 +- drivers/base/memory.c| 2 +- include/linux/kexec.h| 27 +-- include/uapi/linux/kexec.h | 1 + kernel/crash_core.c | 11 --- kernel/kexec.c | 4 ++-- kernel/kexec_file.c | 5 + 9 files changed, 50 insertions(+), 41 deletions(-) diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index 9bb6607e864e..8be622e82ba8 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -211,15 +211,8 @@ extern void kdump_nmi_shootdown_cpus(void); void arch_crash_handle_hotplug_event(struct kimage *image, void *arg); #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event -#ifdef CONFIG_HOTPLUG_CPU -int arch_crash_hotplug_cpu_support(void); 
-#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support -#endif - -#ifdef CONFIG_MEMORY_HOTPLUG -int arch_crash_hotplug_memory_support(void); -#define crash_hotplug_memory_support arch_crash_hotplug_memory_support -#endif +int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags); +#define arch_crash_hotplug_support arch_crash_hotplug_support unsigned int arch_crash_get_elfcorehdr_size(void); #define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index 44744e9c68ec..7072aaee2ea0 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -398,20 +398,26 @@ int crash_load_segments(struct kimage *image) #undef pr_fmt #define pr_fmt(fmt) "crash hp: " fmt -/* These functions provide the value for the sysfs
Re: [PATCH v16 2/5] crash: add a new kexec flag for hotplug support
Hello Baoquan,

There are a lot of code movements introduced by your patch series, 'Split crash out from kexec and clean up related config items.' https://lore.kernel.org/all/20240221125752.36fbfe9c307496313198b...@linux-foundation.org/

Do you want me to rebase this patch series on top of the above patch series?

Thanks,
Sourabh Jain

On 17/02/24 13:44, Sourabh Jain wrote:

Commit a72bbec70da2 ("crash: hotplug support for kexec_load()") introduced a new kexec flag, `KEXEC_UPDATE_ELFCOREHDR`. The kexec tool uses this flag to indicate to the kernel that it is safe to modify the elfcorehdr of the kdump image loaded using the kexec_load system call.

However, architectures may need to update kexec segments other than the elfcorehdr, for example the FDT (Flattened Device Tree) on PowerPC. Introducing a new kexec flag for every new kexec segment may not be a good solution. Hence, a generic kexec flag bit, `KEXEC_CRASH_HOTPLUG_SUPPORT`, is introduced to share the CPU/memory hotplug support intent between the kexec tool and the kernel for the kexec_load system call.

Now, if the kexec tool sends the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag to the kernel, it indicates that all the required kexec segments are skipped from the SHA calculation and that it is safe to update the kdump image loaded using the kexec_load syscall.

While loading the kdump image using the kexec_load syscall, the @update_elfcorehdr member of struct kimage is set if the kexec tool sends the KEXEC_UPDATE_ELFCOREHDR kexec flag. This member is later used to determine whether it is safe to update the elfcorehdr on hotplug events. However, with the introduction of the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag, the kexec tool can mark all the required kexec segments on an architecture as safe to update. So rename @update_elfcorehdr to @hotplug_support. If @hotplug_support is set, the kernel can safely update all the required kexec segments of the kdump image during CPU/memory hotplug events.
Introduce an architecture-specific function to process kexec flags for determining hotplug support. Set the @hotplug_support member of struct kimage for both kexec_load and kexec_file_load system calls. This simplifies kernel checks to identify hotplug support for the currently loaded kdump image by just examining the value of @hotplug_support. Signed-off-by: Sourabh Jain Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Eric DeVolder Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- arch/x86/include/asm/kexec.h | 11 ++- arch/x86/kernel/crash.c | 28 +--- drivers/base/cpu.c | 2 +- drivers/base/memory.c| 2 +- include/linux/kexec.h| 27 +-- include/uapi/linux/kexec.h | 1 + kernel/crash_core.c | 11 --- kernel/kexec.c | 4 ++-- kernel/kexec_file.c | 5 + 9 files changed, 50 insertions(+), 41 deletions(-) diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index 9bb6607e864e..8be622e82ba8 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -211,15 +211,8 @@ extern void kdump_nmi_shootdown_cpus(void); void arch_crash_handle_hotplug_event(struct kimage *image, void *arg); #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event -#ifdef CONFIG_HOTPLUG_CPU -int arch_crash_hotplug_cpu_support(void); -#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support -#endif - -#ifdef CONFIG_MEMORY_HOTPLUG -int arch_crash_hotplug_memory_support(void); -#define crash_hotplug_memory_support arch_crash_hotplug_memory_support -#endif +int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags); +#define arch_crash_hotplug_support 
arch_crash_hotplug_support unsigned int arch_crash_get_elfcorehdr_size(void); #define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index 44744e9c68ec..7072aaee2ea0 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -398,20 +398,26 @@ int crash_load_segments(struct kimage *image) #undef pr_fmt #define pr_fmt(fmt) "crash hp: " fmt -/* These functions provide the value for the sysfs crash_hotplug nodes */ -#ifdef CONFIG_HOTPLUG_CPU -int arch_crash_hotplug_cpu_support(void) +int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags) { - return crash_check_update_elfcorehdr(); -} -#endif -#ifdef CONFIG_MEMORY_HOTPLUG -int arch_crash_hotplug_memory_support(voi
Re: [PATCH v2 02/14] crash: split vmcoreinfo exporting code out from crash_core.c
Hello Baoquan, On 19/01/24 20:22, Baoquan He wrote: Now move the relevant codes into separate files: kernel/crash_reserve.c, include/linux/crash_reserve.h. And add config item CRASH_RESERVE to control its enabling. Feels like this patch is more about vmcore_info.[c|h] and CONFIG_VMCORE_INFO than the above-mentioned files and config. - Sourabh And also update the old ifdeffery of CONFIG_CRASH_CORE, including of and config item dependency on CRASH_CORE accordingly. And also do renaming as follows: - arch/xxx/kernel/{crash_core.c => vmcore_info.c} because they are only related to vmcoreinfo exporting on x86, arm64, riscv. And also Remove config item CRASH_CORE, and rely on CONFIG_KEXEC_CORE to decide if build in crash_core.c. Signed-off-by: Baoquan He --- arch/arm64/kernel/Makefile| 2 +- .../kernel/{crash_core.c => vmcore_info.c}| 2 +- arch/powerpc/Kconfig | 2 +- arch/powerpc/kernel/setup-common.c| 2 +- arch/powerpc/platforms/powernv/opal-core.c| 2 +- arch/riscv/kernel/Makefile| 2 +- .../kernel/{crash_core.c => vmcore_info.c}| 2 +- arch/x86/kernel/Makefile | 2 +- .../{crash_core_32.c => vmcore_info_32.c} | 2 +- .../{crash_core_64.c => vmcore_info_64.c} | 2 +- drivers/firmware/qemu_fw_cfg.c| 14 +- fs/proc/Kconfig | 2 +- fs/proc/kcore.c | 2 +- include/linux/buildid.h | 2 +- include/linux/crash_core.h| 73 -- include/linux/kexec.h | 1 + include/linux/vmcore_info.h | 81 ++ kernel/Kconfig.kexec | 4 +- kernel/Makefile | 4 +- kernel/crash_core.c | 208 kernel/ksysfs.c | 6 +- kernel/printk/printk.c| 4 +- kernel/vmcore_info.c | 233 ++ lib/buildid.c | 2 +- 24 files changed, 345 insertions(+), 311 deletions(-) rename arch/arm64/kernel/{crash_core.c => vmcore_info.c} (97%) rename arch/riscv/kernel/{crash_core.c => vmcore_info.c} (96%) rename arch/x86/kernel/{crash_core_32.c => vmcore_info_32.c} (90%) rename arch/x86/kernel/{crash_core_64.c => vmcore_info_64.c} (94%) create mode 100644 include/linux/vmcore_info.h create mode 100644 kernel/vmcore_info.c diff --git 
a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile index d95b3d6b471a..bcf89587a549 100644 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -66,7 +66,7 @@ obj-$(CONFIG_KEXEC_FILE) += machine_kexec_file.o kexec_image.o obj-$(CONFIG_ARM64_RELOC_TEST)+= arm64-reloc-test.o arm64-reloc-test-y := reloc_test_core.o reloc_test_syms.o obj-$(CONFIG_CRASH_DUMP) += crash_dump.o -obj-$(CONFIG_CRASH_CORE) += crash_core.o +obj-$(CONFIG_VMCORE_INFO) += vmcore_info.o obj-$(CONFIG_ARM_SDE_INTERFACE) += sdei.o obj-$(CONFIG_ARM64_PTR_AUTH) += pointer_auth.o obj-$(CONFIG_ARM64_MTE) += mte.o diff --git a/arch/arm64/kernel/crash_core.c b/arch/arm64/kernel/vmcore_info.c similarity index 97% rename from arch/arm64/kernel/crash_core.c rename to arch/arm64/kernel/vmcore_info.c index 66cde752cd74..a5abf7186922 100644 --- a/arch/arm64/kernel/crash_core.c +++ b/arch/arm64/kernel/vmcore_info.c @@ -4,7 +4,7 @@ * Copyright (C) Huawei Futurewei Technologies. */ -#include +#include #include #include #include diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 6aeab95f0edd..1520146d7c2c 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -690,7 +690,7 @@ config ARCH_SELECTS_CRASH_DUMP config FA_DUMP bool "Firmware-assisted dump" depends on PPC64 && (PPC_RTAS || PPC_POWERNV) - select CRASH_CORE + select VMCORE_INFO select CRASH_RESERVE select CRASH_DUMP help diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c index 9b142b9d5187..733f210ffda1 100644 --- a/arch/powerpc/kernel/setup-common.c +++ b/arch/powerpc/kernel/setup-common.c @@ -109,7 +109,7 @@ int ppc_do_canonicalize_irqs; EXPORT_SYMBOL(ppc_do_canonicalize_irqs); #endif -#ifdef CONFIG_CRASH_CORE +#ifdef CONFIG_VMCORE_INFO /* This keeps a track of which one is the crashing cpu. 
*/ int crashing_cpu = -1; #endif diff --git a/arch/powerpc/platforms/powernv/opal-core.c b/arch/powerpc/platforms/powernv/opal-core.c index bb7657115f1d..c9a9b759cc92 100644 --- a/arch/powerpc/platforms/powernv/opal-core.c +++ b/arch/powerpc/platforms/powernv/opal-core.c @@ -16,7 +16,7 @@ #include #include #include -#include +#include #include
Re: [PATCH v2 01/14] kexec: split crashkernel reservation code out from crash_core.c
Hello Baoquan, Thank you for reorganizing the kexec and kdump code with a well-defined configuration structure. While reviewing the patch series, I noticed a few typos. On 19/01/24 20:22, Baoquan He wrote: Both kdump and fa_dump of ppc rely on crashkernel reservation. Move the relevant codes into separate files: crash_reserve.c, include/linux/crash_reserve.h. And also add config item CRASH_RESERVE to control its enabling of the codes. And update config items which has relationship with crashkernel reservation. And also change ifdeffery from CONFIG_CRASH_CORE to CONFIG_CRASH_RESERVE when those scopes are only crashkernel reservation related. And also rename arch/XXX/include/asm/{crash_core.h => crash_reserve.h} on arm64, x86 and risc-v because those architectures' crash_core.h is only related to crashkernel reservation. Signed-off-by: Baoquan He --- arch/arm64/Kconfig| 2 +- .../asm/{crash_core.h => crash_reserve.h} | 4 +- arch/powerpc/Kconfig | 1 + arch/powerpc/mm/nohash/kaslr_booke.c | 4 +- arch/riscv/Kconfig| 2 +- .../asm/{crash_core.h => crash_reserve.h} | 4 +- arch/x86/Kconfig | 2 +- .../asm/{crash_core.h => crash_reserve.h} | 6 +- include/linux/crash_core.h| 40 -- include/linux/crash_reserve.h | 48 ++ include/linux/kexec.h | 1 + kernel/Kconfig.kexec | 5 +- kernel/Makefile | 1 + kernel/crash_core.c | 438 - kernel/crash_reserve.c| 464 ++ 15 files changed, 531 insertions(+), 491 deletions(-) rename arch/arm64/include/asm/{crash_core.h => crash_reserve.h} (81%) rename arch/riscv/include/asm/{crash_core.h => crash_reserve.h} (78%) rename arch/x86/include/asm/{crash_core.h => crash_reserve.h} (92%) create mode 100644 include/linux/crash_reserve.h create mode 100644 kernel/crash_reserve.c diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index ea01a2c43efa..d96bc3c67ec6 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -1501,7 +1501,7 @@ config ARCH_SUPPORTS_CRASH_DUMP def_bool y config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION - def_bool CRASH_CORE + 
def_bool CRASH_RESERVE config TRANS_TABLE def_bool y diff --git a/arch/arm64/include/asm/crash_core.h b/arch/arm64/include/asm/crash_reserve.h similarity index 81% rename from arch/arm64/include/asm/crash_core.h rename to arch/arm64/include/asm/crash_reserve.h index 9f5c8d339f44..4afe027a4e7b 100644 --- a/arch/arm64/include/asm/crash_core.h +++ b/arch/arm64/include/asm/crash_reserve.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0-only */ -#ifndef _ARM64_CRASH_CORE_H -#define _ARM64_CRASH_CORE_H +#ifndef _ARM64_CRASH_RESERVE_H +#define _ARM64_CRASH_RESERVE_H /* Current arm64 boot protocol requires 2MB alignment */ #define CRASH_ALIGN SZ_2M diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 414b978b8010..6aeab95f0edd 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -691,6 +691,7 @@ config FA_DUMP bool "Firmware-assisted dump" depends on PPC64 && (PPC_RTAS || PPC_POWERNV) select CRASH_CORE + select CRASH_RESERVE select CRASH_DUMP help A robust mechanism to get reliable kernel crash dump with diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c b/arch/powerpc/mm/nohash/kaslr_booke.c index b4f2786a7d2b..cdff129abb14 100644 --- a/arch/powerpc/mm/nohash/kaslr_booke.c +++ b/arch/powerpc/mm/nohash/kaslr_booke.c @@ -13,7 +13,7 @@ #include #include #include -#include +#include #include #include #include @@ -173,7 +173,7 @@ static __init bool overlaps_region(const void *fdt, u32 start, static void __init get_crash_kernel(void *fdt, unsigned long size) { -#ifdef CONFIG_CRASH_CORE +#ifdef CONFIG_CRASH_RESERVE unsigned long long crash_size, crash_base; int ret; diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index b549499eb363..37a438c23deb 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -712,7 +712,7 @@ config ARCH_SUPPORTS_CRASH_DUMP def_bool y config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION - def_bool CRASH_CORE + def_bool CRASH_RESERVE config COMPAT bool "Kernel support for 32-bit U-mode" diff --git 
a/arch/riscv/include/asm/crash_core.h b/arch/riscv/include/asm/crash_reserve.h similarity index 78% rename from arch/riscv/include/asm/crash_core.h rename to arch/riscv/include/asm/crash_reserve.h index e1874b23feaf..013962e63587 100644 --- a/arch/riscv/include/asm/crash_core.h +++ b/arch/riscv/include/asm/crash_reserve.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0-only */ -#ifndef
Re: [PATCH v7 1/3] powerpc: make fadump resilient with memory add/remove events
Hello Hari, On 23/01/24 15:39, Hari Bathini wrote: On 11/01/24 7:39 pm, Sourabh Jain wrote: Due to changes in memory resources caused by either memory hotplug or online/offline events, the elfcorehdr, which describes the CPUs and memory of the crashed kernel to the kernel that collects the dump (known as second/fadump kernel), becomes outdated. Consequently, attempting dump collection with an outdated elfcorehdr can lead to failed or inaccurate dump collection. Memory hotplug or online/offline events are referred to as memory add/remove events in the rest of the commit message. The current solution to address the aforementioned issue is as follows: Monitor memory add/remove events in userspace using udev rules, and re-register fadump whenever there are changes in memory resources. This leads to the creation of a new elfcorehdr with updated system memory information. There are several notable issues associated with re-registering fadump for every memory add/remove event. 1. Bulk memory add/remove events with udev-based fadump re-registration can lead to race conditions and, more importantly, it creates a wide window during which fadump is inactive until all memory add/remove events are settled. 2. Re-registering fadump for every memory add/remove event is inefficient. 3. The memory for elfcorehdr is allocated based on the memblock regions available during early boot and remains fixed thereafter. However, if elfcorehdr is later recreated with additional memblock regions, its size will increase, potentially leading to memory corruption. Address the aforementioned challenges by shifting the creation of elfcorehdr from the first kernel (also referred to as the crashed kernel), where it was created and frequently recreated for every memory add/remove event, to the fadump kernel. As a result, the elfcorehdr only needs to be created once, thus eliminating the necessity to re-register fadump during memory add/remove events.
At present, the first kernel prepares the fadump header and stores it in the fadump reserved area. The fadump header contains start address of the elfcorehd, crashing CPU details, etc. In the event of first kernel "elfcorehd" used instead of "elfcorehdr" at a couple of places.. Fixed it now. Thanks. crash, the second/fadump boots and access the fadump header prepared by first kernel and do the following in a platform-specific function [rtas|opal]_fadump_process: At present, the first kernel is responsible for preparing the fadump header and storing it in the fadump reserved area. The fadump header includes the start address of the elfcorehd, crashing CPU details, and other relevant information. In the event of a crash in the first kernel, the second/fadump boots and accesses the fadump header prepared by the first kernel. It then performs the following steps in a platform-specific function [rtas|opal]_fadump_process: 1. Sanity check for fadump header 2. Update CPU notes in elfcorehdr 3. Set the global variable elfcorehdr_addr to the address of the fadump header's elfcorehdr. For vmcore module to process it later on. Along with the above, update the setup_fadump()/fadump.c to create elfcorehdr in second/fadump kernel. Section below outlines the information required to create the elfcorehdr and the changes made to make it available to the fadump kernel if it's not already. To create elfcorehdr, the following crashed kernel information is required: CPU notes, vmcoreinfo, and memory ranges. At present, the CPU notes are already prepared in the fadump kernel, so no changes are needed in that regard. The fadump kernel has access to all crashed kernel memory regions, including boot memory regions that are relocated by firmware to fadump reserved areas, so no changes for that either. 
However, it is necessary to add new members to the fadump header, i.e., the 'fadump_crash_info_header' structure, in order to pass the crashed kernel's vmcoreinfo address and its size to the fadump kernel. In addition to the vmcoreinfo address and size, a few other attributes are also added to the fadump_crash_info_header structure. 1. version: It stores the fadump header version, which is currently set to 1. This provides flexibility to update the fadump crash info header in the future without changing the magic number. For each change in the fadump header, the version will be increased. This will help the updated kernel determine how to handle kernel dumps from older kernels. The magic number remains relevant for checking fadump header corruption. 2. elfcorehdr_size: since elfcorehdr is now prepared in the fadump/second kernel and it is not part of the reserved area, this attribute is needed to track the memory allocated for elfcorehdr to do the deallocation properly. 3. pt_regs_sz/cpu_mask_sz: Store the size of the pt_regs and cpu_mask structures in the first kernel. These attributes are used avoid
[PATCH v16 5/5] powerpc: add crash memory hotplug support
Extend the arch crash hotplug handler, introduced by the patch titled "powerpc: add crash CPU hotplug support", to also support memory add/remove events.

The elfcorehdr describes the memory of the crashed kernel to the kernel that captures the dump; hence, it needs to be updated if memory resources change due to memory add/remove events. Therefore, arch_crash_handle_hotplug_event() is updated to recreate the elfcorehdr and replace the previous one with it on memory add/remove events.

The memblock list is used to prepare the elfcorehdr. In the case of memory hot remove, the memblock list is updated after the arch crash hotplug handler is triggered, as depicted in Figure 1. Thus, the hot-removed memory is explicitly removed from the crash memory ranges to ensure that the memory ranges added to elfcorehdr do not include the hot-removed memory.

    Memory remove
          |
          v
    Offline pages
          |
          v
    Initiate memory notify call <----> crash hotplug handler
    chain for MEM_OFFLINE event
          |
          v
    Update memblock list

          Figure 1

There are two system calls, `kexec_file_load` and `kexec_load`, used to load the kdump image. A few changes have been made to ensure that the kernel can safely update the elfcorehdr component of the kdump image for both system calls.

For the kexec_file_load syscall, the kdump image is prepared in the kernel. To support an increasing number of memory regions, the elfcorehdr is built with extra buffer space to ensure that it can accommodate additional memory ranges in the future.

For the kexec_load syscall, the elfcorehdr is updated only if the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag is passed to the kernel by the kexec tool. Passing this flag to the kernel indicates that the elfcorehdr is built to accommodate additional memory ranges and the elfcorehdr segment is not considered for SHA calculation, making it safe to update.

The changes related to this feature are kept under the CRASH_HOTPLUG config, and it is enabled by default.
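To illustrate the explicit removal step described above, here is a minimal userspace sketch, not the code from this patch. It assumes inclusive, non-overlapping ranges in a fixed-size array; the names `demo_mem`, `demo_remove_range`, `demo_remove_pfn_range` and the 64K `DEMO_PAGE_SHIFT` are hypothetical and only loosely model what `remove_mem_range()` has to do with the start PFN and page count carried by the `memory_notify` object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_RANGES 8
#define DEMO_PAGE_SHIFT 16 /* assumption: 64K pages, for illustration only */

struct demo_range {
	uint64_t start;
	uint64_t end; /* inclusive */
};

/* Hypothetical stand-in for a crash memory range list. */
struct demo_mem {
	size_t nr_ranges;
	struct demo_range ranges[DEMO_MAX_RANGES];
};

/*
 * Remove [base, base + size - 1] from a list of disjoint ranges.
 * Returns 0 on success, -1 if a split needs a slot that is not available.
 */
static int demo_remove_range(struct demo_mem *mem, uint64_t base, uint64_t size)
{
	uint64_t rm_start = base, rm_end;
	size_t i = 0, j;

	if (size == 0)
		return 0;
	rm_end = base + size - 1;

	while (i < mem->nr_ranges) {
		struct demo_range *r = &mem->ranges[i];

		/* No overlap with this entry */
		if (rm_end < r->start || rm_start > r->end) {
			i++;
			continue;
		}

		/* Entry fully covered: delete it by shifting the tail down */
		if (rm_start <= r->start && rm_end >= r->end) {
			for (j = i; j + 1 < mem->nr_ranges; j++)
				mem->ranges[j] = mem->ranges[j + 1];
			mem->nr_ranges--;
			continue; /* re-examine the entry now at index i */
		}

		/* Hole strictly inside the entry: split it in two */
		if (rm_start > r->start && rm_end < r->end) {
			if (mem->nr_ranges == DEMO_MAX_RANGES)
				return -1;
			for (j = mem->nr_ranges; j > i + 1; j--)
				mem->ranges[j] = mem->ranges[j - 1];
			mem->ranges[i + 1].start = rm_end + 1;
			mem->ranges[i + 1].end = r->end;
			r->end = rm_start - 1;
			mem->nr_ranges++;
			return 0; /* ranges are disjoint, nothing else can overlap */
		}

		/* Partial overlap: trim the head or the tail of the entry */
		if (rm_start <= r->start)
			r->start = rm_end + 1;
		else
			r->end = rm_start - 1;
		i++;
	}
	return 0;
}

/* Convert a (start PFN, page count) pair to bytes and remove it. */
static int demo_remove_pfn_range(struct demo_mem *mem, uint64_t start_pfn,
				 uint64_t nr_pages)
{
	return demo_remove_range(mem, start_pfn << DEMO_PAGE_SHIFT,
				 nr_pages << DEMO_PAGE_SHIFT);
}
```

The split case is the one that may need an extra slot, which is presumably why the real range list has to be reallocatable.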
Signed-off-by: Sourabh Jain Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- arch/powerpc/include/asm/kexec.h| 5 +- arch/powerpc/include/asm/kexec_ranges.h | 1 + arch/powerpc/kexec/core_64.c| 105 +++- arch/powerpc/kexec/file_load_64.c | 34 +++- arch/powerpc/kexec/ranges.c | 85 +++ 5 files changed, 226 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index 67bace4c90cf..a4193d922d99 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -116,8 +116,11 @@ int get_crash_memory_ranges(struct crash_mem **mem_ranges); #ifdef CONFIG_CRASH_HOTPLUG void arch_crash_handle_hotplug_event(struct kimage *image, void *arg); #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event -#endif /* CONFIG_CRASH_HOTPLUG */ +unsigned int arch_crash_get_elfcorehdr_size(void); +#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size + +#endif /* CONFIG_CRASH_HOTPLUG */ #endif /* CONFIG_PPC64 */ #ifdef CONFIG_KEXEC_FILE diff --git a/arch/powerpc/include/asm/kexec_ranges.h b/arch/powerpc/include/asm/kexec_ranges.h index f83866a19e87..802abf580cf0 100644 --- a/arch/powerpc/include/asm/kexec_ranges.h +++ b/arch/powerpc/include/asm/kexec_ranges.h @@ -7,6 +7,7 @@ void sort_memory_ranges(struct crash_mem *mrngs, bool merge); struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges); int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size); +int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size); int add_tce_mem_ranges(struct crash_mem 
**mem_ranges); int add_initrd_mem_range(struct crash_mem **mem_ranges); #ifdef CONFIG_PPC_64S_HASH_MMU diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c index ff04cdc80f3d..6f188b5ef51e 100644 --- a/arch/powerpc/kexec/core_64.c +++ b/arch/powerpc/kexec/core_64.c @@ -19,8 +19,11 @@ #include #include #include +#include #include +#include +#include #include #include #include @@ -590,6 +593,103 @@ late_initcall(export_htab_values); #undef pr_fmt #define pr_fmt(fmt) "crash hp: " fmt +/* + * Advertise preferred elfcorehdr size to userspace via + * /sys/kernel/crash_elfcorehdr_size sysfs interface. + */ +unsigned int arch_crash_get_elfcorehdr_size(void) +{ + unsigned int sz; + unsigned long elf_phdr_cnt; + + /* Program header for CPU notes and vmcoreinfo *
[PATCH v16 4/5] powerpc: add crash CPU hotplug support
Due to CPU/Memory hotplug or online/offline events, the elfcorehdr (which describes the CPUs and memory of the crashed kernel) and the FDT (Flattened Device Tree) of the kdump image become outdated. Consequently, attempting dump collection with an outdated elfcorehdr or FDT can lead to failed or inaccurate dump collection. Going forward, CPU/Memory hotplug or online/offline events are referred to as CPU/Memory add/remove events. The current solution to address the above issue involves monitoring the CPU/Memory add/remove events in userspace using udev rules, and whenever there are changes in CPU and memory resources, the entire kdump image is loaded again. The kdump image includes the kernel, initrd, elfcorehdr, FDT, and purgatory. Given that only the elfcorehdr and FDT get outdated due to CPU/Memory add/remove events, reloading the entire kdump image is inefficient. More importantly, kdump remains inactive for a substantial amount of time until the kdump reload completes. To address the aforementioned issue, commit 247262756121 ("crash: add generic infrastructure for crash hotplug support") added a generic infrastructure that allows architectures to selectively update the kdump image components during CPU or memory add/remove events within the kernel itself. In the event of a CPU or memory add/remove event, the generic crash hotplug event handler, `crash_handle_hotplug_event()`, is triggered. It then acquires the necessary locks to update the kdump image and invokes the architecture-specific crash hotplug handler, `arch_crash_handle_hotplug_event()`, to update the required kdump image components. This patch adds a crash hotplug handler for PowerPC and enables support for updating the kdump image on CPU add/remove events. Support for memory add/remove events is added in a subsequent patch titled "powerpc: add crash memory hotplug support". As mentioned earlier, only the elfcorehdr and FDT kdump image components need to be updated in the event of CPU or memory add/remove events.
However, on the PowerPC architecture, the crash hotplug handler only updates the FDT to enable crash hotplug support for CPU add/remove events. Here's why. The elfcorehdr on PowerPC is built with possible CPUs, and thus, it does not need an update on CPU add/remove events. On the other hand, the FDT needs to be updated on CPU add events to include the newly added CPU. If the FDT is not updated and the kernel crashes on a newly added CPU, the kdump kernel will fail to boot due to the unavailability of the crashing CPU in the FDT. During early boot, the boot CPU is expected to be part of the FDT; otherwise, the kernel will raise a BUG and fail to boot. For more information, refer to commit 36ae37e3436b0 ("powerpc: Make boot_cpuid common between 32 and 64-bit"). Since it is okay to have an offline CPU in the kdump FDT, no action is taken in case of CPU removal. There are two system calls, `kexec_file_load` and `kexec_load`, used to load the kdump image. A few changes have been made to ensure the kernel can safely update the FDT of a kdump image loaded using either system call. For the kexec_file_load syscall, the kdump image is prepared in the kernel. So, to support an increasing number of CPUs, the FDT is constructed with extra buffer space to ensure it can accommodate nodes for all possible CPUs. Additionally, a call to fdt_pack (which trims the unused space once the FDT is prepared) is avoided if this feature is enabled. For the kexec_load syscall, the FDT is updated only if the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag is passed to the kernel by userspace (kexec tools). When userspace passes this flag to the kernel, it indicates that the FDT is built to accommodate possible CPUs, and the FDT segment is excluded from SHA calculation, making it safe to update. The changes related to this feature are kept under the CRASH_HOTPLUG config, and it is enabled by default.
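The decision described above can be summarised in a small sketch. This is only an illustration of the reasoning in this commit message, not the handler added by the patch; the enum names and `demo_cpu_hotplug_update()` are hypothetical, and `hotplug_support` stands in for whatever the kernel recorded when the kdump image was loaded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event and result names, for illustration only. */
enum demo_hp_action { DEMO_HP_ADD_CPU, DEMO_HP_REMOVE_CPU };
enum demo_update { DEMO_UPDATE_NONE, DEMO_UPDATE_FDT };

/*
 * Decide which kdump image component to refresh on a CPU event:
 * - without hotplug support the image must not be touched;
 * - a CPU add needs the FDT refreshed so the new CPU is present;
 * - a CPU remove needs nothing, since an offline CPU in the FDT is fine;
 * - the elfcorehdr is never touched, as it already covers possible CPUs.
 */
static enum demo_update demo_cpu_hotplug_update(bool hotplug_support,
						enum demo_hp_action action)
{
	if (!hotplug_support)
		return DEMO_UPDATE_NONE;

	switch (action) {
	case DEMO_HP_ADD_CPU:
		return DEMO_UPDATE_FDT;
	case DEMO_HP_REMOVE_CPU:
		return DEMO_UPDATE_NONE;
	}
	return DEMO_UPDATE_NONE;
}
```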
Signed-off-by: Sourabh Jain Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- arch/powerpc/Kconfig | 4 ++ arch/powerpc/include/asm/kexec.h | 6 ++ arch/powerpc/kexec/core_64.c | 93 +++ arch/powerpc/kexec/elf_64.c | 12 +++- arch/powerpc/kexec/file_load_64.c | 17 ++ 5 files changed, 131 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index b9fc064d38d2..fd1bf07244c6 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -682,6 +682,10 @@ config RELOCATABLE_TEST config ARCH_SUPPORTS_CRASH_DUMP def_bool PPC64 || PPC_BOOK3S_32 || PPC_8
[PATCH v16 3/5] powerpc/kexec: turn some static helper functions public
Move the functions update_cpus_node and get_crash_memory_ranges from kexec/file_load_64.c to kexec/core_64.c to make these functions usable by other kexec components. get_crash_memory_ranges uses functions defined in ranges.c, so take ranges.c out of CONFIG_KEXEC_FILE. Later in the series, these functions are utilized for in-kernel updates to kdump image during CPU/Memory hotplug or online/offline events for both kexec_load and kexec_file_load syscalls. There is no intended functional change. Signed-off-by: Sourabh Jain Reviewed-by: Laurent Dufour Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- arch/powerpc/include/asm/kexec.h | 6 ++ arch/powerpc/kexec/Makefile | 4 +- arch/powerpc/kexec/core_64.c | 166 ++ arch/powerpc/kexec/file_load_64.c | 162 - 4 files changed, 174 insertions(+), 164 deletions(-) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index e1b43aa12175..562e1bb689da 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -108,6 +108,12 @@ void crash_free_reserved_phys_range(unsigned long begin, unsigned long end); #endif /* CONFIG_PPC_RTAS */ #endif /* CONFIG_CRASH_DUMP */ +#ifdef CONFIG_PPC64 +struct crash_mem; +int update_cpus_node(void *fdt); +int get_crash_memory_ranges(struct crash_mem **mem_ranges); +#endif /* CONFIG_PPC64 */ + #ifdef CONFIG_KEXEC_FILE extern const struct kexec_file_ops kexec_elf64_ops; diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile index 0c2abe7f9908..f2ed5b85b912 100644 --- a/arch/powerpc/kexec/Makefile +++ b/arch/powerpc/kexec/Makefile @@ -3,11 +3,11 @@ # 
Makefile for the linux kernel. # -obj-y += core.o crash.o core_$(BITS).o +obj-y += core.o crash.o ranges.o core_$(BITS).o obj-$(CONFIG_PPC32)+= relocate_32.o -obj-$(CONFIG_KEXEC_FILE) += file_load.o ranges.o file_load_$(BITS).o elf_$(BITS).o +obj-$(CONFIG_KEXEC_FILE) += file_load.o file_load_$(BITS).o elf_$(BITS).o # Disable GCOV, KCOV & sanitizers in odd or sensitive code GCOV_PROFILE_core_$(BITS).o := n diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c index 762e4d09aacf..48beaadcfb70 100644 --- a/arch/powerpc/kexec/core_64.c +++ b/arch/powerpc/kexec/core_64.c @@ -17,6 +17,8 @@ #include #include #include +#include +#include #include #include @@ -30,6 +32,8 @@ #include #include #include +#include +#include int machine_kexec_prepare(struct kimage *image) { @@ -376,6 +380,168 @@ void default_machine_kexec(struct kimage *image) /* NOTREACHED */ } +/** + * get_crash_memory_ranges - Get crash memory ranges. This list includes + * first/crashing kernel's memory regions that + * would be exported via an elfcore. + * @mem_ranges: Range list to add the memory ranges to. + * + * Returns 0 on success, negative errno on error. 
+ */ +int get_crash_memory_ranges(struct crash_mem **mem_ranges) +{ + phys_addr_t base, end; + struct crash_mem *tmem; + u64 i; + int ret; + + for_each_mem_range(i, &base, &end) { + u64 size = end - base; + + /* Skip backup memory region, which needs a separate entry */ + if (base == BACKUP_SRC_START) { + if (size > BACKUP_SRC_SIZE) { + base = BACKUP_SRC_END + 1; + size -= BACKUP_SRC_SIZE; + } else + continue; + } + + ret = add_mem_range(mem_ranges, base, size); + if (ret) + goto out; + + /* Try merging adjacent ranges before reallocation attempt */ + if ((*mem_ranges)->nr_ranges == (*mem_ranges)->max_nr_ranges) + sort_memory_ranges(*mem_ranges, true); + } + + /* Reallocate memory ranges if there is no space to split ranges */ + tmem = *mem_ranges; + if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) { + tmem = realloc_mem_ranges(mem_ranges); + if (!tmem) + goto out; + } + + /* Exclude crashkernel region */ + ret = crash_exclude_mem_range(tmem, crashk_res.start, crashk_res.end); + if (ret) + goto out; + + /* +* FIXME: For now, stay in parity with k
[PATCH v16 2/5] crash: add a new kexec flag for hotplug support
Commit a72bbec70da2 ("crash: hotplug support for kexec_load()") introduced a new kexec flag, `KEXEC_UPDATE_ELFCOREHDR`. The kexec tool uses this flag to indicate to the kernel that it is safe to modify the elfcorehdr of the kdump image loaded using the kexec_load system call. However, it is possible that architectures may need to update kexec segments other than elfcorehdr. For example, FDT (Flattened Device Tree) on PowerPC. Introducing a new kexec flag for every new kexec segment may not be a good solution. Hence, a generic kexec flag bit, `KEXEC_CRASH_HOTPLUG_SUPPORT`, is introduced to share the CPU/Memory hotplug support intent between the kexec tool and the kernel for the kexec_load system call. Now, if the kexec tool sends the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag to the kernel, it indicates to the kernel that all the required kexec segments are excluded from SHA calculation and it is safe to update the kdump image loaded using the kexec_load syscall. While loading the kdump image using the kexec_load syscall, the @update_elfcorehdr member of struct kimage is set if the kexec tool sends the KEXEC_UPDATE_ELFCOREHDR kexec flag. This member is later used to determine whether it is safe to update elfcorehdr on hotplug events. However, with the introduction of the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag, the kexec tool could mark all the required kexec segments on an architecture as safe to update. So rename @update_elfcorehdr to @hotplug_support. If @hotplug_support is set, the kernel can safely update all the required kexec segments of the kdump image during CPU/Memory hotplug events. Introduce an architecture-specific function to process kexec flags for determining hotplug support. Set the @hotplug_support member of struct kimage for both kexec_load and kexec_file_load system calls. This simplifies kernel checks to identify hotplug support for the currently loaded kdump image by just examining the value of @hotplug_support.
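For readers skimming the thread, the flag handling described above can be modelled roughly as follows. This is a userspace sketch under stated assumptions, not the kernel code: `demo_kimage`, `demo_arch_hotplug_support()` and `demo_load()` are hypothetical names, the `DEMO_` flag values are illustrative (the real definitions live in include/uapi/linux/kexec.h), and the file-mode shortcut mirrors what the x86 hunk in this patch appears to do.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits only; see include/uapi/linux/kexec.h for the real ones. */
#define DEMO_KEXEC_UPDATE_ELFCOREHDR     0x00000004UL
#define DEMO_KEXEC_CRASH_HOTPLUG_SUPPORT 0x00000008UL

/* Hypothetical stand-in for the struct kimage members discussed above. */
struct demo_kimage {
	bool file_mode;       /* image was loaded via kexec_file_load */
	bool hotplug_support; /* renamed from update_elfcorehdr */
};

/*
 * Arch hook: an image built in the kernel (file mode) is always safe to
 * update; for kexec_load, either the legacy or the generic flag signals
 * that the relevant segments are excluded from the SHA calculation.
 */
static bool demo_arch_hotplug_support(const struct demo_kimage *image,
				      unsigned long kexec_flags)
{
	if (image->file_mode)
		return true;

	return (kexec_flags & (DEMO_KEXEC_UPDATE_ELFCOREHDR |
			       DEMO_KEXEC_CRASH_HOTPLUG_SUPPORT)) != 0;
}

/* Load path: record the decision once so later checks read one member. */
static void demo_load(struct demo_kimage *image, bool file_mode,
		      unsigned long kexec_flags)
{
	image->file_mode = file_mode;
	image->hotplug_support = demo_arch_hotplug_support(image, kexec_flags);
}
```

Recording the result at load time is what lets hotplug-time checks reduce to a test of `hotplug_support`, as the last paragraph of the commit message notes.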
Signed-off-by: Sourabh Jain
Cc: Akhil Raj
Cc: Andrew Morton
Cc: Aneesh Kumar K.V
Cc: Baoquan He
Cc: Borislav Petkov (AMD)
Cc: Boris Ostrovsky
Cc: Christophe Leroy
Cc: Dave Hansen
Cc: Dave Young
Cc: David Hildenbrand
Cc: Eric DeVolder
Cc: Greg Kroah-Hartman
Cc: Hari Bathini
Cc: Laurent Dufour
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Mimi Zohar
Cc: Naveen N Rao
Cc: Oscar Salvador
Cc: Thomas Gleixner
Cc: Valentin Schneider
Cc: Vivek Goyal
Cc: ke...@lists.infradead.org
Cc: x...@kernel.org
---
 arch/x86/include/asm/kexec.h | 11 ++-
 arch/x86/kernel/crash.c      | 28 +---
 drivers/base/cpu.c           |  2 +-
 drivers/base/memory.c        |  2 +-
 include/linux/kexec.h        | 27 +--
 include/uapi/linux/kexec.h   |  1 +
 kernel/crash_core.c          | 11 ---
 kernel/kexec.c               |  4 ++--
 kernel/kexec_file.c          |  5 +
 9 files changed, 50 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 9bb6607e864e..8be622e82ba8 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -211,15 +211,8 @@ extern void kdump_nmi_shootdown_cpus(void);
 void arch_crash_handle_hotplug_event(struct kimage *image, void *arg);
 #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
 
-#ifdef CONFIG_HOTPLUG_CPU
-int arch_crash_hotplug_cpu_support(void);
-#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
-#endif
-
-#ifdef CONFIG_MEMORY_HOTPLUG
-int arch_crash_hotplug_memory_support(void);
-#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
-#endif
+int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags);
+#define arch_crash_hotplug_support arch_crash_hotplug_support
 
 unsigned int arch_crash_get_elfcorehdr_size(void);
 #define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 44744e9c68ec..7072aaee2ea0 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -398,20 +398,26 @@ int crash_load_segments(struct kimage *image)
 #undef pr_fmt
 #define pr_fmt(fmt) "crash hp: " fmt
 
-/* These functions provide the value for the sysfs crash_hotplug nodes */
-#ifdef CONFIG_HOTPLUG_CPU
-int arch_crash_hotplug_cpu_support(void)
+int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags)
 {
-	return crash_check_update_elfcorehdr();
-}
-#endif
 
-#ifdef CONFIG_MEMORY_HOTPLUG
-int arch_crash_hotplug_memory_support(void)
-{
-	return crash_check_update_elfcorehdr();
-}
+#ifdef CONFIG_KEXEC_FILE
+	if (image->file_mode)
+		return 1;
 #endif
+	/*
+	 * Initially, crash hotplug support for kexec_load was added
+	 * with the KEXEC_UPDATE_ELFCOREHDR flag. Later, this
+	 * functionality was expanded to accommodate multiple kexec
+	 * segment updates, lead
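For readers following the thread, here is a rough userspace sketch (not part of the patch) of the split this change describes: the kexec flags are examined once at load time, the result is cached in a single bit, and later queries read only that bit. The `demo_` names and the struct are hypothetical stand-ins for `struct kimage` and the kernel functions, and the flag values are illustrative; the authoritative definitions are in include/uapi/linux/kexec.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; see include/uapi/linux/kexec.h for the real ones. */
#define DEMO_KEXEC_UPDATE_ELFCOREHDR     0x00000004UL
#define DEMO_KEXEC_CRASH_HOTPLUG_SUPPORT 0x00000008UL

/* Hypothetical stand-in for the two struct kimage bits discussed in the patch. */
struct demo_kimage {
	unsigned int file_mode:1;
	unsigned int hotplug_support:1;
};

/* Load-time decision, shaped like the x86 arch_crash_hotplug_support(). */
static int demo_arch_crash_hotplug_support(const struct demo_kimage *image,
					   unsigned long kexec_flags)
{
	/* Images built by the kernel (kexec_file_load) are always updatable. */
	if (image->file_mode)
		return 1;

	/* For kexec_load, either the old or the new flag opts in. */
	return (kexec_flags & (DEMO_KEXEC_UPDATE_ELFCOREHDR |
			       DEMO_KEXEC_CRASH_HOTPLUG_SUPPORT)) != 0;
}

/* Runs once when the image is loaded; the flags are not kept afterwards. */
static void demo_load(struct demo_kimage *image, bool file_mode,
		      unsigned long kexec_flags)
{
	image->file_mode = file_mode;
	image->hotplug_support =
		demo_arch_crash_hotplug_support(image, kexec_flags);
}

/* What a later sysfs read would report: only the cached bit. */
static int demo_crash_check_hotplug_support(const struct demo_kimage *image)
{
	return image->hotplug_support;
}
```

Caching the answer in the image is what lets the sysfs handlers stay independent of which system call loaded the kdump image.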
[PATCH v16 1/5] crash: forward memory_notify arg to arch crash hotplug handler
In the event of memory hotplug or online/offline events, the crash memory hotplug notifier `crash_memhp_notifier()` receives a `memory_notify` object but doesn't forward that object to the generic and architecture-specific crash hotplug handler.

The `memory_notify` object contains the starting PFN (Page Frame Number) and the number of pages in the hot-removed memory. This information is necessary for architectures like PowerPC to update/recreate the kdump image, specifically `elfcorehdr`.

So update the function signature of `crash_handle_hotplug_event()` and `arch_crash_handle_hotplug_event()` to accept the `memory_notify` object as an argument from crash memory hotplug notifier.

Since no such object is available in the case of CPU hotplug event, the crash CPU hotplug notifier `crash_cpuhp_online()` passes NULL to the crash hotplug handler.

Signed-off-by: Sourabh Jain
Acked-by: Baoquan He
Cc: Akhil Raj
Cc: Andrew Morton
Cc: Aneesh Kumar K.V
Cc: Borislav Petkov (AMD)
Cc: Boris Ostrovsky
Cc: Christophe Leroy
Cc: Dave Hansen
Cc: Dave Young
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: Hari Bathini
Cc: Laurent Dufour
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Mimi Zohar
Cc: Naveen N Rao
Cc: Oscar Salvador
Cc: Thomas Gleixner
Cc: Valentin Schneider
Cc: Vivek Goyal
Cc: ke...@lists.infradead.org
Cc: x...@kernel.org
---
 arch/x86/include/asm/kexec.h |  2 +-
 arch/x86/kernel/crash.c      |  3 ++-
 include/linux/kexec.h        |  2 +-
 kernel/crash_core.c          | 14 +++---
 4 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index c9f6a6c5de3c..9bb6607e864e 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -208,7 +208,7 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image);
 extern void kdump_nmi_shootdown_cpus(void);
 
 #ifdef CONFIG_CRASH_HOTPLUG
-void arch_crash_handle_hotplug_event(struct kimage *image);
+void arch_crash_handle_hotplug_event(struct kimage *image, void *arg);
 #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
 
 #ifdef CONFIG_HOTPLUG_CPU
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index b6b044356f1b..44744e9c68ec 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -428,10 +428,11 @@ unsigned int arch_crash_get_elfcorehdr_size(void)
 /**
  * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
  * @image: a pointer to kexec_crash_image
+ * @arg: struct memory_notify handler for memory hotplug case and NULL for CPU hotplug case.
  *
  * Prepare the new elfcorehdr and replace the existing elfcorehdr.
  */
-void arch_crash_handle_hotplug_event(struct kimage *image)
+void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
 {
 	void *elfbuf = NULL, *old_elfcorehdr;
 	unsigned long nr_mem_ranges;
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 400cb6c02176..802052d9c64b 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -483,7 +483,7 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) {
 #endif
 
 #ifndef arch_crash_handle_hotplug_event
-static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
+static inline void arch_crash_handle_hotplug_event(struct kimage *image, void *arg) { }
 #endif
 
 int crash_check_update_elfcorehdr(void);
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 75cd6a736d03..b692ec5955de 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -924,7 +924,7 @@ int crash_check_update_elfcorehdr(void)
  * list of segments it checks (since the elfcorehdr changes and thus
  * would require an update to purgatory itself to update the digest).
  */
-static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
+static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu, void *arg)
 {
 	struct kimage *image;
 
@@ -986,7 +986,7 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
 	image->hp_action = hp_action;
 
 	/* Now invoke arch-specific update handler */
-	arch_crash_handle_hotplug_event(image);
+	arch_crash_handle_hotplug_event(image, arg);
 
 	/* No longer handling a hotplug event */
 	image->hp_action = KEXEC_CRASH_HP_NONE;
@@ -1002,17 +1002,17 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
 	crash_hotplug_unlock();
 }
 
-static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v)
+static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *arg)
 {
 	switch (val) {
 	case MEM_ONLINE:
 		crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_MEMORY,
-			KEXEC_CRASH_HP_INVALID_CPU);
+			KEXEC_CRASH_HP_INVALID_CPU, arg);
 		break;
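To illustrate how an architecture handler might consume the forwarded `arg`, the sketch below (userspace C, not taken from the series) converts the starting PFN and page count into a byte range and treats NULL as the CPU hotplug case. The `demo_` struct mirrors only the two `memory_notify` fields the commit message mentions, and the page shift of 16 (64 KiB pages) is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this example: 64 KiB pages. */
#define DEMO_PAGE_SHIFT 16

/* Mock carrying only the two memory_notify fields named in the commit message. */
struct demo_memory_notify {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

struct demo_range {
	uint64_t base;	/* first byte of the affected memory */
	uint64_t size;	/* length in bytes */
};

/*
 * Returns 1 and fills *out for a memory event, or 0 when arg is NULL,
 * which is what the CPU hotplug notifier passes.
 */
static int demo_range_from_arg(const void *arg, struct demo_range *out)
{
	const struct demo_memory_notify *mn = arg;

	if (!mn)
		return 0;

	out->base = (uint64_t)mn->start_pfn << DEMO_PAGE_SHIFT;
	out->size = (uint64_t)mn->nr_pages << DEMO_PAGE_SHIFT;
	return 1;
}
```

Passing the object as `void *` keeps the generic handler signature identical for CPU and memory events; only the architecture code needs to know the concrete type.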
[PATCH v16 0/5] powerpc/crash: Kernel handling of CPU and memory hotplug
mit message for better understanding.

v8:
  - Restrict fdt_index initialization to machine_kexec_post_load so that it works for both kexec_load and kexec_file_load. [3/8] Laurent Dufour
  - Updated the logic to find the number of offline cores. [6/8]
  - Changed the logic to find the elfcore program header to accommodate future memory ranges due to memory hotplug events. [8/8]

v7:
  - Added a new config to configure this feature.
  - Pass hotplug action type to the arch-specific handler.

v6:
  - Added crash memory hotplug support.

v5:
  - Replace CONFIG_CRASH_HOTPLUG with CONFIG_HOTPLUG_CPU.
  - Move FDT segment identification for the kexec_load case to the load path instead of the crash hotplug handler.
  - Keep the new attribute defined under kimage_arch to track the FDT segment under the CONFIG_HOTPLUG_CPU config.

v4:
  - Update the logic to find the additional space needed for hot-added CPUs post kexec load. Refer to the "[RFC v4 PATCH 4/5] powerpc/crash hp: add crash hotplug support for kexec_file_load" patch to know more about the change.
  - Fix a couple of typos.
  - Replace pr_err with pr_info_once to warn the user about memory hotplug support.
  - In the crash hotplug handler, exit the for loop if the FDT segment is found.

v3:
  - Move fdt_index and fdt_index_valid variables to the kimage_arch struct.
  - Rebase patches on top of https://lore.kernel.org/lkml/20220303162725.49640-1-eric.devol...@oracle.com/
  - Fixed warnings reported by the checkpatch script.

v2:
  - Use the generic hotplug handler introduced by https://lore.kernel.org/lkml/20220209195706.51522-1-eric.devol...@oracle.com/, a significant change from v1.
Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org Sourabh Jain (5): crash: forward memory_notify arg to arch crash hotplug handler crash: add a new kexec flag for hotplug support powerpc/kexec: turn some static helper functions public powerpc: add crash CPU hotplug support powerpc: add crash memory hotplug support arch/powerpc/Kconfig| 4 + arch/powerpc/include/asm/kexec.h| 15 + arch/powerpc/include/asm/kexec_ranges.h | 1 + arch/powerpc/kexec/Makefile | 4 +- arch/powerpc/kexec/core_64.c| 362 arch/powerpc/kexec/elf_64.c | 12 +- arch/powerpc/kexec/file_load_64.c | 211 -- arch/powerpc/kexec/ranges.c | 85 ++ arch/x86/include/asm/kexec.h| 13 +- arch/x86/kernel/crash.c | 31 +- drivers/base/cpu.c | 2 +- drivers/base/memory.c | 2 +- include/linux/kexec.h | 29 +- include/uapi/linux/kexec.h | 1 + kernel/crash_core.c | 25 +- kernel/kexec.c | 4 +- kernel/kexec_file.c | 5 + 17 files changed, 589 insertions(+), 217 deletions(-) -- 2.43.0
[PATCH v8 3/3] Documentation/powerpc: update fadump implementation details
The patch titled ("powerpc: make fadump resilient with memory add/remove events") has made significant changes to the implementation of fadump, particularly to elfcorehdr creation and the fadump crash info header structure. Therefore, update the fadump implementation documentation to reflect those changes.

The following updates are made to the firmware-assisted dump documentation:

1. The elfcorehdr is no longer stored after the fadump HDR in the reserved dump area. Instead, the second kernel dynamically allocates memory for the elfcorehdr within the address range from 0 to the boot memory size. Therefore, update figures 1 and 2 of Memory Reservation during the first and second kernels to reflect this change.

2. A version field has been added to the fadump header to manage future changes to the fadump crash info header structure without changing the fadump header magic number. Therefore, remove the corresponding TODO from the document.

Signed-off-by: Sourabh Jain
Cc: Aditya Gupta
Cc: Aneesh Kumar K.V
Cc: Hari Bathini
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Naveen N Rao
---
 .../arch/powerpc/firmware-assisted-dump.rst | 91 +--
 1 file changed, 42 insertions(+), 49 deletions(-)

diff --git a/Documentation/arch/powerpc/firmware-assisted-dump.rst b/Documentation/arch/powerpc/firmware-assisted-dump.rst
index e363fc48529a..7e37aadd1f77 100644
--- a/Documentation/arch/powerpc/firmware-assisted-dump.rst
+++ b/Documentation/arch/powerpc/firmware-assisted-dump.rst
@@ -134,12 +134,12 @@ that are run. If there is dump data, then the memory is held. If there is no waiting dump data, then only the memory required to -hold CPU state, HPTE region, boot memory dump, FADump header and -elfcore header, is usually reserved at an offset greater than boot -memory size (see Fig. 1).
This area is *not* released: this region -will be kept permanently reserved, so that it can act as a receptacle -for a copy of the boot memory content in addition to CPU state and -HPTE region, in the case a crash does occur. +hold CPU state, HPTE region, boot memory dump, and FADump header is +usually reserved at an offset greater than boot memory size (see Fig. 1). +This area is *not* released: this region will be kept permanently +reserved, so that it can act as a receptacle for a copy of the boot +memory content in addition to CPU state and HPTE region, in the case +a crash does occur. Since this reserved memory area is used only after the system crash, there is no point in blocking this significant chunk of memory from @@ -153,22 +153,22 @@ that were present in CMA region:: o Memory Reservation during first kernel - Low memory Top of memory - 0boot memory size |<--- Reserved dump area --->| | - | | |Permanent Reservation | | - V V || V - +---+-/ /---+---++---+-+-++--+ - | | |///|| DUMP | HDR | ELF || | - +---+-/ /---+---++---+-+-++--+ -| ^^ ^ ^ ^ -| || | | | -\ CPU HPTE / | | - -- | | - Boot memory content gets transferred| | - to reserved area by firmware at the | | - time of crash. | | - FADump Header | - (meta area)| + Low memory Top of memory + 0boot memory size |<-- Reserved dump area ->| | + | | | Permanent Reservation | | + V V | | V + +---+-/ /---+---++---+---++-+ + | | |///||DUMP | HDR || | + +---+-/ /---+---++---+---++-+ +| ^^ ^ ^ ^ +| || | | | +\ CPU HPTE / | | + | | + Boot memory content gets transferred | | + to reserved area by firmware at the | | + time of crash. | | + FADump Header | +(meta area) | | | Metadata: This area holds a metadata structure whose @@ -186,13 +186,2
[PATCH v8 2/3] powerpc/fadump: add hotplug_ready sysfs interface
The elfcorehdr describes the CPUs and memory of the crashed kernel to the kernel that captures the dump, known as the second or fadump kernel. The elfcorehdr needs to be updated if the system's memory changes due to memory hotplug or online/offline events.

Currently, memory hotplug events are monitored in userspace by udev rules, and fadump is re-registered, which recreates the elfcorehdr with the latest available memory in the system.

However, the previous patch ("powerpc: make fadump resilient with memory add/remove events") moved the creation of elfcorehdr to the second or fadump kernel. This eliminates the need to regenerate the elfcorehdr during memory hotplug or online/offline events.

Create a sysfs entry at /sys/kernel/fadump/hotplug_ready to let userspace know that fadump re-registration is not required for memory add/remove events.

Signed-off-by: Sourabh Jain
Cc: Aditya Gupta
Cc: Aneesh Kumar K.V
Cc: Hari Bathini
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Naveen N Rao
---
 Documentation/ABI/testing/sysfs-kernel-fadump | 11 +++
 arch/powerpc/kernel/fadump.c                  | 14 ++
 2 files changed, 25 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump b/Documentation/ABI/testing/sysfs-kernel-fadump
index 8f7a64a81783..e2786a3db8dd 100644
--- a/Documentation/ABI/testing/sysfs-kernel-fadump
+++ b/Documentation/ABI/testing/sysfs-kernel-fadump
@@ -38,3 +38,14 @@ Contact:	linuxppc-dev@lists.ozlabs.org
 Description:	read only
 		Provide information about the amount of memory reserved by
 		FADump to save the crash dump in bytes.
+
+What:		/sys/kernel/fadump/hotplug_ready
+Date:		Feb 2024
+Contact:	linuxppc-dev@lists.ozlabs.org
+Description:	read only
+		Kdump udev rule re-registers fadump on memory add/remove events,
+		primarily to update the elfcorehdr. This sysfs node indicates to the
+		kdump udev rule that fadump re-registration is not required on
+		memory add/remove events because elfcorehdr is now prepared in
+		the second/fadump kernel.
+User:		kexec-tools
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index a55ad8514745..6478820a2038 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1444,6 +1444,18 @@ static ssize_t enabled_show(struct kobject *kobj,
 	return sprintf(buf, "%d\n", fw_dump.fadump_enabled);
 }
 
+/*
+ * /sys/kernel/fadump/hotplug_ready sysfs node returns 1, which indicates
+ * to userspace that fadump re-registration is not required on memory
+ * hotplug events.
+ */
+static ssize_t hotplug_ready_show(struct kobject *kobj,
+				  struct kobj_attribute *attr,
+				  char *buf)
+{
+	return sprintf(buf, "%d\n", 1);
+}
+
 static ssize_t mem_reserved_show(struct kobject *kobj,
 				 struct kobj_attribute *attr,
 				 char *buf)
@@ -1516,11 +1528,13 @@ static struct kobj_attribute release_attr = __ATTR_WO(release_mem);
 static struct kobj_attribute enable_attr = __ATTR_RO(enabled);
 static struct kobj_attribute register_attr = __ATTR_RW(registered);
 static struct kobj_attribute mem_reserved_attr = __ATTR_RO(mem_reserved);
+static struct kobj_attribute hotplug_ready_attr = __ATTR_RO(hotplug_ready);
 
 static struct attribute *fadump_attrs[] = {
 	&enable_attr.attr,
 	&register_attr.attr,
 	&mem_reserved_attr.attr,
+	&hotplug_ready_attr.attr,
 	NULL,
 };
 
-- 
2.43.0
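On the userspace side, a consumer of this node could look roughly like the sketch below. This is only an illustration of the intended check, not code from kexec-tools: the function name is hypothetical, and treating a missing node as "not ready" (so that older kernels keep the re-registration behaviour) is an assumption about how a caller would want to fall back.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Returns 1 if the node at 'path' reads as a non-zero integer, 0 otherwise.
 * A missing or unreadable node is treated as "not ready" (assumption: an
 * older kernel without the node still needs fadump re-registration).
 */
static int demo_fadump_hotplug_ready(const char *path)
{
	FILE *f = fopen(path, "r");
	int val = 0;

	if (!f)
		return 0;

	if (fscanf(f, "%d", &val) != 1)
		val = 0;

	fclose(f);
	return val != 0;
}
```

A udev helper would call this with "/sys/kernel/fadump/hotplug_ready" and skip the re-registration step when it returns 1.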
[PATCH v8 1/3] powerpc: make fadump resilient with memory add/remove events
Due to changes in memory resources caused by either memory hotplug or online/offline events, the elfcorehdr, which describes the CPUs and memory of the crashed kernel to the kernel that collects the dump (known as second/fadump kernel), becomes outdated. Consequently, attempting dump collection with an outdated elfcorehdr can lead to failed or inaccurate dump collection.

Memory hotplug or online/offline events are referred to as memory add/remove events in the rest of the commit message.

The current solution to address the aforementioned issue is as follows: Monitor memory add/remove events in userspace using udev rules, and re-register fadump whenever there are changes in memory resources. This leads to the creation of a new elfcorehdr with updated system memory information.

There are several notable issues associated with re-registering fadump for every memory add/remove event.

1. Bulk memory add/remove events with udev-based fadump re-registration can lead to race conditions and, more importantly, create a wide window during which fadump is inactive until all memory add/remove events are settled.

2. Re-registering fadump for every memory add/remove event is inefficient.

3. The memory for elfcorehdr is allocated based on the memblock regions available during early boot and remains fixed thereafter. However, if elfcorehdr is later recreated with additional memblock regions, its size will increase, potentially leading to memory corruption.

Address the aforementioned challenges by shifting the creation of elfcorehdr from the first kernel (also referred to as the crashed kernel), where it was created and frequently recreated for every memory add/remove event, to the fadump kernel. As a result, the elfcorehdr only needs to be created once, thus eliminating the necessity to re-register fadump during memory add/remove events.

At present, the first kernel is responsible for preparing the fadump header and storing it in the fadump reserved area. The fadump header includes the start address of the elfcorehdr, crashing CPU details, and other relevant information. In the event of a crash in the first kernel, the second/fadump kernel boots and accesses the fadump header prepared by the first kernel. It then performs the following steps in a platform-specific function [rtas|opal]_fadump_process:

1. Sanity check for fadump header
2. Update CPU notes in elfcorehdr

Along with the above, update the setup_fadump()/fadump.c to create elfcorehdr and set its address to the global variable elfcorehdr_addr for the vmcore module to process it in the second/fadump kernel.

The section below outlines the information required to create the elfcorehdr and the changes made to make it available to the fadump kernel if it's not already.

To create elfcorehdr, the following crashed kernel information is required: CPU notes, vmcoreinfo, and memory ranges.

At present, the CPU notes are already prepared in the fadump kernel, so no changes are needed in that regard. The fadump kernel has access to all crashed kernel memory regions, including boot memory regions that are relocated by firmware to fadump reserved areas, so no changes are needed for that either. However, it is necessary to add new members to the fadump header, i.e., the 'fadump_crash_info_header' structure, in order to pass the crashed kernel's vmcoreinfo address and its size to the fadump kernel.

In addition to the vmcoreinfo address and size, a few other attributes are also added to the fadump_crash_info_header structure.

1. version: It stores the fadump header version, which is currently set to 1. This provides flexibility to update the fadump crash info header in the future without changing the magic number. For each change in the fadump header, the version will be increased. This will help the updated kernel determine how to handle kernel dumps from older kernels. The magic number remains relevant for checking fadump header corruption.

2. pt_regs_sz/cpu_mask_sz: Store the size of the pt_regs and cpu_mask structures of the first kernel. These attributes are used to prevent dump processing if the sizes of the pt_regs or cpu_mask structures differ between the first and fadump kernels.

Note: if either the first/crashed kernel or the second/fadump kernel does not have the changes introduced here, the kernel fails to collect the dump and prints a relevant error message on the console.

Signed-off-by: Sourabh Jain
Cc: Aditya Gupta
Cc: Aneesh Kumar K.V
Cc: Hari Bathini
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Naveen N Rao
---
 arch/powerpc/include/asm/fadump-internal.h   |  31 +-
 arch/powerpc/kernel/fadump.c                 | 339 +++
 arch/powerpc/platforms/powernv/opal-fadump.c |  22 +-
 arch/powerpc/platforms/pseries/rtas-fadump.c |  30 +-
 4 files changed, 232 insertions(+), 190 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h
index 27f9e11eda28..5d706a7acc8a 100644
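The header checks described in the commit message can be pictured with the following userspace sketch. It is not the kernel's `fadump_crash_info_header`: the struct layout, the magic value, and the `demo_` names are all hypothetical, and rejecting any version other than 1 is an assumption made to keep the example small. It only shows the order of checks the text implies: magic for corruption, version for format, then the two structure sizes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values and layout, chosen only for this example. */
#define DEMO_FADUMP_MAGIC   0x4641445544454d4fULL
#define DEMO_FADUMP_VERSION 1U

struct demo_crash_info_header {
	uint64_t magic_number;
	uint32_t version;
	uint64_t vmcoreinfo_raddr;	/* address of the crashed kernel's vmcoreinfo */
	uint64_t vmcoreinfo_size;
	uint32_t pt_regs_sz;		/* sizeof(struct pt_regs) in the first kernel */
	uint32_t cpu_mask_sz;		/* sizeof(struct cpumask) in the first kernel */
};

enum demo_hdr_status {
	DEMO_HDR_OK,
	DEMO_HDR_CORRUPT,	/* magic number does not match */
	DEMO_HDR_BAD_VERSION,	/* header format this kernel does not know */
	DEMO_HDR_SIZE_MISMATCH,	/* pt_regs or cpu_mask size differs */
};

/* Validate a header left by the first kernel against the local sizes. */
static enum demo_hdr_status
demo_check_header(const struct demo_crash_info_header *h,
		  uint32_t local_pt_regs_sz, uint32_t local_cpu_mask_sz)
{
	if (h->magic_number != DEMO_FADUMP_MAGIC)
		return DEMO_HDR_CORRUPT;

	if (h->version != DEMO_FADUMP_VERSION)
		return DEMO_HDR_BAD_VERSION;

	if (h->pt_regs_sz != local_pt_regs_sz ||
	    h->cpu_mask_sz != local_cpu_mask_sz)
		return DEMO_HDR_SIZE_MISMATCH;

	return DEMO_HDR_OK;
}
```

Keeping the magic number and the version as separate fields means a format change can be told apart from a corrupted header.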
[PATCH v8 0/3] powerpc: make fadump resilient with memory add/remove events
d by the checkpatch script. - Rebased it to 6.6.0-rc3 v1: 17 Sep 2023 https://lore.kernel.org/all/20230917080225.561627-1-sourabhj...@linux.ibm.com/ Cc: Aditya Gupta Cc: Aneesh Kumar K.V Cc: Hari Bathini Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Naveen N Rao Sourabh Jain (3): powerpc: make fadump resilient with memory add/remove events powerpc/fadump: add hotplug_ready sysfs interface Documentation/powerpc: update fadump implementation details Documentation/ABI/testing/sysfs-kernel-fadump | 11 + .../arch/powerpc/firmware-assisted-dump.rst | 91 +++-- arch/powerpc/include/asm/fadump-internal.h| 31 +- arch/powerpc/kernel/fadump.c | 353 +++--- arch/powerpc/platforms/powernv/opal-fadump.c | 22 +- arch/powerpc/platforms/pseries/rtas-fadump.c | 30 +- 6 files changed, 299 insertions(+), 239 deletions(-) -- 2.43.0
Re: [PATCH v15 2/5] crash: add a new kexec flag for hotplug support
On 13/02/24 08:51, Baoquan He wrote: On 02/12/24 at 07:27pm, Sourabh Jain wrote: Hello Baoquan, On 05/02/24 08:40, Baoquan He wrote: Hi Sourabh, .. diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 802052d9c64b..7880d74dc5c4 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -317,8 +317,8 @@ struct kimage { /* If set, we are using file mode kexec syscall */ unsigned int file_mode:1; #ifdef CONFIG_CRASH_HOTPLUG - /* If set, allow changes to elfcorehdr of kexec_load'd image */ - unsigned int update_elfcorehdr:1; + /* If set, allow changes to kexec segments of kexec_load'd image */ The code comment doesn't reflect the usage of the flag. I should have updated the comment to indicate that this flag is for both system calls. More comments below. You set it too when it's kexec_file_load. Speaking of this, I do wonder why you need set it too for kexec_file_load, If we do this one can just access image->hotplug_support to find hotplug support for currently loaded kdump image without bothering about which system call was used to load the kdump image. and why we have arch_crash_hotplug_support(), then crash_check_hotplug_support() both of which have the same effect. arch_crash_hotplug_support(): This function processes the kexec flags and finds the hotplug support for the kdump image. Based on the return value of this function, the image->hotplug_support attribute is set. Now, once the kdump image is loaded, we no longer have access to the kexec flags. Therefore, crash_check_hotplug_support simply returns the value of image->hotplug_support when user space accesses the following sysfs files: /sys/devices/system/[cpu|memory]/crash_hotplug. To keep things simple, I have introduced two functions: One function processes the kexec flags and determines the hotplug support for the image being loaded. And other function simply accesses image->hotplug_support and advertises CPU/Memory hotplug support to userspace. 
From the function name and their functionality, they seems to be duplicated, even though it's different from the internal detail. This could bring a little confusion to code understanding. It's fine, we can refactor them if needed in the future. So let's keep it as the patch is. Thanks. Ok sure. - Sourabh
Re: [PATCH v15 2/5] crash: add a new kexec flag for hotplug support
Hello Baoquan, On 05/02/24 08:40, Baoquan He wrote: Hi Sourabh, Thanks for the great work. There are some concerns, please see inline comments. Thank you :) On 01/11/24 at 04:21pm, Sourabh Jain wrote: .. Now, if the kexec tool sends KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag to the kernel, it indicates to the kernel that all the required kexec segment is skipped from SHA calculation and it is safe to update kdump image loaded using the kexec_load syscall. So finally you add a new KEXEC_CRASH_HOTPLUG_SUPPORT flag, that's fine. .. diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index 9bb6607e864e..e791129fdf6c 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -211,6 +211,9 @@ extern void kdump_nmi_shootdown_cpus(void); void arch_crash_handle_hotplug_event(struct kimage *image, void *arg); #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event +int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags); +#define arch_crash_hotplug_support arch_crash_hotplug_support + #ifdef CONFIG_HOTPLUG_CPU int arch_crash_hotplug_cpu_support(void); #define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support Then crash_hotplug_cpu_support is not needed any more on x86_64, and crash_hotplug_memory_support(), if you remove their implementation in arch/x86/kernel/crash.c, won't it cause building warning or error on x86? Yeah, crash_hotplug_cpu_support and crash_hotplug_memory_support are no longer required. My bad, I forgot to remove them. 
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index 44744e9c68ec..293b54bff706 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -398,20 +398,16 @@ int crash_load_segments(struct kimage *image) #undef pr_fmt #define pr_fmt(fmt) "crash hp: " fmt -/* These functions provide the value for the sysfs crash_hotplug nodes */ -#ifdef CONFIG_HOTPLUG_CPU -int arch_crash_hotplug_cpu_support(void) +int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags) { - return crash_check_update_elfcorehdr(); -} -#endif -#ifdef CONFIG_MEMORY_HOTPLUG -int arch_crash_hotplug_memory_support(void) -{ - return crash_check_update_elfcorehdr(); -} +#ifdef CONFIG_KEXEC_FILE + if (image->file_mode) + return 1; #endif + return (kexec_flags & KEXEC_UPDATE_ELFCOREHDR || + kexec_flags & KEXEC_CRASH_HOTPLUG_SUPPORT); Do we need add some document to tell why there are two kexec flags on x86_64, except of checking this patch log? Sure I will add a comment about it. +} unsigned int arch_crash_get_elfcorehdr_size(void) { diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c index 548491de818e..2f411ddfbd8b 100644 --- a/drivers/base/cpu.c +++ b/drivers/base/cpu.c @@ -306,7 +306,7 @@ static ssize_t crash_hotplug_show(struct device *dev, struct device_attribute *attr, char *buf) { - return sysfs_emit(buf, "%d\n", crash_hotplug_cpu_support()); + return sysfs_emit(buf, "%d\n", crash_check_hotplug_support()); } static DEVICE_ATTR_ADMIN_RO(crash_hotplug); #endif diff --git a/drivers/base/memory.c b/drivers/base/memory.c index 8a13babd826c..e70ab1d3428e 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -514,7 +514,7 @@ static DEVICE_ATTR_RW(auto_online_blocks); static ssize_t crash_hotplug_show(struct device *dev, struct device_attribute *attr, char *buf) { - return sysfs_emit(buf, "%d\n", crash_hotplug_memory_support()); + return sysfs_emit(buf, "%d\n", crash_check_hotplug_support()); } static DEVICE_ATTR_RO(crash_hotplug); #endif diff 
--git a/include/linux/kexec.h b/include/linux/kexec.h index 802052d9c64b..7880d74dc5c4 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -317,8 +317,8 @@ struct kimage { /* If set, we are using file mode kexec syscall */ unsigned int file_mode:1; #ifdef CONFIG_CRASH_HOTPLUG - /* If set, allow changes to elfcorehdr of kexec_load'd image */ - unsigned int update_elfcorehdr:1; + /* If set, allow changes to kexec segments of kexec_load'd image */ The code comment doesn't reflect the usage of the flag. I should have updated the comment to indicate that this flag is for both system calls. More comments below. You set it too when it's kexec_file_load. Speaking of this, I do wonder why you need set it too for kexec_file_load, If we do this one can just access image->hotplug_support to find hotplug support for currently loaded kdump image without bothering about which system call was used to load the kdump image. and why we have arch_crash_hotplug_support(), then crash_check_hotplug_support() both of which have the same effect. arch_crash_hotpl
Re: [PATCH v15 1/5] crash: forward memory_notify arg to arch crash hotplug handler
On 05/02/24 08:41, Baoquan He wrote: On 01/11/24 at 04:21pm, Sourabh Jain wrote: In the event of memory hotplug or online/offline events, the crash memory hotplug notifier `crash_memhp_notifier()` receives a `memory_notify` object but doesn't forward that object to the generic and architecture-specific crash hotplug handler. The `memory_notify` object contains the starting PFN (Page Frame Number) and the number of pages in the hot-removed memory. This information is necessary for architectures like PowerPC to update/recreate the kdump image, specifically `elfcorehdr`. So update the function signature of `crash_handle_hotplug_event()` and `arch_crash_handle_hotplug_event()` to accept the `memory_notify` object as an argument from crash memory hotplug notifier. Since no such object is available in the case of CPU hotplug event, the crash CPU hotplug notifier `crash_cpuhp_online()` passes NULL to the crash hotplug handler. .. --- arch/x86/include/asm/kexec.h | 2 +- arch/x86/kernel/crash.c | 3 ++- include/linux/kexec.h| 2 +- kernel/crash_core.c | 14 +++--- 4 files changed, 11 insertions(+), 10 deletions(-) LGTM, Acked-by: Baoquan He Thanks Baoquan He - Sourabh diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index c9f6a6c5de3c..9bb6607e864e 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -208,7 +208,7 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image); extern void kdump_nmi_shootdown_cpus(void); #ifdef CONFIG_CRASH_HOTPLUG -void arch_crash_handle_hotplug_event(struct kimage *image); +void arch_crash_handle_hotplug_event(struct kimage *image, void *arg); #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event #ifdef CONFIG_HOTPLUG_CPU diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index b6b044356f1b..44744e9c68ec 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -428,10 +428,11 @@ unsigned int arch_crash_get_elfcorehdr_size(void) /** * 
arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes * @image: a pointer to kexec_crash_image + * @arg: struct memory_notify handler for memory hotplug case and NULL for CPU hotplug case. * * Prepare the new elfcorehdr and replace the existing elfcorehdr. */ -void arch_crash_handle_hotplug_event(struct kimage *image) +void arch_crash_handle_hotplug_event(struct kimage *image, void *arg) { void *elfbuf = NULL, *old_elfcorehdr; unsigned long nr_mem_ranges; diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 400cb6c02176..802052d9c64b 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -483,7 +483,7 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) { #endif #ifndef arch_crash_handle_hotplug_event -static inline void arch_crash_handle_hotplug_event(struct kimage *image) { } +static inline void arch_crash_handle_hotplug_event(struct kimage *image, void *arg) { } #endif int crash_check_update_elfcorehdr(void); diff --git a/kernel/crash_core.c b/kernel/crash_core.c index d48315667752..ab1c8e79759d 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -914,7 +914,7 @@ int crash_check_update_elfcorehdr(void) * list of segments it checks (since the elfcorehdr changes and thus * would require an update to purgatory itself to update the digest). 
*/ -static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu) +static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu, void *arg) { struct kimage *image; @@ -976,7 +976,7 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu) image->hp_action = hp_action; /* Now invoke arch-specific update handler */ - arch_crash_handle_hotplug_event(image); + arch_crash_handle_hotplug_event(image, arg); /* No longer handling a hotplug event */ image->hp_action = KEXEC_CRASH_HP_NONE; @@ -992,17 +992,17 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu) crash_hotplug_unlock(); } -static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v) +static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *arg) { switch (val) { case MEM_ONLINE: crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_MEMORY, - KEXEC_CRASH_HP_INVALID_CPU); + KEXEC_CRASH_HP_INVALID_CPU, arg); break; case MEM_OFFLINE: crash_handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_MEMORY, - KEXEC_CRASH_HP_INVALID_CPU); + KEXEC_CRASH_HP_INVALID_CPU, arg); break; } return NOTIFY_OK; @@ -1015,13 +1015,13 @@ static struct notifier_block crash_me
Re: [PATCH v15 5/5] powerpc: add crash memory hotplug support
On 23/01/24 15:52, Hari Bathini wrote:
On 11/01/24 4:21 pm, Sourabh Jain wrote:
[...]
[PATCH v7 3/3] Documentation/powerpc: update fadump implementation details
The patch titled ("powerpc: make fadump resilient with memory add/remove
events") made significant changes to the fadump implementation,
particularly to elfcorehdr creation and the fadump crash info header
structure. Therefore, update the fadump implementation documentation to
reflect those changes.

The following updates are made to the firmware-assisted dump documentation:

1. The elfcorehdr is no longer stored after the fadump HDR in the reserved
   dump area. Instead, the second kernel dynamically allocates memory for
   the elfcorehdr within the address range from 0 to the boot memory size.
   Therefore, update figures 1 and 2 of Memory Reservation during the
   first and second kernels to reflect this change.

2. A version field has been added to the fadump header to manage future
   changes to the fadump crash info header structure without changing the
   fadump header magic number. Therefore, remove the corresponding TODO
   from the document.

Signed-off-by: Sourabh Jain
Cc: Aditya Gupta
Cc: Aneesh Kumar K.V
Cc: Hari Bathini
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Naveen N Rao
---
.../arch/powerpc/firmware-assisted-dump.rst | 91 +-- 1 file changed, 42 insertions(+), 49 deletions(-) diff --git a/Documentation/arch/powerpc/firmware-assisted-dump.rst b/Documentation/arch/powerpc/firmware-assisted-dump.rst index e363fc48529a..7e37aadd1f77 100644 --- a/Documentation/arch/powerpc/firmware-assisted-dump.rst +++ b/Documentation/arch/powerpc/firmware-assisted-dump.rst @@ -134,12 +134,12 @@ that are run. If there is dump data, then the memory is held. If there is no waiting dump data, then only the memory required to -hold CPU state, HPTE region, boot memory dump, FADump header and -elfcore header, is usually reserved at an offset greater than boot -memory size (see Fig. 1).
This area is *not* released: this region -will be kept permanently reserved, so that it can act as a receptacle -for a copy of the boot memory content in addition to CPU state and -HPTE region, in the case a crash does occur. +hold CPU state, HPTE region, boot memory dump, and FADump header is +usually reserved at an offset greater than boot memory size (see Fig. 1). +This area is *not* released: this region will be kept permanently +reserved, so that it can act as a receptacle for a copy of the boot +memory content in addition to CPU state and HPTE region, in the case +a crash does occur. Since this reserved memory area is used only after the system crash, there is no point in blocking this significant chunk of memory from @@ -153,22 +153,22 @@ that were present in CMA region:: o Memory Reservation during first kernel - Low memory Top of memory - 0boot memory size |<--- Reserved dump area --->| | - | | |Permanent Reservation | | - V V || V - +---+-/ /---+---++---+-+-++--+ - | | |///|| DUMP | HDR | ELF || | - +---+-/ /---+---++---+-+-++--+ -| ^^ ^ ^ ^ -| || | | | -\ CPU HPTE / | | - -- | | - Boot memory content gets transferred| | - to reserved area by firmware at the | | - time of crash. | | - FADump Header | - (meta area)| + Low memory Top of memory + 0boot memory size |<-- Reserved dump area ->| | + | | | Permanent Reservation | | + V V | | V + +---+-/ /---+---++---+---++-+ + | | |///||DUMP | HDR || | + +---+-/ /---+---++---+---++-+ +| ^^ ^ ^ ^ +| || | | | +\ CPU HPTE / | | + | | + Boot memory content gets transferred | | + to reserved area by firmware at the | | + time of crash. | | + FADump Header | +(meta area) | | | Metadata: This area holds a metadata structure whose @@ -186,13 +186,2
[PATCH v7 2/3] powerpc/fadump: add hotplug_ready sysfs interface
The elfcorehdr describes the CPUs and memory of the crashed kernel to the kernel that captures the dump, known as the second or fadump kernel. The elfcorehdr needs to be updated if the system's memory changes due to memory hotplug or online/offline events. Currently, memory hotplug events are monitored in userspace by udev rules, and fadump is re-registered, which recreates the elfcorehdr with the latest available memory in the system. However, the previous patch ("powerpc: make fadump resilient with memory add/remove events") moved the creation of elfcorehdr to the second or fadump kernel. This eliminates the need to regenerate the elfcorehdr during memory hotplug or online/offline events. Create a sysfs entry at /sys/kernel/fadump/hotplug_ready to let userspace know that fadump re-registration is not required for memory add/remove events. Signed-off-by: Sourabh Jain Cc: Aditya Gupta Cc: Aneesh Kumar K.V Cc: Hari Bathini Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Naveen N Rao --- Documentation/ABI/testing/sysfs-kernel-fadump | 11 +++ arch/powerpc/kernel/fadump.c | 14 ++ 2 files changed, 25 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump b/Documentation/ABI/testing/sysfs-kernel-fadump index 8f7a64a81783..8e18a6c93650 100644 --- a/Documentation/ABI/testing/sysfs-kernel-fadump +++ b/Documentation/ABI/testing/sysfs-kernel-fadump @@ -38,3 +38,14 @@ Contact: linuxppc-dev@lists.ozlabs.org Description: read only Provide information about the amount of memory reserved by FADump to save the crash dump in bytes. + +What: /sys/kernel/fadump/hotplug_ready +Date: Jan 2024 +Contact: linuxppc-dev@lists.ozlabs.org +Description: read only + Kdump udev rule re-registers fadump on memory add/remove events, + primarily to update the elfcorehdr. This sysfs indicates the + kdump udev rule that fadump re-registration is not required on + memory add/remove events because elfcorehdr is now prepared in + the second/fadump kernel. 
+User: kexec-tools diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index eb9132538268..a55dd9bf754c 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -1455,6 +1455,18 @@ static ssize_t enabled_show(struct kobject *kobj, return sprintf(buf, "%d\n", fw_dump.fadump_enabled); } +/* + * /sys/kernel/fadump/hotplug_ready sysfs node returns 1, which indicates + * to userspace that fadump re-registration is not required on memory + * hotplug events. + */ +static ssize_t hotplug_ready_show(struct kobject *kobj, + struct kobj_attribute *attr, + char *buf) +{ + return sprintf(buf, "%d\n", 1); +} + static ssize_t mem_reserved_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) @@ -1527,11 +1539,13 @@ static struct kobj_attribute release_attr = __ATTR_WO(release_mem); static struct kobj_attribute enable_attr = __ATTR_RO(enabled); static struct kobj_attribute register_attr = __ATTR_RW(registered); static struct kobj_attribute mem_reserved_attr = __ATTR_RO(mem_reserved); +static struct kobj_attribute hotplug_ready_attr = __ATTR_RO(hotplug_ready); static struct attribute *fadump_attrs[] = { &enable_attr.attr, &register_attr.attr, &mem_reserved_attr.attr, + &hotplug_ready_attr.attr, NULL, }; -- 2.41.0
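For readers on the userspace side, here is a minimal sketch of how a tool might consume the new node. It is not taken from kexec-tools; the helper names `read_flag()` and `fadump_hotplug_ready()` are hypothetical, and the only fact assumed from the patch is that the node, when present, reads back as `1`.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read one integer from a sysfs-style file.
 * Returns the parsed value, or -1 if the file is missing or cannot be
 * parsed (for example on a kernel that lacks the hotplug_ready node).
 */
int read_flag(const char *path)
{
	FILE *fp = fopen(path, "r");
	int val;

	if (!fp)
		return -1;
	if (fscanf(fp, "%d", &val) != 1)
		val = -1;
	fclose(fp);
	return val;
}

/*
 * Hypothetical helper: nonzero when the kernel reports that fadump
 * re-registration can be skipped on memory add/remove events.
 */
int fadump_hotplug_ready(const char *path)
{
	return read_flag(path) == 1;
}
```

A caller would pass `/sys/kernel/fadump/hotplug_ready` as the path; a missing node is treated the same as "not ready", so older kernels keep the existing re-registration behaviour.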
[PATCH v7 1/3] powerpc: make fadump resilient with memory add/remove events
to collect the dump and prints relevant error message on the console. Signed-off-by: Sourabh Jain Cc: Aditya Gupta Cc: Aneesh Kumar K.V Cc: Hari Bathini Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Naveen N Rao --- arch/powerpc/include/asm/fadump-internal.h | 31 +- arch/powerpc/kernel/fadump.c | 355 +++ arch/powerpc/platforms/powernv/opal-fadump.c | 18 +- arch/powerpc/platforms/pseries/rtas-fadump.c | 23 +- 4 files changed, 242 insertions(+), 185 deletions(-) diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h index 27f9e11eda28..a632e9708610 100644 --- a/arch/powerpc/include/asm/fadump-internal.h +++ b/arch/powerpc/include/asm/fadump-internal.h @@ -42,13 +42,40 @@ static inline u64 fadump_str_to_u64(const char *str) #define FADUMP_CPU_UNKNOWN (~((u32)0)) -#define FADUMP_CRASH_INFO_MAGIC fadump_str_to_u64("FADMPINF") +/* + * The introduction of new fields in the fadump crash info header has + * led to a change in the magic key from `FADMPINF` to `FADMPSIG` for + * identifying a kernel crash from an old kernel. + * + * To prevent the need for further changes to the magic number in the + * event of future modifications to the fadump crash info header, a + * version field has been introduced to track the fadump crash info + * header version. + * + * Consider a few points before adding new members to the fadump crash info + * header structure: + * + * - Append new members; avoid adding them in between. + * - Non-primitive members should have a size member as well. + * - For every change in the fadump header, increment the + * fadump header version. This helps the updated kernel decide how to + * handle kernel dumps from older kernels.
+ */ +#define FADUMP_CRASH_INFO_MAGIC_OLD fadump_str_to_u64("FADMPINF") +#define FADUMP_CRASH_INFO_MAGIC fadump_str_to_u64("FADMPSIG") +#define FADUMP_HEADER_VERSION 1 /* fadump crash info structure */ struct fadump_crash_info_header { u64 magic_number; - u64 elfcorehdr_addr; + u32 version; u32 crashing_cpu; + u64 elfcorehdr_addr; + u64 elfcorehdr_size; + u64 vmcoreinfo_raddr; + u64 vmcoreinfo_size; + u32 pt_regs_sz; + u32 cpu_mask_sz; struct pt_regs regs; struct cpumask cpu_mask; }; diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index d14eda1e8589..eb9132538268 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -53,8 +53,6 @@ static struct kobject *fadump_kobj; static atomic_t cpus_in_fadump; static DEFINE_MUTEX(fadump_mutex); -static struct fadump_mrange_info crash_mrange_info = { "crash", NULL, 0, 0, 0, false }; - #define RESERVED_RNGS_SZ 16384 /* 16K - 128 entries */ #define RESERVED_RNGS_CNT (RESERVED_RNGS_SZ / \ sizeof(struct fadump_memory_range)) @@ -373,12 +371,6 @@ static unsigned long __init get_fadump_area_size(void) size = PAGE_ALIGN(size); size += fw_dump.boot_memory_size; size += sizeof(struct fadump_crash_info_header); - size += sizeof(struct elfhdr); /* ELF core header.*/ - size += sizeof(struct elf_phdr); /* place holder for cpu notes */ - /* Program headers for crash memory regions. */ - size += sizeof(struct elf_phdr) * (memblock_num_regions(memory) + 2); - - size = PAGE_ALIGN(size); /* This is to hold kernel metadata on platforms that support it */ size += (fw_dump.ops->fadump_get_metadata_size ?
@@ -931,36 +923,6 @@ static inline int fadump_add_mem_range(struct fadump_mrange_info *mrange_info, return 0; } -static int fadump_exclude_reserved_area(u64 start, u64 end) -{ - u64 ra_start, ra_end; - int ret = 0; - - ra_start = fw_dump.reserve_dump_area_start; - ra_end = ra_start + fw_dump.reserve_dump_area_size; - - if ((ra_start < end) && (ra_end > start)) { - if ((start < ra_start) && (end > ra_end)) { - ret = fadump_add_mem_range(&crash_mrange_info, - start, ra_start); - if (ret) - return ret; - - ret = fadump_add_mem_range(&crash_mrange_info, - ra_end, end); - } else if (start < ra_start) { - ret = fadump_add_mem_range(&crash_mrange_info, - start, ra_start); - } else if (ra_end < end) { - ret = fadump_add_mem_range(&crash_mrange_info, - ra_end, end)
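The header comment in this patch and the series changelog describe several sanity checks on the fadump crash info header: reject the old `FADMPINF` magic, report an endianness mismatch when the magic does not match, and refuse the dump when the `pt_regs` or `cpumask` sizes differ between the two kernels. The sketch below illustrates that decision order in plain C. It is not the kernel's code: `struct hdr_sketch`, `check_header()`, the status enum and the helper names are all hypothetical, and the struct keeps only the fields the checks need.

```c
#include <stdint.h>

/* Hypothetical result codes for this sketch. */
enum hdr_status {
	HDR_OK,
	HDR_OLD_MAGIC,		/* header written by a pre-versioning kernel */
	HDR_ENDIAN_MISMATCH,	/* magic matches only after a byte swap */
	HDR_BAD_MAGIC,
	HDR_SIZE_MISMATCH	/* pt_regs/cpumask sizes differ */
};

/* Simplified stand-in for the fadump crash info header. */
struct hdr_sketch {
	uint64_t magic_number;
	uint32_t version;
	uint32_t pt_regs_sz;
	uint32_t cpu_mask_sz;
};

/* Pack up to eight characters into a 64-bit value, first char in the top byte. */
uint64_t str_to_u64(const char *s)
{
	uint64_t val = 0;

	for (int i = 0; i < 8; i++)
		val = (val << 8) | (uint64_t)(*s ? (unsigned char)*s++ : 0);
	return val;
}

/* Reverse the byte order of a 64-bit value. */
uint64_t byteswap_u64(uint64_t v)
{
	uint64_t out = 0;

	for (int i = 0; i < 8; i++) {
		out = (out << 8) | (v & 0xff);
		v >>= 8;
	}
	return out;
}

/* regs_sz and mask_sz are the sizes as seen by the kernel reading the header. */
enum hdr_status check_header(const struct hdr_sketch *h,
			     uint32_t regs_sz, uint32_t mask_sz)
{
	const uint64_t magic = str_to_u64("FADMPSIG");
	const uint64_t magic_old = str_to_u64("FADMPINF");

	if (h->magic_number == magic_old)
		return HDR_OLD_MAGIC;
	if (h->magic_number != magic) {
		if (h->magic_number == byteswap_u64(magic))
			return HDR_ENDIAN_MISMATCH;
		return HDR_BAD_MAGIC;
	}
	if (h->pt_regs_sz != regs_sz || h->cpu_mask_sz != mask_sz)
		return HDR_SIZE_MISMATCH;
	return HDR_OK;
}
```

Only `HDR_OK` would let dump processing continue; every other status maps to "do not process the dump and print a relevant message".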
[PATCH v7 0/3] powerpc: make fadump resilient with memory add/remove events
Problem:
========
Due to changes in memory resources caused by either memory hotplug or
online/offline events, the elfcorehdr, which describes the CPUs and
memory of the crashed kernel to the kernel that collects the dump (known
as second/fadump kernel), becomes outdated. Consequently, attempting
dump collection with an outdated elfcorehdr can lead to failed or
inaccurate dump collection.

Memory hotplug or online/offline events are referred to as memory
add/remove events in the rest of the patch series.

Existing solution:
==================
Monitor memory add/remove events in userspace using udev rules, and
re-register fadump whenever there are changes in memory resources. This
leads to the creation of a new elfcorehdr with updated system memory
information.

Challenges with existing solution:
==================================
1. Performing bulk memory add/remove with udev-based fadump
   re-registration can lead to race conditions and, more importantly,
   it creates a wide window during which fadump is inactive until all
   memory add/remove events are settled.

2. Re-registering fadump for every memory add/remove event is
   inefficient.

3. Memory for elfcorehdr is allocated based on the memblock regions
   available during first kernel early boot and it remains fixed
   thereafter. However, if the elfcorehdr is later recreated with
   additional memblock regions, its size will increase, potentially
   leading to memory corruption.

Proposed solution:
==================
Address the aforementioned challenges by shifting the creation of
elfcorehdr from the first kernel (also referred to as the crashed
kernel), where it was created and frequently recreated for every memory
add/remove event, to the fadump kernel. As a result, the elfcorehdr only
needs to be created once, thus eliminating the necessity to re-register
fadump during memory add/remove events.

To know more about elfcorehdr creation in the fadump kernel, refer to
the first patch in this series.
The second patch includes a new sysfs interface that tells userspace that
fadump re-registration isn't needed for memory add/remove events. Note
that userspace changes do not need to be in sync with kernel changes;
they can roll out independently.

Since there are significant changes in the fadump implementation, the
third patch updates the fadump documentation to reflect the changes made
in this patch series.

Kernel tree rebased on 6.7.0-rc4 with patch series applied:
===========================================================
https://github.com/sourabhjains/linux/tree/fadump-mem-hotplug-v7

Userspace changes:
==================
To realize this feature, one must update the kdump udev rules to prevent
fadump re-registration during memory add/remove events.

On RHEL, apply the following changes to the file
/usr/lib/udev/rules.d/98-kexec.rules:

-RUN+="/bin/sh -c '/usr/bin/systemctl is-active kdump.service || exit 0; /usr/bin/systemd-run --quiet --no-block /usr/lib/udev/kdump-udev-throttler'"
+# Don't re-register fadump if the value of the node
+# /sys/kernel/fadump/hotplug_ready is 1.
+
+RUN+="/bin/sh -c '/usr/bin/systemctl is-active kdump.service || exit 0; ! test -f /sys/kernel/fadump_enabled || cat /sys/kernel/fadump_enabled | grep 0 || ! test -f /sys/kernel/fadump/hotplug_ready || cat /sys/kernel/fadump/hotplug_ready | grep 0 || exit 0; /usr/bin/systemd-run --quiet --no-block /usr/lib/udev/kdump-udev-throttler'"

Changelog:
==========
v7: 11 Jan 2024
  - Rebase it to 6.7

v6: 8 Dec 2023
  https://lore.kernel.org/all/20231208115159.82236-1-sourabhj...@linux.ibm.com/
  - Add size fields for `pt_regs` and `cpumask` in the fadump header
    structure
  - Don't process the dump if the size of `pt_regs` and `cpu_mask` is not
    the same in the crashed and fadump kernel
  - Include an additional check for endianness mismatch when the magic
    number doesn't match, to print the relevant error message
  - Don't process the dump if the fadump header contains an old magic number
  - Rebased it to 6.7.0-rc4

v5: 29 Oct 2023
  https://lore.kernel.org/all/20231029124548.12198-1-sourabhj...@linux.ibm.com/
  - Fix a comment on the first patch

v4: 21 Oct 2023
  https://lore.kernel.org/all/20231021181733.204311-1-sourabhj...@linux.ibm.com/
  - Fix a build warning about type casting

v3: 9 Oct 2023
  https://lore.kernel.org/all/20231009041953.36139-1-sourabhj...@linux.ibm.com/
  - Assign physical address of elfcorehdr to fdh->elfcorehdr_addr
  - Rename a variable, boot_mem_dest_addr -> boot_mem_dest_offset

v2: 25 Sep 2023
  https://lore.kernel.org/all/20230925051214.678957-1-sourabhj...@linux.ibm.com/
  - Fixed a few indentation issues reported by the checkpatch script.
  - Rebased it to 6.6.0-rc3

v1: 17 Sep 2023
  https://lore.kernel.org/all/20230917080225.561627-1-sourabhj...@linux.ibm.com/

Cc: Aditya Gupta
Cc: Aneesh Kumar K.V
Cc: Hari Bathini
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Naveen N Rao

Sourabh Jain (3): powe
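The shell one-liner in the udev rule of this cover letter is dense, so here is a small C sketch of the decision it appears to encode. This is only a reading of the rule's short-circuit chain, not code from kexec-tools; the function name `runs_throttler()` and the convention of passing `-1` for a sysfs node that does not exist are assumptions made for illustration.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the udev rule:
 *   kdump_active   - result of "systemctl is-active kdump.service"
 *   fadump_enabled - value of the fadump enabled node, -1 if absent
 *   hotplug_ready  - value of the hotplug_ready node, -1 if absent
 * Returns true when the rule goes on to start kdump-udev-throttler,
 * which leads to re-registration.
 */
bool runs_throttler(bool kdump_active, int fadump_enabled, int hotplug_ready)
{
	/* "is-active || exit 0": nothing to do when kdump is not running. */
	if (!kdump_active)
		return false;

	/* Node missing or reading 0: fadump is not in use, keep old behaviour. */
	if (fadump_enabled < 0 || fadump_enabled == 0)
		return true;

	/* Older kernel (no node) or node reading 0: still re-register. */
	if (hotplug_ready < 0 || hotplug_ready == 0)
		return true;

	/* fadump enabled and hotplug_ready == 1: skip re-registration. */
	return false;
}
```

Under this reading, the throttler is skipped only when kdump is inactive, or when fadump is enabled and the kernel advertises `hotplug_ready` as 1. That matches the note that kernel and userspace changes can roll out independently: an updated rule on an old kernel simply takes the "node missing" branch.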
[PATCH v15 1/5] crash: forward memory_notify arg to arch crash hotplug handler
In the event of memory hotplug or online/offline events, the crash memory hotplug notifier `crash_memhp_notifier()` receives a `memory_notify` object but doesn't forward that object to the generic and architecture-specific crash hotplug handler. The `memory_notify` object contains the starting PFN (Page Frame Number) and the number of pages in the hot-removed memory. This information is necessary for architectures like PowerPC to update/recreate the kdump image, specifically `elfcorehdr`. So update the function signature of `crash_handle_hotplug_event()` and `arch_crash_handle_hotplug_event()` to accept the `memory_notify` object as an argument from crash memory hotplug notifier. Since no such object is available in the case of CPU hotplug event, the crash CPU hotplug notifier `crash_cpuhp_online()` passes NULL to the crash hotplug handler. Signed-off-by: Sourabh Jain Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- arch/x86/include/asm/kexec.h | 2 +- arch/x86/kernel/crash.c | 3 ++- include/linux/kexec.h| 2 +- kernel/crash_core.c | 14 +++--- 4 files changed, 11 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index c9f6a6c5de3c..9bb6607e864e 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -208,7 +208,7 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image); extern void kdump_nmi_shootdown_cpus(void); #ifdef CONFIG_CRASH_HOTPLUG -void arch_crash_handle_hotplug_event(struct kimage *image); +void arch_crash_handle_hotplug_event(struct kimage *image, void *arg); #define 
arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event #ifdef CONFIG_HOTPLUG_CPU diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index b6b044356f1b..44744e9c68ec 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -428,10 +428,11 @@ unsigned int arch_crash_get_elfcorehdr_size(void) /** * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes * @image: a pointer to kexec_crash_image + * @arg: struct memory_notify handler for memory hotplug case and NULL for CPU hotplug case. * * Prepare the new elfcorehdr and replace the existing elfcorehdr. */ -void arch_crash_handle_hotplug_event(struct kimage *image) +void arch_crash_handle_hotplug_event(struct kimage *image, void *arg) { void *elfbuf = NULL, *old_elfcorehdr; unsigned long nr_mem_ranges; diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 400cb6c02176..802052d9c64b 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -483,7 +483,7 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) { #endif #ifndef arch_crash_handle_hotplug_event -static inline void arch_crash_handle_hotplug_event(struct kimage *image) { } +static inline void arch_crash_handle_hotplug_event(struct kimage *image, void *arg) { } #endif int crash_check_update_elfcorehdr(void); diff --git a/kernel/crash_core.c b/kernel/crash_core.c index d48315667752..ab1c8e79759d 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -914,7 +914,7 @@ int crash_check_update_elfcorehdr(void) * list of segments it checks (since the elfcorehdr changes and thus * would require an update to purgatory itself to update the digest). 
*/ -static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu) +static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu, void *arg) { struct kimage *image; @@ -976,7 +976,7 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu) image->hp_action = hp_action; /* Now invoke arch-specific update handler */ - arch_crash_handle_hotplug_event(image); + arch_crash_handle_hotplug_event(image, arg); /* No longer handling a hotplug event */ image->hp_action = KEXEC_CRASH_HP_NONE; @@ -992,17 +992,17 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu) crash_hotplug_unlock(); } -static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v) +static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *arg) { switch (val) { case MEM_ONLINE: crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_MEMORY, - KEXEC_CRASH_HP_INVALID_CPU); + KEXEC_CRASH_HP_INVALID_CPU, arg); break; case MEM_O
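As an illustration of why the forwarded argument matters, the sketch below shows how an architecture handler might turn the starting PFN and page count carried by the `memory_notify` object into a byte range, while treating a NULL argument as the CPU hotplug case. It is a standalone model, not kernel code: `struct memory_notify_sketch`, `hot_removed_range()` and the 64 KiB page size (`SKETCH_PAGE_SHIFT`) are assumptions for this example.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumption for this sketch: 64 KiB pages. */
#define SKETCH_PAGE_SHIFT 16

/* Hypothetical stand-in holding the two fields named in the commit message. */
struct memory_notify_sketch {
	unsigned long start_pfn;	/* first page frame of the affected block */
	unsigned long nr_pages;		/* number of pages in the block */
};

struct byte_range {
	uint64_t base;
	uint64_t size;
};

/*
 * Convert the notifier argument into a physical byte range.
 * Returns 1 and fills *out for a memory event, or 0 when arg is NULL,
 * which is what the CPU hotplug path passes.
 */
int hot_removed_range(const void *arg, struct byte_range *out)
{
	const struct memory_notify_sketch *mn = arg;

	if (!mn)
		return 0;

	out->base = (uint64_t)mn->start_pfn << SKETCH_PAGE_SHIFT;
	out->size = (uint64_t)mn->nr_pages << SKETCH_PAGE_SHIFT;
	return 1;
}
```

With a range in hand, the handler can exclude the hot-removed memory when it rebuilds the elfcorehdr, which is the use the PowerPC patches later in the series make of this argument.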
[PATCH v15 0/5] powerpc/crash: Kernel handling of CPU and memory hotplug
dded a new config to configure this feature
  - pass hotplug action type to arch specific handler

v6:
  - Added crash memory hotplug support

v5:
  - Replace CONFIG_CRASH_HOTPLUG with CONFIG_HOTPLUG_CPU.
  - Move fdt segment identification for kexec_load case to load path
    instead of crash hotplug handler
  - Keep new attribute defined under kimage_arch to track FDT segment
    under CONFIG_HOTPLUG_CPU config.

v4:
  - Update the logic to find the additional space needed for hotadd CPUs
    post kexec load. Refer "[RFC v4 PATCH 4/5] powerpc/crash hp: add crash
    hotplug support for kexec_file_load" patch to know more about the
    change.
  - Fix a couple of typos.
  - Replace pr_err with pr_info_once to warn user about memory hotplug
    support.
  - In the crash hotplug handler, exit the for loop if the FDT segment is
    found.

v3:
  - Move fdt_index and fdt_index_vaild variables to kimage_arch struct.
  - Rebase patches on top of
    https://lore.kernel.org/lkml/20220303162725.49640-1-eric.devol...@oracle.com/
  - Fixed warning reported by checkpatch script

v2:
  - Use generic hotplug handler introduced by
    https://lore.kernel.org/lkml/20220209195706.51522-1-eric.devol...@oracle.com/
    a significant change from v1.
Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org Sourabh Jain (5): crash: forward memory_notify arg to arch crash hotplug handler crash: add a new kexec flag for hotplug support powerpc/kexec: turn some static helper functions public powerpc: add crash CPU hotplug support powerpc: add crash memory hotplug support arch/powerpc/Kconfig| 4 + arch/powerpc/include/asm/kexec.h| 15 ++ arch/powerpc/include/asm/kexec_ranges.h | 1 + arch/powerpc/kexec/Makefile | 4 +- arch/powerpc/kexec/core_64.c| 334 arch/powerpc/kexec/elf_64.c | 12 +- arch/powerpc/kexec/file_load_64.c | 211 --- arch/powerpc/kexec/ranges.c | 85 ++ arch/x86/include/asm/kexec.h| 5 +- arch/x86/kernel/crash.c | 21 +- drivers/base/cpu.c | 2 +- drivers/base/memory.c | 2 +- include/linux/kexec.h | 27 +- include/uapi/linux/kexec.h | 1 + kernel/crash_core.c | 25 +- kernel/kexec.c | 4 +- kernel/kexec_file.c | 5 + 17 files changed, 549 insertions(+), 209 deletions(-) -- 2.41.0
[PATCH v15 5/5] powerpc: add crash memory hotplug support
Extend the arch crash hotplug handler, as introduced by the patch titled
("powerpc: add crash CPU hotplug support"), to also support memory
add/remove events.

The elfcorehdr describes the memory of the crashed kernel to the capture
kernel; hence, it needs to be updated if memory resources change due to
memory add/remove events. Therefore, arch_crash_handle_hotplug_event() is
updated to recreate the elfcorehdr and replace the previous one with it
on memory add/remove events.

The memblock list is used to prepare the elfcorehdr. In the case of
memory hot removal, the memblock list is updated after the arch crash
hotplug handler is triggered, as depicted in Figure 1. Thus, the
hot-removed memory is explicitly removed from the crash memory ranges to
ensure that the memory ranges added to elfcorehdr do not include the
hot-removed memory.

    Memory remove
          |
          v
    Offline pages
          |
          v
    Initiate memory notify call <----> crash hotplug handler
    chain for MEM_OFFLINE event
          |
          v
    Update memblock list

                Figure 1

There are two system calls, `kexec_file_load` and `kexec_load`, used to
load the kdump image. A few changes have been made to ensure that the
kernel can safely update the elfcorehdr component of the kdump image for
both system calls.

For the kexec_file_load syscall, the kdump image is prepared in the
kernel. To support an increasing number of memory regions, the elfcorehdr
is built with extra buffer space to ensure that it can accommodate
additional memory ranges in future.

For the kexec_load syscall, the elfcorehdr is updated only if the
KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag is passed to the kernel by the
kexec tool. Passing this flag to the kernel indicates that the
elfcorehdr is built to accommodate additional memory ranges and the
elfcorehdr segment is not considered for SHA calculation, making it safe
to update.

The changes related to this feature are kept under the CRASH_HOTPLUG
config, and it is enabled by default.
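The explicit removal of a hot-removed block from the crash memory ranges, described in this commit message, can be pictured with a small standalone sketch. This is not the patch's `remove_mem_range()`; the name `remove_range_sketch()`, the fixed-capacity array and the half-open `[start, end)` convention are assumptions chosen to keep the example short.

```c
#include <stdint.h>

/* Half-open range [start, end); a convention chosen for this sketch. */
struct range_sketch {
	uint64_t start;
	uint64_t end;
};

/*
 * Remove [base, base + size) from an array of nr non-overlapping ranges
 * with room for max entries. Returns the new number of ranges, or -1 if
 * splitting a range would exceed the array capacity.
 */
int remove_range_sketch(struct range_sketch *r, int nr, int max,
			uint64_t base, uint64_t size)
{
	uint64_t rm_start = base, rm_end = base + size;
	int i = 0;

	while (i < nr) {
		uint64_t s = r[i].start, e = r[i].end;

		if (rm_end <= s || rm_start >= e) {
			i++;			/* no overlap with this range */
			continue;
		}
		if (rm_start <= s && rm_end >= e) {
			/* Range fully covered: drop it and close the gap. */
			for (int j = i; j < nr - 1; j++)
				r[j] = r[j + 1];
			nr--;
			continue;
		}
		if (rm_start > s && rm_end < e) {
			/* Hole in the middle: split into two ranges. */
			if (nr == max)
				return -1;
			for (int j = nr; j > i + 1; j--)
				r[j] = r[j - 1];
			r[i].end = rm_start;
			r[i + 1].start = rm_end;
			r[i + 1].end = e;
			nr++;
			i += 2;
			continue;
		}
		if (rm_start <= s)
			r[i].start = rm_end;	/* trim the head */
		else
			r[i].end = rm_start;	/* trim the tail */
		i++;
	}
	return nr;
}
```

The split case is the reason capacity matters: removing memory from the middle of a range increases the range count by one, which ties in with the commit message's point about building the elfcorehdr with spare room for additional memory ranges.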
Signed-off-by: Sourabh Jain Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- arch/powerpc/include/asm/kexec.h| 5 +- arch/powerpc/include/asm/kexec_ranges.h | 1 + arch/powerpc/kexec/core_64.c| 107 +++- arch/powerpc/kexec/file_load_64.c | 34 +++- arch/powerpc/kexec/ranges.c | 85 +++ 5 files changed, 225 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index 943e58eb9bff..25ff5b7f1a28 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -116,8 +116,11 @@ int get_crash_memory_ranges(struct crash_mem **mem_ranges); #ifdef CONFIG_CRASH_HOTPLUG void arch_crash_handle_hotplug_event(struct kimage *image, void *arg); #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event -#endif /*CONFIG_CRASH_HOTPLUG */ +unsigned int arch_crash_get_elfcorehdr_size(void); +#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size + +#endif /*CONFIG_CRASH_HOTPLUG */ #endif /* CONFIG_PPC64 */ #ifdef CONFIG_KEXEC_FILE diff --git a/arch/powerpc/include/asm/kexec_ranges.h b/arch/powerpc/include/asm/kexec_ranges.h index f83866a19e87..802abf580cf0 100644 --- a/arch/powerpc/include/asm/kexec_ranges.h +++ b/arch/powerpc/include/asm/kexec_ranges.h @@ -7,6 +7,7 @@ void sort_memory_ranges(struct crash_mem *mrngs, bool merge); struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges); int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size); +int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size); int add_tce_mem_ranges(struct crash_mem 
**mem_ranges); int add_initrd_mem_range(struct crash_mem **mem_ranges); #ifdef CONFIG_PPC_64S_HASH_MMU diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c index 43fcd78c2102..4673f150f973 100644 --- a/arch/powerpc/kexec/core_64.c +++ b/arch/powerpc/kexec/core_64.c @@ -19,8 +19,11 @@ #include #include #include +#include #include +#include +#include #include #include #include @@ -546,6 +549,101 @@ int update_cpus_node(void *fdt) #undef pr_fmt #define pr_fmt(fmt) "crash hp: " fmt +/* + * Advertise preferred elfcorehdr size to userspace via + * /sys/kernel/crash_elfcorehdr_size sysfs interface. + */ +unsigned int arch_crash_get_elfcorehdr_size(void) +{ + unsigned int sz; + unsigned long elf_phdr_cnt; + + /* Program header for CPU notes and vmcoreinfo */ + elf_phdr_cnt =
[PATCH v15 3/5] powerpc/kexec: turn some static helper functions public
Move the functions update_cpus_node and get_crash_memory_ranges from kexec/file_load_64.c to kexec/core_64.c to make these functions usable by other kexec components. get_crash_memory_ranges uses functions defined in ranges.c, so take ranges.c out of CONFIG_KEXEC_FILE. Later in the series, these functions are utilized for in-kernel updates to kdump image during CPU/Memory hotplug or online/offline events for both kexec_load and kexec_file_load syscalls. There is no intended functional change. Signed-off-by: Sourabh Jain Reviewed-by: Laurent Dufour Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- arch/powerpc/include/asm/kexec.h | 6 ++ arch/powerpc/kexec/Makefile | 4 +- arch/powerpc/kexec/core_64.c | 166 ++ arch/powerpc/kexec/file_load_64.c | 162 - 4 files changed, 174 insertions(+), 164 deletions(-) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index e1b43aa12175..562e1bb689da 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -108,6 +108,12 @@ void crash_free_reserved_phys_range(unsigned long begin, unsigned long end); #endif /* CONFIG_PPC_RTAS */ #endif /* CONFIG_CRASH_DUMP */ +#ifdef CONFIG_PPC64 +struct crash_mem; +int update_cpus_node(void *fdt); +int get_crash_memory_ranges(struct crash_mem **mem_ranges); +#endif /* CONFIG_PPC64 */ + #ifdef CONFIG_KEXEC_FILE extern const struct kexec_file_ops kexec_elf64_ops; diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile index 0c2abe7f9908..f2ed5b85b912 100644 --- a/arch/powerpc/kexec/Makefile +++ b/arch/powerpc/kexec/Makefile @@ -3,11 +3,11 @@ # 
Makefile for the linux kernel. # -obj-y += core.o crash.o core_$(BITS).o +obj-y += core.o crash.o ranges.o core_$(BITS).o obj-$(CONFIG_PPC32)+= relocate_32.o -obj-$(CONFIG_KEXEC_FILE) += file_load.o ranges.o file_load_$(BITS).o elf_$(BITS).o +obj-$(CONFIG_KEXEC_FILE) += file_load.o file_load_$(BITS).o elf_$(BITS).o # Disable GCOV, KCOV & sanitizers in odd or sensitive code GCOV_PROFILE_core_$(BITS).o := n diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c index 762e4d09aacf..48beaadcfb70 100644 --- a/arch/powerpc/kexec/core_64.c +++ b/arch/powerpc/kexec/core_64.c @@ -17,6 +17,8 @@ #include #include #include +#include +#include #include #include @@ -30,6 +32,8 @@ #include #include #include +#include +#include int machine_kexec_prepare(struct kimage *image) { @@ -376,6 +380,168 @@ void default_machine_kexec(struct kimage *image) /* NOTREACHED */ } +/** + * get_crash_memory_ranges - Get crash memory ranges. This list includes + * first/crashing kernel's memory regions that + * would be exported via an elfcore. + * @mem_ranges: Range list to add the memory ranges to. + * + * Returns 0 on success, negative errno on error. 
+ */
+int get_crash_memory_ranges(struct crash_mem **mem_ranges)
+{
+	phys_addr_t base, end;
+	struct crash_mem *tmem;
+	u64 i;
+	int ret;
+
+	for_each_mem_range(i, &base, &end) {
+		u64 size = end - base;
+
+		/* Skip backup memory region, which needs a separate entry */
+		if (base == BACKUP_SRC_START) {
+			if (size > BACKUP_SRC_SIZE) {
+				base = BACKUP_SRC_END + 1;
+				size -= BACKUP_SRC_SIZE;
+			} else
+				continue;
+		}
+
+		ret = add_mem_range(mem_ranges, base, size);
+		if (ret)
+			goto out;
+
+		/* Try merging adjacent ranges before reallocation attempt */
+		if ((*mem_ranges)->nr_ranges == (*mem_ranges)->max_nr_ranges)
+			sort_memory_ranges(*mem_ranges, true);
+	}
+
+	/* Reallocate memory ranges if there is no space to split ranges */
+	tmem = *mem_ranges;
+	if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) {
+		tmem = realloc_mem_ranges(mem_ranges);
+		if (!tmem)
+			goto out;
+	}
+
+	/* Exclude crashkernel region */
+	ret = crash_exclude_mem_range(tmem, crashk_res.start, crashk_res.end);
+	if (ret)
+		goto out;
+
+	/*
+	 * FIXME: For now, stay in parity with k
[PATCH v15 4/5] powerpc: add crash CPU hotplug support
Due to CPU/Memory hotplug or online/offline events, the elfcorehdr (which describes the CPUs and memory of the crashed kernel) and the FDT (Flattened Device Tree) of the kdump image become outdated. Consequently, attempting dump collection with an outdated elfcorehdr or FDT can lead to failed or inaccurate dump collection.

Going forward, CPU/Memory hotplug or online/offline events are referred to as CPU/Memory add/remove events.

The current solution to the above issue is to monitor CPU/Memory add/remove events in userspace using udev rules and to reload the entire kdump image whenever CPU or memory resources change. The kdump image includes the kernel, initrd, elfcorehdr, FDT, and purgatory. Given that only the elfcorehdr and FDT become outdated due to CPU/Memory add/remove events, reloading the entire kdump image is inefficient. More importantly, kdump remains inactive for a substantial amount of time until the kdump reload completes.

To address the aforementioned issue, commit 247262756121 ("crash: add generic infrastructure for crash hotplug support") added a generic infrastructure that allows architectures to selectively update the kdump image components during CPU or memory add/remove events within the kernel itself.

On a CPU or memory add/remove event, the generic crash hotplug event handler, `crash_handle_hotplug_event()`, is triggered. It acquires the necessary locks to update the kdump image and invokes the architecture-specific crash hotplug handler, `arch_crash_handle_hotplug_event()`, to update the required kdump image components.

This patch adds a crash hotplug handler for PowerPC and enables support for updating the kdump image on CPU add/remove events. Support for memory add/remove events is added in a subsequent patch titled "powerpc: add crash memory hotplug support".

As mentioned earlier, only the elfcorehdr and FDT kdump image components need to be updated on CPU or memory add/remove events. However, on PowerPC the crash hotplug handler only updates the FDT to enable crash hotplug support for CPU add/remove events. Here's why.

The elfcorehdr on PowerPC is built with possible CPUs, and thus it does not need an update on CPU add/remove events. On the other hand, the FDT needs to be updated on CPU add events to include the newly added CPU. If the FDT is not updated and the kernel crashes on a newly added CPU, the kdump kernel will fail to boot because the crashing CPU is missing from the FDT. During early boot, the boot CPU is expected to be part of the FDT; otherwise, the kernel will raise a BUG and fail to boot. For more information, refer to commit 36ae37e3436b0 ("powerpc: Make boot_cpuid common between 32 and 64-bit"). Since it is okay to have an offline CPU in the kdump FDT, no action is taken in case of CPU removal.

There are two system calls, `kexec_file_load` and `kexec_load`, used to load the kdump image. A few changes have been made to ensure the kernel can safely update the FDT of a kdump image loaded using either system call.

For the kexec_file_load syscall, the kdump image is prepared in the kernel. So, to support an increasing number of CPUs, the FDT is constructed with extra buffer space to ensure it can accommodate nodes for all possible CPUs. Additionally, the call to fdt_pack (which trims the unused space once the FDT is prepared) is avoided if this feature is enabled.

For the kexec_load syscall, the FDT is updated only if the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag is passed to the kernel by userspace (kexec tools). When userspace passes this flag to the kernel, it indicates that the FDT is built to accommodate possible CPUs and that the FDT segment is excluded from SHA calculation, making it safe to update.

The changes related to this feature are kept under the CRASH_HOTPLUG config, and it is enabled by default.
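
The CPU add/remove policy described above can be summarized in a small userspace model. This is only a sketch under stated assumptions: `enum hp_event` and `fdt_needs_update()` are hypothetical names chosen for illustration and are not the kernel's definitions; the real handler works on `struct kimage` and the kernel's own hotplug action constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event kinds, used only in this sketch. */
enum hp_event {
	HP_CPU_ADD,
	HP_CPU_REMOVE,
};

/*
 * Decide whether the FDT of the loaded kdump image should be rewritten.
 *
 * hotplug_support models the "safe to update in place" state of the
 * loaded image (always true for kexec_file_load, flag-dependent for
 * kexec_load, as the commit message describes).
 */
bool fdt_needs_update(bool hotplug_support, enum hp_event ev)
{
	assert(ev == HP_CPU_ADD || ev == HP_CPU_REMOVE);

	/* Never touch an image that was not marked safe to update. */
	if (!hotplug_support)
		return false;

	/*
	 * A CPU add must refresh the FDT so the new CPU can act as the
	 * boot CPU of the kdump kernel. A CPU remove is ignored, because
	 * an offline CPU left in the kdump FDT is harmless.
	 */
	return ev == HP_CPU_ADD;
}
```

In this model, `fdt_needs_update(true, HP_CPU_ADD)` is the only combination that asks for an FDT rewrite; the elfcorehdr never appears because it is built with possible CPUs up front.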
Signed-off-by: Sourabh Jain Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- arch/powerpc/Kconfig | 4 ++ arch/powerpc/include/asm/kexec.h | 6 +++ arch/powerpc/kexec/core_64.c | 69 +++ arch/powerpc/kexec/elf_64.c | 12 +- arch/powerpc/kexec/file_load_64.c | 15 +++ 5 files changed, 105 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 414b978b8010..91d7bb0b81ee 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -682,6 +682,10 @@ config RELOCATABLE_TEST config ARCH_SUPPORTS_CRASH_DUMP def_bool PPC64 || PPC_BOOK3S_32 || PPC_8
[PATCH v15 2/5] crash: add a new kexec flag for hotplug support
Commit a72bbec70da2 ("crash: hotplug support for kexec_load()") introduced a new kexec flag, `KEXEC_UPDATE_ELFCOREHDR`. The kexec tool uses this flag to indicate to the kernel that it is safe to modify the elfcorehdr of the kdump image loaded using the kexec_load system call.

However, architectures may need to update kexec segments other than the elfcorehdr, for example the FDT (Flattened Device Tree) on PowerPC. Introducing a new kexec flag for every new kexec segment may not be a good solution. Hence, a generic kexec flag bit, `KEXEC_CRASH_HOTPLUG_SUPPORT`, is introduced to share the CPU/Memory hotplug support intent between the kexec tool and the kernel for the kexec_load system call.

Now, if the kexec tool sends the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag to the kernel, it indicates that all the required kexec segments are skipped from SHA calculation and that it is safe to update the kdump image loaded using the kexec_load syscall.

While loading the kdump image using the kexec_load syscall, the @update_elfcorehdr member of struct kimage is set if the kexec tool sends the KEXEC_UPDATE_ELFCOREHDR kexec flag. This member is later used to determine whether it is safe to update the elfcorehdr on hotplug events. However, with the introduction of the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag, the kexec tool can mark all the required kexec segments on an architecture as safe to update. So rename @update_elfcorehdr to @hotplug_support. If @hotplug_support is set, the kernel can safely update all the required kexec segments of the kdump image during CPU/Memory hotplug events.

Introduce an architecture-specific function to process kexec flags for determining hotplug support. Set the @hotplug_support member of struct kimage for both kexec_load and kexec_file_load system calls. This simplifies kernel checks to identify hotplug support for the currently loaded kdump image by just examining the value of @hotplug_support.
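
The flag handling described above can be sketched as a small userspace model that mirrors the shape of the x86 `arch_crash_hotplug_support()` hook in this patch. Treat it as an illustration only: `model_hotplug_support()` is a hypothetical name, the `file_mode` parameter stands in for `image->file_mode`, and the value used for `KEXEC_CRASH_HOTPLUG_SUPPORT` is an assumption that should be checked against include/uapi/linux/kexec.h.

```c
#include <assert.h>
#include <stdbool.h>

#define KEXEC_ON_CRASH              0x00000001UL
/* Existing flag: only the elfcorehdr is excluded from the SHA digest. */
#define KEXEC_UPDATE_ELFCOREHDR     0x00000004UL
/* Assumed value for this sketch; verify against the uapi header. */
#define KEXEC_CRASH_HOTPLUG_SUPPORT 0x00000008UL

/* The two hotplug-related flags must occupy different bits. */
static_assert((KEXEC_UPDATE_ELFCOREHDR & KEXEC_CRASH_HOTPLUG_SUPPORT) == 0,
	      "hotplug flag bits must not overlap");

/*
 * Returns 1 when the loaded kdump image may be updated in place on
 * CPU/Memory hotplug events, 0 otherwise.
 *
 * file_mode:   image was loaded with kexec_file_load, so the kernel
 *              built it and always knows it is safe to update.
 * kexec_flags: flags passed by userspace to kexec_load.
 */
int model_hotplug_support(bool file_mode, unsigned long kexec_flags)
{
	if (file_mode)
		return 1;

	return (kexec_flags &
		(KEXEC_UPDATE_ELFCOREHDR | KEXEC_CRASH_HOTPLUG_SUPPORT)) != 0;
}
```

The result of such a check would be stored once in @hotplug_support at load time, so later hotplug paths only need to read that single member instead of re-deriving it from the flags.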
Signed-off-by: Sourabh Jain Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Eric DeVolder Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- arch/x86/include/asm/kexec.h | 3 +++ arch/x86/kernel/crash.c | 18 +++--- drivers/base/cpu.c | 2 +- drivers/base/memory.c| 2 +- include/linux/kexec.h| 25 +++-- include/uapi/linux/kexec.h | 1 + kernel/crash_core.c | 11 --- kernel/kexec.c | 4 ++-- kernel/kexec_file.c | 5 + 9 files changed, 39 insertions(+), 32 deletions(-) diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index 9bb6607e864e..e791129fdf6c 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -211,6 +211,9 @@ extern void kdump_nmi_shootdown_cpus(void); void arch_crash_handle_hotplug_event(struct kimage *image, void *arg); #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event +int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags); +#define arch_crash_hotplug_support arch_crash_hotplug_support + #ifdef CONFIG_HOTPLUG_CPU int arch_crash_hotplug_cpu_support(void); #define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index 44744e9c68ec..293b54bff706 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -398,20 +398,16 @@ int crash_load_segments(struct kimage *image) #undef pr_fmt #define pr_fmt(fmt) "crash hp: " fmt -/* These functions provide the value for the sysfs crash_hotplug nodes */ -#ifdef CONFIG_HOTPLUG_CPU -int arch_crash_hotplug_cpu_support(void) +int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags) 
 {
-	return crash_check_update_elfcorehdr();
-}
-#endif
-#ifdef CONFIG_MEMORY_HOTPLUG
-int arch_crash_hotplug_memory_support(void)
-{
-	return crash_check_update_elfcorehdr();
-}
+#ifdef CONFIG_KEXEC_FILE
+	if (image->file_mode)
+		return 1;
 #endif
+	return (kexec_flags & KEXEC_UPDATE_ELFCOREHDR ||
+		kexec_flags & KEXEC_CRASH_HOTPLUG_SUPPORT);
+}
 unsigned int arch_crash_get_elfcorehdr_size(void)
 {
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 548491de818e..2f411ddfbd8b 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -306,7 +306,7 @@ static ssize_t crash_hotplug_show(struct device *dev,
 				     struct device_attribute *attr,
 				     char *buf)
 {
-	return sysfs_emit(buf, "%d\n", cra
Re: [PATCH v14 3/6] crash: add a new kexec flag for FDT update
Hello Baoquan, While replying to this email earlier, I mistakenly pressed "Reply to List" instead of "Reply to All." Consequently, my response was sent only to powerpc mailing list. On 17/12/23 06:29, Baoquan He wrote: On 12/17/23 at 12:27am, Sourabh Jain wrote: On 16/12/23 15:11, Baoquan He wrote: On 12/15/23 at 12:17pm, Sourabh Jain wrote: .. diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 0f6ea35879ee..bcedb7625b1f 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -319,6 +319,7 @@ struct kimage { #ifdef CONFIG_CRASH_HOTPLUG /* If set, allow changes to elfcorehdr of kexec_load'd image */ unsigned int update_elfcorehdr:1; + unsigned int update_fdt:1; Can we unify this to one flag, e.g hotplug_update? With this, on x86_64, we will skip the sha calculation for elfcorehdr. On ppc, we will skip the sha calculation for elfcorehdr and fdt. Yeah, that's what I suggested to Eric. I can do that, but I see one problem with powerpc or other platforms that need to skip SHA for more kexec segments in addition to elfcorehdr. `update_elfcorehdr` is set when the kexec tool sends the `KEXEC_UPDATE_ELFCOREHDR` flag to the kernel for the `kexec_load` system call. Given that the kexec tool has already been updated to send the `KEXEC_UPDATE_ELFCOREHDR` flag only when elfcorehdr is skipped from SHA verification in generic code, now it would be tricky for architectures to determine whether kexec has skipped SHA verification for just elfcorehdr or all segments needed on the platform with the same flag. In kexec-tools, it's judged by do_hotplug to skip the elfcorehdr segment. I am wondering how you skip the fdt segment when calculating and verifying sha, only saw the update_fdt mark. In the kexec tool where we loop through all the kexec segments to calculate the SHA, there will be a arch call made to determine whether the segment needs to be excluded from SHA or not. OK, a arch call will be added to exclude segments in the ARCH. 
And the elfcorehdr segment need be excluded in x86 ARCH in case other ARCH later may not want to exclude elfcorehdr. Yes, Arch can choose which segment to exclude. Now in the arch function if decide a specific segment needs to excluded then corresponding flag is also set by arch function to communicate same with the kernel. But I don't see how you exclude elfcorehdr and fdt in kernel for kexec_file codes. It's not happening in kexec-tools. On PowerPC, SHA verification is NOT performed for the kexec_file_load case; hence, you won't find any code changes in my patch series to exclude FDT in the kernel code. However, let's consider a scenario where it gets added in the future, or other architectures need to skip the kexec segment, in addition to elfcorehdr. In that case, we can use the same setup as you suggested below. For each kexec segment, there should be an architecture-specific function call to decide whether the segment needs to be excluded or not. About the existing KEXEC_UPDATE_ELFCOREHDR, we only rename the macro, but still use the same value, could you think of what problem could be caused between kernel and kexec-tools utility, the old and new version compatibility? Just changing the macro name will NOT help because the current kexec tool enables the KEXEC_UPDATE_ELFCOREHDR = 0x0004 kexec flag bit if the command argument --hotplug is passed to the kexec and the /sys/kernel/crash_elfcorehdr_size file exists in the system. As we have discussed, excluding will be done in each ARCH's function when doing sha calculation in kexec-tools, isn't it? diff --git a/kexec/kexec.c b/kexec/kexec.c index b5393e3b20aa..0095aeec988a 100644 --- a/kexec/kexec.c +++ b/kexec/kexec.c @@ -701,10 +701,10 @@ static void update_purgatory(struct kexec_info *info) continue; } - /* Don't include elfcorehdr in the checksum, if hotplug + /* Don't include unwanted segments in the checksum, if hotplug * support enabled. 
-*/ - if (do_hotplug && (info->segment[i].mem == (void *)info->elfcorehdr)) { + if (do_hotplug) + arch_exclude_segments(info, ) continue; } Yes, something like the above should work. Now, let's say an architecture enables this feature in the kernel with the assumption that the 0x0004 kexec flag bit is passed from the kexec tool when all the required kexec segments are skipped from SHA calculation. In this case, the current kexec tool, which passes the 0x0004 kexec flag bit only when the elfcorehdr is skipped, will cause issues for architectures. If it's about the new header files installed on older kernel, we can change it like below? Fortunately only one release, 6.6 passed. diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h index 3d5b3d757bed..df6a6505e267 10064
Re: [RFC PATCH 1/3] powerpc/pseries/fadump: add support for multiple boot memory regions
Hello Aditya, On 17/12/23 14:11, Aditya Gupta wrote: Hi sourabh, On 06/12/23 01:48, Hari Bathini wrote: From: Sourabh Jain Currently, fadump on pseries assumes a single boot memory region even though f/w supports more than one boot memory region. Add support for more boot memory regions to make the implementation flexible for any enhancements that introduce other region types. For this, rtas memory structure for fadump is updated to have multiple boot memory regions instead of just one. Additionally, methods responsible for creating the fadump memory structure during both the first and second kernel boot have been modified to take these multiple boot memory regions into account. Also, a new callback has been added to the fadump_ops structure to get the maximum boot memory regions supported by the platform. Signed-off-by: Sourabh Jain Signed-off-by: Hari Bathini --- arch/powerpc/include/asm/fadump-internal.h | 2 +- arch/powerpc/kernel/fadump.c | 27 +- arch/powerpc/platforms/powernv/opal-fadump.c | 8 + arch/powerpc/platforms/pseries/rtas-fadump.c | 258 --- arch/powerpc/platforms/pseries/rtas-fadump.h | 26 +- 5 files changed, 199 insertions(+), 122 deletions(-) diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h index 27f9e11eda28..b3956c400519 100644 --- a/arch/powerpc/include/asm/fadump-internal.h +++ b/arch/powerpc/include/asm/fadump-internal.h @@ -129,6 +129,7 @@ struct fadump_ops { struct seq_file *m); void (*fadump_trigger)(struct fadump_crash_info_header *fdh, const char *msg); + int (*fadump_max_boot_mem_rgns)(void); }; /* Helper functions */ @@ -136,7 +137,6 @@ s32 __init fadump_setup_cpu_notes_buf(u32 num_cpus); void fadump_free_cpu_notes_buf(void); u32 *__init fadump_regs_to_elf_notes(u32 *buf, struct pt_regs *regs); void __init fadump_update_elfcore_header(char *bufp); -bool is_fadump_boot_mem_contiguous(void); bool is_fadump_reserved_mem_contiguous(void); #else /* !CONFIG_PRESERVE_FA_DUMP */ diff --git 
a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index d14eda1e8589..757681658dda 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -222,28 +222,6 @@ static bool is_fadump_mem_area_contiguous(u64 d_start, u64 d_end) return ret; } -/* - * Returns true, if there are no holes in boot memory area, - * false otherwise. - */ -bool is_fadump_boot_mem_contiguous(void) -{ - unsigned long d_start, d_end; - bool ret = false; - int i; - - for (i = 0; i < fw_dump.boot_mem_regs_cnt; i++) { - d_start = fw_dump.boot_mem_addr[i]; - d_end = d_start + fw_dump.boot_mem_sz[i]; - - ret = is_fadump_mem_area_contiguous(d_start, d_end); - if (!ret) - break; - } - - return ret; -} - /* * Returns true, if there are no holes in reserved memory area, * false otherwise. @@ -389,10 +367,11 @@ static unsigned long __init get_fadump_area_size(void) static int __init add_boot_mem_region(unsigned long rstart, unsigned long rsize) { + int max_boot_mem_rgns = fw_dump.ops->fadump_max_boot_mem_rgns(); int i = fw_dump.boot_mem_regs_cnt++; - if (fw_dump.boot_mem_regs_cnt > FADUMP_MAX_MEM_REGS) { - fw_dump.boot_mem_regs_cnt = FADUMP_MAX_MEM_REGS; + if (fw_dump.boot_mem_regs_cnt > max_boot_mem_rgns) { + fw_dump.boot_mem_regs_cnt = max_boot_mem_rgns; return 0; } diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c index 964f464b1b0e..fa26c21a08d9 100644 --- a/arch/powerpc/platforms/powernv/opal-fadump.c +++ b/arch/powerpc/platforms/powernv/opal-fadump.c @@ -615,6 +615,13 @@ static void opal_fadump_trigger(struct fadump_crash_info_header *fdh, pr_emerg("No backend support for MPIPL!\n"); } +/* FADUMP_MAX_MEM_REGS or lower */ +static int opal_fadump_max_boot_mem_rgns(void) +{ + return FADUMP_MAX_MEM_REGS; + +} + static struct fadump_ops opal_fadump_ops = { .fadump_init_mem_struct = opal_fadump_init_mem_struct, .fadump_get_metadata_size = opal_fadump_get_metadata_size, @@ -627,6 +634,7 @@ static struct fadump_ops 
opal_fadump_ops = { .fadump_process = opal_fadump_process, .fadump_region_show = opal_fadump_region_show, .fadump_trigger = opal_fadump_trigger, + .fadump_max_boot_mem_rgns = opal_fadump_max_boot_mem_rgns, }; void __init opal_fadump_dt_scan(struct fw_dump *fadump_conf, u64 node) diff --git a/arch/powerpc/platforms/pseries/rtas-fadump.c b/arch/powerpc/platforms/pseries/rtas-fadump.c index b5853e9fcc3c..1b05b4cefdfd 100644 --- a/arch/powerpc/platforms/pseries/rtas-fadump.c +++ b/arch/powerpc/platforms/pseries/rtas-fadump.c @@ -29,9 +29,6 @@ static const struct rtas_fadump_m
Re: [PATCH v14 3/6] crash: add a new kexec flag for FDT update
On 17/12/23 06:29, Baoquan He wrote: On 12/17/23 at 12:27am, Sourabh Jain wrote: On 16/12/23 15:11, Baoquan He wrote: On 12/15/23 at 12:17pm, Sourabh Jain wrote: .. diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 0f6ea35879ee..bcedb7625b1f 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -319,6 +319,7 @@ struct kimage { #ifdef CONFIG_CRASH_HOTPLUG /* If set, allow changes to elfcorehdr of kexec_load'd image */ unsigned int update_elfcorehdr:1; + unsigned int update_fdt:1; Can we unify this to one flag, e.g hotplug_update? With this, on x86_64, we will skip the sha calculation for elfcorehdr. On ppc, we will skip the sha calculation for elfcorehdr and fdt. Yeah, that's what I suggested to Eric. I can do that, but I see one problem with powerpc or other platforms that need to skip SHA for more kexec segments in addition to elfcorehdr. `update_elfcorehdr` is set when the kexec tool sends the `KEXEC_UPDATE_ELFCOREHDR` flag to the kernel for the `kexec_load` system call. Given that the kexec tool has already been updated to send the `KEXEC_UPDATE_ELFCOREHDR` flag only when elfcorehdr is skipped from SHA verification in generic code, now it would be tricky for architectures to determine whether kexec has skipped SHA verification for just elfcorehdr or all segments needed on the platform with the same flag. In kexec-tools, it's judged by do_hotplug to skip the elfcorehdr segment. I am wondering how you skip the fdt segment when calculating and verifying sha, only saw the update_fdt mark. In the kexec tool where we loop through all the kexec segments to calculate the SHA, there will be a arch call made to determine whether the segment needs to be excluded from SHA or not. OK, a arch call will be added to exclude segments in the ARCH. And the elfcorehdr segment need be excluded in x86 ARCH in case other ARCH later may not want to exclude elfcorehdr. Yes, Arch can choose which segment to exclude. 
Now in the arch function if decide a specific segment needs to excluded then corresponding flag is also set by arch function to communicate same with the kernel. But I don't see how you exclude elfcorehdr and fdt in kernel for kexec_file codes. It's not happening in kexec-tools. On PowerPC, SHA verification is NOT performed for the kexec_file_load case; hence, you won't find any code changes in my patch series to exclude FDT in the kernel code. However, let's consider a scenario where it gets added in the future, or other architectures need to skip the kexec segment, in addition to elfcorehdr. In that case, we can use the same setup as you suggested below. For each kexec segment, there should be an architecture-specific function call to decide whether the segment needs to be excluded or not. About the existing KEXEC_UPDATE_ELFCOREHDR, we only rename the macro, but still use the same value, could you think of what problem could be caused between kernel and kexec-tools utility, the old and new version compatibility? Just changing the macro name will NOT help because the current kexec tool enables the KEXEC_UPDATE_ELFCOREHDR = 0x0004 kexec flag bit if the command argument --hotplug is passed to the kexec and the /sys/kernel/crash_elfcorehdr_size file exists in the system. As we have discussed, excluding will be done in each ARCH's function when doing sha calculation in kexec-tools, isn't it? diff --git a/kexec/kexec.c b/kexec/kexec.c index b5393e3b20aa..0095aeec988a 100644 --- a/kexec/kexec.c +++ b/kexec/kexec.c @@ -701,10 +701,10 @@ static void update_purgatory(struct kexec_info *info) continue; } - /* Don't include elfcorehdr in the checksum, if hotplug + /* Don't include unwanted segments in the checksum, if hotplug * support enabled. -*/ - if (do_hotplug && (info->segment[i].mem == (void *)info->elfcorehdr)) { + if (do_hotplug) + arch_exclude_segments(info, ) continue; } Yes, something like the above should work. 
Now, let's say an architecture enables this feature in the kernel with the assumption that the 0x0004 kexec flag bit is passed from the kexec tool when all the required kexec segments are skipped from SHA calculation. In this case, the current kexec tool, which passes the 0x0004 kexec flag bit only when the elfcorehdr is skipped, will cause issues for architectures. If it's about the new header files installed on older kernel, we can change it like below? Fortunately only one release, 6.6 passed. diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h index 3d5b3d757bed..df6a6505e267 100644 --- a/include/uapi/linux/kexec.h +++ b/include/uapi/linux/kexec.h @@ -13,7 +13,7 @@ #define KEXEC_ON_CRASH 0x0001 #define KEXEC_PRESERVE_CONTEXT 0x0002 -#define KE
Re: [PATCH v14 3/6] crash: add a new kexec flag for FDT update
On 16/12/23 15:11, Baoquan He wrote: On 12/15/23 at 12:17pm, Sourabh Jain wrote: .. diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 0f6ea35879ee..bcedb7625b1f 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -319,6 +319,7 @@ struct kimage { #ifdef CONFIG_CRASH_HOTPLUG /* If set, allow changes to elfcorehdr of kexec_load'd image */ unsigned int update_elfcorehdr:1; + unsigned int update_fdt:1; Can we unify this to one flag, e.g hotplug_update? With this, on x86_64, we will skip the sha calculation for elfcorehdr. On ppc, we will skip the sha calculation for elfcorehdr and fdt. Yeah, that's what I suggested to Eric. I can do that, but I see one problem with powerpc or other platforms that need to skip SHA for more kexec segments in addition to elfcorehdr. `update_elfcorehdr` is set when the kexec tool sends the `KEXEC_UPDATE_ELFCOREHDR` flag to the kernel for the `kexec_load` system call. Given that the kexec tool has already been updated to send the `KEXEC_UPDATE_ELFCOREHDR` flag only when elfcorehdr is skipped from SHA verification in generic code, now it would be tricky for architectures to determine whether kexec has skipped SHA verification for just elfcorehdr or all segments needed on the platform with the same flag. In kexec-tools, it's judged by do_hotplug to skip the elfcorehdr segment. I am wondering how you skip the fdt segment when calculating and verifying sha, only saw the update_fdt mark. In the kexec tool where we loop through all the kexec segments to calculate the SHA, there will be a arch call made to determine whether the segment needs to be excluded from SHA or not. Now in the arch function if decide a specific segment needs to excluded then corresponding flag is also set by arch function to communicate same with the kernel. 
About the existing KEXEC_UPDATE_ELFCOREHDR, we only rename the macro, but still use the same value, could you think of what problem could be caused between kernel and kexec-tools utility, the old and new version compatibility? Just changing the macro name will NOT help because the current kexec tool enables the KEXEC_UPDATE_ELFCOREHDR = 0x0004 kexec flag bit if the command argument --hotplug is passed to the kexec and the /sys/kernel/crash_elfcorehdr_size file exists in the system. Now, let's say an architecture enables this feature in the kernel with the assumption that the 0x0004 kexec flag bit is passed from the kexec tool when all the required kexec segments are skipped from SHA calculation. In this case, the current kexec tool, which passes the 0x0004 kexec flag bit only when the elfcorehdr is skipped, will cause issues for architectures. If it's about the new header files installed on older kernel, we can change it like below? Fortunately only one release, 6.6 passed. diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h index 3d5b3d757bed..df6a6505e267 100644 --- a/include/uapi/linux/kexec.h +++ b/include/uapi/linux/kexec.h @@ -13,7 +13,7 @@ #define KEXEC_ON_CRASH 0x0001 #define KEXEC_PRESERVE_CONTEXT 0x0002 -#define KEXEC_UPDATE_FDT 0x0008 +#define KEXEC_CRASH_HOTPLUG_UPDATE 0x0004 #define KEXEC_UPDATE_ELFCOREHDR0x0004 #define KEXEC_ARCH_MASK0x /* With my understanding, the kexec flag should be indicating the action, the mem/cpu hotplug, but not relating to any detail. Imagine later another segment need be skipped on one ARCH again, then another flag need be added, this sounds not reasonable. I strongly agree with you. The KEXEC_CRASH_HOTPLUG_UPDATE kexec flag should be sufficient to inform the kernel that the kexec tool has been updated to support CPU/Memory hotplug for the kexec_load system call. Unfortunately, we cannot use the 0x0004 kexec flags bit for KEXEC_CRASH_HOTPLUG_UPDATE at the moment. 
What about using 0x0008 for the KEXEC_CRASH_HOTPLUG_UPDATE flag? I am aware that we are utilizing two kexec flag bits (0x0004 and 0x0008) for the same feature, but what other options do we have? Thanks, Sourabh Code snippet from the kexec tool: main() { ... /* NOTE: Xen KEXEC_LIVE_UPDATE and KEXEC_UPDATE_ELFCOREHDR collide */ if (do_hotplug) { ... /* Indicate to the kernel it is ok to modify the elfcorehdr */ kexec_flags |= KEXEC_UPDATE_ELFCOREHDR; } ... } Any suggestion how to handle this with just one kexec flag? Thanks for the review. Thanks, Sourabh Jain #endif #ifdef ARCH_HAS_KIMAGE_ARCH @@ -396,9 +397,10 @@ bool kexec_load_permitted(int kexec_image_type); /* List of defined/legal kexec flags */ #ifndef CONFIG_KEXEC_JUMP -#define KEXEC_FLAGS(KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR) +#define KEXEC_FLAGS(KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR | KEXEC_UPDATE_FDT) #else -#define KEXEC_FLAGS(KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT
Re: [RFC PATCH 2/3] powerpc/fadump: pass additional parameters to dump capture kernel
Hello Hari, On 06/12/23 01:48, Hari Bathini wrote: For fadump case, passing additional parameters to dump capture kernel helps in minimizing the memory footprint for it and also provides the flexibility to disable components/modules, like hugepages, that are hindering the boot process of the special dump capture environment. Set up a dedicated parameter area to be passed to the capture kernel. This area type is defined as RTAS_FADUMP_PARAM_AREA. Sysfs attribute '/sys/kernel/fadump/bootargs_append' is exported to the userspace to specify the additional parameters to be passed to the capture kernel Signed-off-by: Hari Bathini --- arch/powerpc/include/asm/fadump-internal.h | 3 + arch/powerpc/kernel/fadump.c | 80 arch/powerpc/platforms/powernv/opal-fadump.c | 6 +- arch/powerpc/platforms/pseries/rtas-fadump.c | 35 - arch/powerpc/platforms/pseries/rtas-fadump.h | 11 ++- 5 files changed, 126 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h index b3956c400519..81629226b15f 100644 --- a/arch/powerpc/include/asm/fadump-internal.h +++ b/arch/powerpc/include/asm/fadump-internal.h @@ -97,6 +97,8 @@ struct fw_dump { unsigned long cpu_notes_buf_vaddr; unsigned long cpu_notes_buf_size; + unsigned long param_area; + /* * Maximum size supported by firmware to copy from source to * destination address per entry. 
@@ -111,6 +113,7 @@ struct fw_dump { unsigned long dump_active:1; unsigned long dump_registered:1; unsigned long nocma:1; + unsigned long param_area_supported:1; struct fadump_ops *ops; }; diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 757681658dda..98f089747ac9 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -1470,6 +1470,7 @@ static ssize_t mem_reserved_show(struct kobject *kobj, return sprintf(buf, "%ld\n", fw_dump.reserve_dump_area_size); } + static ssize_t registered_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) @@ -1477,6 +1478,43 @@ static ssize_t registered_show(struct kobject *kobj, return sprintf(buf, "%d\n", fw_dump.dump_registered); } +static ssize_t bootargs_append_show(struct kobject *kobj, + struct kobj_attribute *attr, + char *buf) +{ + return sprintf(buf, "%s\n", (char *)__va(fw_dump.param_area)); +} + +static ssize_t bootargs_append_store(struct kobject *kobj, + struct kobj_attribute *attr, + const char *buf, size_t count) +{ + char *params; + + if (!fw_dump.fadump_enabled || fw_dump.dump_active) + return -EPERM; + + if (count >= COMMAND_LINE_SIZE) + return -EINVAL; + + /* +* Fail here instead of handling this scenario with +* some silly workaround in capture kernel. +*/ + if (saved_command_line_len + count >= COMMAND_LINE_SIZE) { + pr_err("Appending parameters exceeds cmdline size!\n"); + return -ENOSPC; + } + + params = __va(fw_dump.param_area); + strscpy_pad(params, buf, COMMAND_LINE_SIZE); + /* Remove newline character at the end. 
*/ + if (params[count-1] == '\n') + params[count-1] = '\0'; + + return count; +} + static ssize_t registered_store(struct kobject *kobj, struct kobj_attribute *attr, const char *buf, size_t count) @@ -1535,6 +1573,7 @@ static struct kobj_attribute release_attr = __ATTR_WO(release_mem); static struct kobj_attribute enable_attr = __ATTR_RO(enabled); static struct kobj_attribute register_attr = __ATTR_RW(registered); static struct kobj_attribute mem_reserved_attr = __ATTR_RO(mem_reserved); +static struct kobj_attribute bootargs_append_attr = __ATTR_RW(bootargs_append); static struct attribute *fadump_attrs[] = { _attr.attr, @@ -1611,6 +1650,46 @@ static void __init fadump_init_files(void) return; } +/* + * Reserve memory to store additional parameters to be passed + * for fadump/capture kernel. + */ +static void fadump_setup_param_area(void) +{ + phys_addr_t range_start, range_end; + + if (!fw_dump.param_area_supported || fw_dump.dump_active) + return; + + /* This memory can't be used by PFW or bootloader as it is shared across kernels */ + if (radix_enabled()) { + /* +* Anywhere in the upper half should be good enough as all memory +* is accessible in real mode. +*/ + range_start = memblock_end_of_DRAM() / 2; + range_end =
Re: [RFC PATCH 1/3] powerpc/pseries/fadump: add support for multiple boot memory regions
Hello Hari, On 06/12/23 01:48, Hari Bathini wrote: From: Sourabh Jain Currently, fadump on pseries assumes a single boot memory region even though f/w supports more than one boot memory region. Add support for more boot memory regions to make the implementation flexible for any enhancements that introduce other region types. For this, rtas memory structure for fadump is updated to have multiple boot memory regions instead of just one. Additionally, methods responsible for creating the fadump memory structure during both the first and second kernel boot have been modified to take these multiple boot memory regions into account. Also, a new callback has been added to the fadump_ops structure to get the maximum boot memory regions supported by the platform. Signed-off-by: Sourabh Jain Signed-off-by: Hari Bathini --- arch/powerpc/include/asm/fadump-internal.h | 2 +- arch/powerpc/kernel/fadump.c | 27 +- arch/powerpc/platforms/powernv/opal-fadump.c | 8 + arch/powerpc/platforms/pseries/rtas-fadump.c | 258 --- arch/powerpc/platforms/pseries/rtas-fadump.h | 26 +- 5 files changed, 199 insertions(+), 122 deletions(-) diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h index 27f9e11eda28..b3956c400519 100644 --- a/arch/powerpc/include/asm/fadump-internal.h +++ b/arch/powerpc/include/asm/fadump-internal.h @@ -129,6 +129,7 @@ struct fadump_ops { struct seq_file *m); void(*fadump_trigger)(struct fadump_crash_info_header *fdh, const char *msg); + int (*fadump_max_boot_mem_rgns)(void); }; /* Helper functions */ @@ -136,7 +137,6 @@ s32 __init fadump_setup_cpu_notes_buf(u32 num_cpus); void fadump_free_cpu_notes_buf(void); u32 *__init fadump_regs_to_elf_notes(u32 *buf, struct pt_regs *regs); void __init fadump_update_elfcore_header(char *bufp); -bool is_fadump_boot_mem_contiguous(void); bool is_fadump_reserved_mem_contiguous(void); #else /* !CONFIG_PRESERVE_FA_DUMP */ diff --git a/arch/powerpc/kernel/fadump.c 
b/arch/powerpc/kernel/fadump.c index d14eda1e8589..757681658dda 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -222,28 +222,6 @@ static bool is_fadump_mem_area_contiguous(u64 d_start, u64 d_end) return ret; } -/* - * Returns true, if there are no holes in boot memory area, - * false otherwise. - */ -bool is_fadump_boot_mem_contiguous(void) -{ - unsigned long d_start, d_end; - bool ret = false; - int i; - - for (i = 0; i < fw_dump.boot_mem_regs_cnt; i++) { - d_start = fw_dump.boot_mem_addr[i]; - d_end = d_start + fw_dump.boot_mem_sz[i]; - - ret = is_fadump_mem_area_contiguous(d_start, d_end); - if (!ret) - break; - } - - return ret; -} - /* * Returns true, if there are no holes in reserved memory area, * false otherwise. @@ -389,10 +367,11 @@ static unsigned long __init get_fadump_area_size(void) static int __init add_boot_mem_region(unsigned long rstart, unsigned long rsize) { + int max_boot_mem_rgns = fw_dump.ops->fadump_max_boot_mem_rgns(); int i = fw_dump.boot_mem_regs_cnt++; - if (fw_dump.boot_mem_regs_cnt > FADUMP_MAX_MEM_REGS) { - fw_dump.boot_mem_regs_cnt = FADUMP_MAX_MEM_REGS; + if (fw_dump.boot_mem_regs_cnt > max_boot_mem_rgns) { + fw_dump.boot_mem_regs_cnt = max_boot_mem_rgns; return 0; } diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c index 964f464b1b0e..fa26c21a08d9 100644 --- a/arch/powerpc/platforms/powernv/opal-fadump.c +++ b/arch/powerpc/platforms/powernv/opal-fadump.c @@ -615,6 +615,13 @@ static void opal_fadump_trigger(struct fadump_crash_info_header *fdh, pr_emerg("No backend support for MPIPL!\n"); } +/* FADUMP_MAX_MEM_REGS or lower */ +static int opal_fadump_max_boot_mem_rgns(void) +{ + return FADUMP_MAX_MEM_REGS; + Nitpick: we can get rid of the above blank line. 
- Sourabh +} + static struct fadump_ops opal_fadump_ops = { .fadump_init_mem_struct = opal_fadump_init_mem_struct, .fadump_get_metadata_size = opal_fadump_get_metadata_size, @@ -627,6 +634,7 @@ static struct fadump_ops opal_fadump_ops = { .fadump_process = opal_fadump_process, .fadump_region_show = opal_fadump_region_show, .fadump_trigger = opal_fadump_trigger, + .fadump_max_boot_mem_rgns = opal_fadump_max_boot_mem_rgns, }; void __init opal_fadump_dt_scan(struct fw_dump *fadump_conf, u64 node) diff --git a/arch/powerpc/platforms/pseries/rtas-fadump.c b/arch/powerpc/platforms/pseries/rtas-fadum
Re: [PATCH v14 3/6] crash: add a new kexec flag for FDT update
Hello Baoquan, On 15/12/23 07:58, Baoquan He wrote: On 12/11/23 at 02:00pm, Sourabh Jain wrote: The commit a72bbec70da2 ("crash: hotplug support for kexec_load()") introduced a new kexec flag, `KEXEC_UPDATE_ELFCOREHDR`. Kexec tool uses this flag to indicate kernel that it is safe to modify the elfcorehdr of kdump image loaded using kexec_load system call. Similarly, add a new kexec flag, `KEXEC_UPDATE_FDT`, for another kdump component named FDT (Flatten Device Tree). Architectures like PowerPC need to update FDT kdump image component on CPU hotplug events. Kexec tool passing `KEXEC_UPDATE_FDT` will be an indication to kernel that FDT segment is not part of SHA calculation hence it is safe to update it. With the `KEXEC_UPDATE_ELFCOREHDR` and `KEXEC_UPDATE_FDT` kexec flags, crash hotplug support can be added to PowerPC for the kexec_load syscall while maintaining the backward compatibility with older kexec tools that do not have these newly introduced flags. Signed-off-by: Sourabh Jain Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Eric DeVolder Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- include/linux/kexec.h | 6 -- include/uapi/linux/kexec.h | 1 + kernel/kexec.c | 2 ++ 3 files changed, 7 insertions(+), 2 deletions(-) diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 0f6ea35879ee..bcedb7625b1f 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -319,6 +319,7 @@ struct kimage { #ifdef CONFIG_CRASH_HOTPLUG /* If set, allow changes to elfcorehdr of kexec_load'd image */ unsigned int update_elfcorehdr:1; + unsigned int update_fdt:1; Can we unify this to one flag, e.g 
hotplug_update? With this, on x86_64, we will skip the sha calculation for elfcorehdr. On ppc, we will skip the sha calculation for elfcorehdr and fdt. Yeah, that's what I suggested to Eric. I can do that, but I see one problem with powerpc or other platforms that need to skip SHA for more kexec segments in addition to elfcorehdr. `update_elfcorehdr` is set when the kexec tool sends the `KEXEC_UPDATE_ELFCOREHDR` flag to the kernel for the `kexec_load` system call. Given that the kexec tool has already been updated to send the `KEXEC_UPDATE_ELFCOREHDR` flag only when elfcorehdr is skipped from SHA verification in generic code, now it would be tricky for architectures to determine whether kexec has skipped SHA verification for just elfcorehdr or all segments needed on the platform with the same flag. Code snippet from the kexec tool: main() { ... /* NOTE: Xen KEXEC_LIVE_UPDATE and KEXEC_UPDATE_ELFCOREHDR collide */ if (do_hotplug) { ... /* Indicate to the kernel it is ok to modify the elfcorehdr */ kexec_flags |= KEXEC_UPDATE_ELFCOREHDR; } ... } Any suggestion how to handle this with just one kexec flag? Thanks for the review. 
Thanks, Sourabh Jain #endif #ifdef ARCH_HAS_KIMAGE_ARCH @@ -396,9 +397,10 @@ bool kexec_load_permitted(int kexec_image_type); /* List of defined/legal kexec flags */ #ifndef CONFIG_KEXEC_JUMP -#define KEXEC_FLAGS(KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR) +#define KEXEC_FLAGS(KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR | KEXEC_UPDATE_FDT) #else -#define KEXEC_FLAGS(KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_UPDATE_ELFCOREHDR) +#define KEXEC_FLAGS(KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_UPDATE_ELFCOREHDR | \ + KEXEC_UPDATE_FDT) #endif /* List of defined/legal kexec file flags */ diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h index 01766dd839b0..3d5b3d757bed 100644 --- a/include/uapi/linux/kexec.h +++ b/include/uapi/linux/kexec.h @@ -13,6 +13,7 @@ #define KEXEC_ON_CRASH0x0001 #define KEXEC_PRESERVE_CONTEXT0x0002 #define KEXEC_UPDATE_ELFCOREHDR 0x0004 +#define KEXEC_UPDATE_FDT 0x0008 #define KEXEC_ARCH_MASK 0x /* diff --git a/kernel/kexec.c b/kernel/kexec.c index 8f35a5a42af8..97eb151cd931 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -132,6 +132,8 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, #ifdef CONFIG_CRASH_HOTPLUG if (flags & KEXEC_UPDATE_ELFCOREHDR) image->update_elfcorehdr = 1; + if (flags & KEXEC_UPDATE_FDT) + image->update_fdt = 1; #endif ret = machine_kexec_prepare(image); -- 2.41.0
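To make the compatibility concern in this exchange concrete, here is a minimal userspace sketch. Only the two flag values are taken from the uapi diff above; `struct kimage_sketch`, `load_flags_sketch()` and `ppc_hotplug_support_sketch()` are hypothetical stand-ins, not kernel code. It shows why an existing kexec tool that sets only 0x0004 must not be treated as having excluded the FDT from the SHA calculation.

```c
#include <stdbool.h>

/* Flag values as they appear in the uapi diff in this thread. */
#define KEXEC_UPDATE_ELFCOREHDR 0x00000004
#define KEXEC_UPDATE_FDT        0x00000008

/* Hypothetical stand-in for the relevant bits of struct kimage. */
struct kimage_sketch {
	bool file_mode;         /* image loaded via kexec_file_load */
	bool update_elfcorehdr; /* elfcorehdr segment excluded from SHA */
	bool update_fdt;        /* FDT segment excluded from SHA */
};

/* Mirrors what do_kexec_load() does with the flags in the patch. */
void load_flags_sketch(struct kimage_sketch *image, unsigned long flags)
{
	image->update_elfcorehdr = (flags & KEXEC_UPDATE_ELFCOREHDR) != 0;
	image->update_fdt = (flags & KEXEC_UPDATE_FDT) != 0;
}

/* PowerPC-style check: both segments must be marked updatable. */
bool ppc_hotplug_support_sketch(const struct kimage_sketch *image)
{
	if (image->file_mode)
		return true;
	return image->update_elfcorehdr && image->update_fdt;
}
```

With this shape, an older tool passing only `KEXEC_UPDATE_ELFCOREHDR` simply gets "not supported" on PowerPC, which is the backward-compatible outcome the commit message is after.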
Re: [PATCH v14 6/6] powerpc: add crash memory hotplug support
On 15/12/23 06:53, Baoquan He wrote: On 12/11/23 at 02:00pm, Sourabh Jain wrote: .. diff --git a/arch/powerpc/include/asm/kexec_ranges.h b/arch/powerpc/include/asm/kexec_ranges.h index f83866a19e87..802abf580cf0 100644 --- a/arch/powerpc/include/asm/kexec_ranges.h +++ b/arch/powerpc/include/asm/kexec_ranges.h @@ -7,6 +7,7 @@ void sort_memory_ranges(struct crash_mem *mrngs, bool merge); struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges); int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size); +int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size); int add_tce_mem_ranges(struct crash_mem **mem_ranges); int add_initrd_mem_range(struct crash_mem **mem_ranges); #ifdef CONFIG_PPC_64S_HASH_MMU diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c index 9932793cd64b..5be30659172f 100644 --- a/arch/powerpc/kexec/core_64.c +++ b/arch/powerpc/kexec/core_64.c @@ -19,8 +19,11 @@ #include #include #include +#include #include +#include +#include #include #include #include @@ -547,9 +550,7 @@ int update_cpus_node(void *fdt) #undef pr_fmt #define pr_fmt(fmt) "crash hp: " fmt -#ifdef CONFIG_HOTPLUG_CPU - /* Provides the value for the sysfs crash_hotplug nodes */ -int arch_crash_hotplug_cpu_support(struct kimage *image) +static int crash_hotplug_support(struct kimage *image) { if (image->file_mode) return 1; @@ -560,8 +561,118 @@ int arch_crash_hotplug_cpu_support(struct kimage *image) */ return image->update_elfcorehdr && image->update_fdt; } + +#ifdef CONFIG_HOTPLUG_CPU + /* Provides the value for the sysfs crash_hotplug nodes */ +int arch_crash_hotplug_cpu_support(struct kimage *image) +{ + return crash_hotplug_support(image); +} +#endif + +#ifdef CONFIG_MEMORY_HOTPLUG + /* Provides the value for the sysfs memory_hotplug nodes */ +int arch_crash_hotplug_memory_support(struct kimage *image) +{ + return crash_hotplug_support(image); +} #endif +/* + * Advertise preferred elfcorehdr size to userspace via + * 
/sys/kernel/crash_elfcorehdr_size sysfs interface. + */ +unsigned int arch_crash_get_elfcorehdr_size(void) +{ + unsigned int sz; + unsigned long elf_phdr_cnt; + + /* Program header for CPU notes and vmcoreinfo */ + elf_phdr_cnt = 2; + if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) + /* In the worst case, a Phdr is needed for every other LMB to be +* represented as an individual crash range. +*/ + elf_phdr_cnt += memory_hotplug_max() / (2 * drmem_lmb_size()); + + /* Do not cross the max limit */ + if (elf_phdr_cnt > PN_XNUM) + elf_phdr_cnt = PN_XNUM; + + sz = sizeof(struct elfhdr) + (elf_phdr_cnt * sizeof(Elf64_Phdr)); + return sz; +} + +/** + * update_crash_elfcorehdr() - Recreate the elfcorehdr and replace it with old + *elfcorehdr in the kexec segment array. + * @image: the active struct kimage + * @mn: struct memory_notify data handler + */ +static void update_crash_elfcorehdr(struct kimage *image, struct memory_notify *mn) +{ + int ret; + struct crash_mem *cmem = NULL; + struct kexec_segment *ksegment; + void *ptr, *mem, *elfbuf = NULL; + unsigned long elfsz, memsz, base_addr, size; + + ksegment = &image->segment[image->elfcorehdr_index]; + mem = (void *) ksegment->mem; + memsz = ksegment->memsz; + + ret = get_crash_memory_ranges(&cmem); + if (ret) { + pr_err("Failed to get crash mem range\n"); + return; + } + + /* +* The hot unplugged memory is part of crash memory ranges, +* remove it here. +*/ + if (image->hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY) { + base_addr = PFN_PHYS(mn->start_pfn); + size = mn->nr_pages * PAGE_SIZE; + ret = remove_mem_range(&cmem, base_addr, size); Although this is ppc specific, I don't understand. Why don't you just recreate the elfcorehdr instead of removing the hot-removed region? Comparing the remove_mem_range() implementation with recreating, I don't see much benefit from it, and it makes your code more complicated. Just curious, surely ppc people can decide what should be taken. I am recreating `elfcorehdr` by calling `crash_prepare_elf64_headers()` below.
This complexity is necessary to avoid adding hot-removed memory to the new `elfcorehdr`. On powerpc, the memblock list is utilized to prepare the `elfcorehdr`. In the case of memory hot removal, the memblock list is updated after the arch crash hotplug handler is triggered. Thus, the hot-removed memory is explicitly removed from the crash memory ranges to ensure that the memory ranges added to `elfcorehdr` do not include the hot-removed memory. Thanks, Sourabh Jain +
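For readers following along, the range removal described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct crash_mem_sketch` and `remove_range_sketch()` are hypothetical names with a fixed-size array, ranges use inclusive end addresses, and the real `remove_mem_range()` in the patch may differ in detail (for instance, it can reallocate the range array). It also assumes `base + size` does not overflow.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_RANGES 8

/* Hypothetical simplified range list; 'end' is inclusive. */
struct range_sketch {
	uint64_t start;
	uint64_t end;
};

struct crash_mem_sketch {
	size_t nr;
	struct range_sketch r[SKETCH_MAX_RANGES];
};

/*
 * Remove [base, base + size) from every range it overlaps.
 * Returns 0 on success, -1 if a split would need more slots.
 */
int remove_range_sketch(struct crash_mem_sketch *m, uint64_t base, uint64_t size)
{
	uint64_t end;
	size_t i = 0;

	if (size == 0)
		return 0;
	end = base + size - 1;

	while (i < m->nr) {
		struct range_sketch *cur = &m->r[i];

		/* No overlap with this range. */
		if (end < cur->start || base > cur->end) {
			i++;
			continue;
		}

		/* Range fully covered: drop it and re-check the same slot. */
		if (base <= cur->start && end >= cur->end) {
			memmove(&m->r[i], &m->r[i + 1],
				(m->nr - i - 1) * sizeof(*cur));
			m->nr--;
			continue;
		}

		/* Hole strictly inside the range: split it in two. */
		if (base > cur->start && end < cur->end) {
			if (m->nr == SKETCH_MAX_RANGES)
				return -1;
			memmove(&m->r[i + 2], &m->r[i + 1],
				(m->nr - i - 1) * sizeof(*cur));
			m->r[i + 1].start = end + 1;
			m->r[i + 1].end = cur->end;
			cur->end = base - 1;
			m->nr++;
			i += 2;
			continue;
		}

		/* Partial overlap: trim the head or the tail. */
		if (base <= cur->start)
			cur->start = end + 1;
		else
			cur->end = base - 1;
		i++;
	}
	return 0;
}
```

The split case is the reason a removal can increase the number of ranges, which ties in with the extra Phdr space reserved for the elfcorehdr.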
Re: [PATCH v14 2/6] crash: make CPU and Memory hotplug support reporting flexible
Hello Baoquan, On 14/12/23 19:43, Baoquan He wrote: On 12/11/23 at 02:00pm, Sourabh Jain wrote: Architectures' specific functions `arch_crash_hotplug_cpu_support()` and `arch_crash_hotplug_memory_support()` advertise the kernel's capability to update the kdump image on CPU and Memory hotplug events to userspace via the sysfs interface. These architecture-specific functions need to access various attributes of the `kexec_crash_image` object to determine whether the kernel can update the kdump image and advertise this information to userspace accordingly. As the architecture-specific code is not exposed to the APIs required to acquire the lock for accessing the `kexec_crash_image` object, it calls a generic function, `crash_check_update_elfcorehdr()`, to determine whether the kernel can update the kdump image or not. The lack of access to the `kexec_crash_image` object in architecture-specific code restricts architectures from performing additional architecture-specific checks required to determine if the kdump image is updatable or not. For instance, on PowerPC, the kernel can update the kdump image only if both the elfcorehdr and FDT are marked as updatable for the `kexec_load` system call. So define two generic functions, `crash_hotplug_cpu_support()` and `crash_hotplug_memory_support()`, which are called when userspace attempts to read the crash CPU and Memory hotplug support via the sysfs interface. These functions take the necessary locks needed to access the `kexec_crash_image` object and then forward it to the architecture-specific handler to do the rest. 
Signed-off-by: Sourabh Jain Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Eric DeVolder Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- arch/x86/include/asm/kexec.h | 8 arch/x86/kernel/crash.c | 20 +++- include/linux/kexec.h| 13 +++-- kernel/crash_core.c | 23 +-- 4 files changed, 43 insertions(+), 21 deletions(-) diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index 9bb6607e864e..5c88d27b086d 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -212,13 +212,13 @@ void arch_crash_handle_hotplug_event(struct kimage *image, void *arg); #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event #ifdef CONFIG_HOTPLUG_CPU -int arch_crash_hotplug_cpu_support(void); -#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support +int arch_crash_hotplug_cpu_support(struct kimage *image); +#define arch_crash_hotplug_cpu_support arch_crash_hotplug_cpu_support #endif #ifdef CONFIG_MEMORY_HOTPLUG -int arch_crash_hotplug_memory_support(void); -#define crash_hotplug_memory_support arch_crash_hotplug_memory_support +int arch_crash_hotplug_memory_support(struct kimage *image); +#define arch_crash_hotplug_memory_support arch_crash_hotplug_memory_support #endif unsigned int arch_crash_get_elfcorehdr_size(void); diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index 0d7b2657beb6..ad5941665589 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -398,18 +398,28 @@ int crash_load_segments(struct kimage *image) #undef pr_fmt #define pr_fmt(fmt) "crash hp: " fmt -/* These functions provide the value for the sysfs crash_hotplug nodes */ 
+#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG) +static int crash_hotplug_support(struct kimage *image) +{ + if (image->file_mode) + return 1; + + return image->update_elfcorehdr; +} +#endif + #ifdef CONFIG_HOTPLUG_CPU -int arch_crash_hotplug_cpu_support(void) +/* These functions provide the value for the sysfs crash_hotplug nodes */ +int arch_crash_hotplug_cpu_support(struct kimage *image) { - return crash_check_update_elfcorehdr(); + return crash_hotplug_support(image); } #endif #ifdef CONFIG_MEMORY_HOTPLUG -int arch_crash_hotplug_memory_support(void) +int arch_crash_hotplug_memory_support(struct kimage *image) { - return crash_check_update_elfcorehdr(); + return crash_hotplug_support(image); } #endif diff --git a/include/linux/kexec.h b/include/linux/kexec.h index ee28c09a7fb0..0f6ea35879ee 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -486,16 +486,17 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) { static inline void arch_crash_handle_hotplug_event(struct kimage *image, void *arg) { } #endif -int crash_check_update_elfcorehdr(void); - -#ifndef crash_hotplug_cpu
[PATCH v14 6/6] powerpc: add crash memory hotplug support
Extend the arch crash hotplug handler, as introduced by the patch titled "powerpc: add crash CPU hotplug support", to also support memory add/remove events.

Elfcorehdr describes the memory of the crashed kernel to the capture kernel; hence, it needs to be updated if memory resources change due to memory add/remove events. Therefore, arch_crash_handle_hotplug_event() is updated to recreate the elfcorehdr and replace the previous one with it on memory add/remove events.

The memblock list is used to prepare the elfcorehdr. In the case of memory hot removal, the memblock list is updated after the arch crash hotplug handler is triggered, as depicted in Figure 1. Thus, the hot-removed memory is explicitly removed from the crash memory ranges to ensure that the memory ranges added to elfcorehdr do not include the hot-removed memory.

    Memory remove
          |
          v
    Offline pages
          |
          v
    Initiate memory notify call <----> crash hotplug handler
    chain for MEM_OFFLINE event
          |
          v
    Update memblock list

          Figure 1

There are two system calls, `kexec_file_load` and `kexec_load`, used to load the kdump image. A few changes have been made to ensure that the kernel can safely update the elfcorehdr component of the kdump image for both system calls.

For the kexec_file_load syscall, the kdump image is prepared in the kernel. To support an increasing number of memory regions, the elfcorehdr is built with extra buffer space to ensure that it can accommodate additional memory ranges in the future.

For the kexec_load syscall, the elfcorehdr is updated only if both the KEXEC_UPDATE_FDT and KEXEC_UPDATE_ELFCOREHDR kexec flags are passed to the kernel by the kexec tool. Passing these flags to the kernel indicates that the elfcorehdr is built to accommodate additional memory ranges and the elfcorehdr segment is not considered for SHA calculation, making it safe to update it.
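As a rough illustration of how much extra elfcorehdr space is involved, the arithmetic of arch_crash_get_elfcorehdr_size() from this patch can be replayed in userspace. This is a sketch under assumptions: `elfcorehdr_size_sketch()` is a hypothetical helper, the maximum hotpluggable memory and LMB size are passed in as plain numbers instead of coming from memory_hotplug_max() and drmem_lmb_size(), and the ELF64 header sizes are hard-coded.

```c
#include <stdint.h>

#define SKETCH_PN_XNUM   0xffffu /* ELF limit on the program header count */
#define SKETCH_EHDR_SIZE 64u     /* sizeof(Elf64_Ehdr) */
#define SKETCH_PHDR_SIZE 56u     /* sizeof(Elf64_Phdr) */

/*
 * Worst case: every other LMB ends up as its own crash range, so one
 * Phdr is needed per two LMBs, plus two Phdrs for the CPU notes and
 * vmcoreinfo.
 */
uint64_t elfcorehdr_size_sketch(uint64_t max_mem, uint64_t lmb_size)
{
	uint64_t phdr_cnt = 2;

	if (lmb_size != 0)
		phdr_cnt += max_mem / (2 * lmb_size);

	/* Do not cross the ELF program header limit. */
	if (phdr_cnt > SKETCH_PN_XNUM)
		phdr_cnt = SKETCH_PN_XNUM;

	return SKETCH_EHDR_SIZE + phdr_cnt * SKETCH_PHDR_SIZE;
}
```

For example, with 1 TiB of maximum memory and 256 MiB LMBs this gives 2 + 2048 = 2050 program headers, i.e. 64 + 2050 * 56 = 114864 bytes.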
Commit 88a6f8994421 ("crash: memory and CPU hotplug sysfs attributes") added a sysfs interface to indicate to userspace (kdump udev rule) that the kernel will update the kdump image on memory hotplug events, so kdump reload can be avoided. Implement the arch-specific function `arch_crash_hotplug_memory_support()` to correctly advertise the kernel's capability to update the kdump image. This feature is advertised to userspace when either of the following conditions is met: 1. Kdump image is loaded using the kexec_file_load system call. 2. Kdump image is loaded using the kexec_load system call and both the KEXEC_UPDATE_ELFCOREHDR and KEXEC_UPDATE_FDT kexec flags are passed to the kernel. The changes related to this feature are kept under the CRASH_HOTPLUG config, and it is enabled by default. Signed-off-by: Sourabh Jain Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Eric DeVolder Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- arch/powerpc/include/asm/kexec.h| 8 ++ arch/powerpc/include/asm/kexec_ranges.h | 1 + arch/powerpc/kexec/core_64.c| 126 ++-- arch/powerpc/kexec/file_load_64.c | 34 ++- arch/powerpc/kexec/ranges.c | 85 5 files changed, 245 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index 7823ab10d323..c8d6cfda523c 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -122,6 +122,14 @@ int arch_crash_hotplug_cpu_support(struct kimage *image); #define arch_crash_hotplug_cpu_support arch_crash_hotplug_cpu_support #endif +#ifdef CONFIG_MEMORY_HOTPLUG +int arch_crash_hotplug_memory_support(struct kimage *image); +#define arch_crash_hotplug_memory_support
arch_crash_hotplug_memory_support +#endif + +unsigned int arch_crash_get_elfcorehdr_size(void); +#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size + #endif /*CONFIG_CRASH_HOTPLUG */ #endif /* CONFIG_PPC64 */ diff --git a/arch/powerpc/include/asm/kexec_ranges.h b/arch/powerpc/include/asm/kexec_ranges.h index f83866a19e87..802abf580cf0 100644 --- a/arch/powerpc/include/asm/kexec_ranges.h +++ b/arch/powerpc/include/asm/kexec_ranges.h @@ -7,6 +7,7 @@ void sort_memory_ranges(struct crash_mem *mrngs, bool merge); struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges); int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size); +int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size); int add_tce_mem_ranges(struct crash_mem **mem_ranges); int add_initrd_mem_range(struct crash_mem **mem_ranges); #ifdef CONFIG_PPC_64S_HASH_MMU diff --git a/arch/powe
[PATCH v14 5/6] powerpc: add crash CPU hotplug support
Due to CPU/Memory hotplug or online/offline events, the elfcorehdr (which describes the CPUs and memory of the crashed kernel) and FDT (Flattened Device Tree) of the kdump image become outdated. Consequently, attempting dump collection with an outdated elfcorehdr or FDT can lead to failed or inaccurate dump collection. Going forward, CPU/Memory hotplug or online/offline events are referred to as CPU/Memory add/remove events. The current solution to address the above issue involves monitoring the CPU/Memory add/remove events in userspace using udev rules, and whenever there are changes in CPU and memory resources, the entire kdump image is loaded again. The kdump image includes the kernel, initrd, elfcorehdr, FDT, and purgatory. Given that only the elfcorehdr and FDT get outdated due to CPU/Memory add/remove events, reloading the entire kdump image is inefficient. More importantly, kdump remains inactive for a substantial amount of time until the kdump reload completes. To address the aforementioned issue, commit 247262756121 ("crash: add generic infrastructure for crash hotplug support") added a generic infrastructure that allows architectures to selectively update the kdump image components during CPU or memory add/remove events within the kernel itself. In the event of a CPU or memory add/remove event, the generic crash hotplug event handler, `crash_handle_hotplug_event()`, is triggered. It then acquires the necessary locks to update the kdump image and invokes the architecture-specific crash hotplug handler, `arch_crash_handle_hotplug_event()`, to update the required kdump image components. This patch adds a crash hotplug handler for PowerPC and enables support to update the kdump image on CPU add/remove events. Support for memory add/remove events is added in a subsequent patch with the title "powerpc: add crash memory hotplug support." As mentioned earlier, only the elfcorehdr and FDT kdump image components need to be updated in the event of CPU or memory add/remove events.
However, the PowerPC architecture crash hotplug handler only updates the FDT to enable crash hotplug support for CPU add/remove events. Here's why. The Elfcorehdr on PowerPC is built with possible CPUs, and thus, it does not need an update on CPU add/remove events. On the other hand, the FDT needs to be updated on CPU add events to include the newly added CPU. If the FDT is not updated and the kernel crashes on a newly added CPU, the kdump kernel will fail to boot due to the unavailability of the crashing CPU in the FDT. During the early boot, it is expected that the boot CPU must be a part of the FDT; otherwise, the kernel will raise a BUG and fail to boot. For more information, refer to commit 36ae37e3436b0 ("powerpc: Make boot_cpuid common between 32 and 64-bit"). Since it is okay to have an offline CPU in the kdump FDT, no action is taken in case of CPU removal. There are two system calls, `kexec_file_load` and `kexec_load`, used to load the kdump image. Few changes have been made to ensure kernel can safely update the kdump FDT for both system calls. For kexec_file_load syscall the kdump image is prepared in kernel. So to support an increasing number of CPUs, the FDT is constructed with extra buffer space to ensure it can accommodate a possible number of CPU nodes. Additionally, a call to fdt_pack (which trims the unused space once the FDT is prepared) is avoided for kdump image loading if this feature is enabled. For the kexec_load syscall, the FDT is updated only if both the KEXEC_UPDATE_FDT and KEXEC_UPDATE_ELFCOREHDR kexec flags are passed to the kernel by the kexec tool. Passing these flags to the kernel indicates that the FDT is built to accommodate possible CPUs, and the FDT segment is not considered for SHA calculation, making it safe to update the FDT. 
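The per-event behaviour described above can be summarised in a small decision function. This is only a sketch: the enum names and `component_to_update_sketch()` are hypothetical, and the memory cases reflect what the subsequent "powerpc: add crash memory hotplug support" patch describes, not code from this patch.

```c
/* Hypothetical event and result types for illustration only. */
enum hp_action_sketch {
	HP_ADD_CPU,
	HP_REMOVE_CPU,
	HP_ADD_MEMORY,
	HP_REMOVE_MEMORY
};

enum hp_update_sketch {
	UPDATE_NONE,
	UPDATE_FDT,
	UPDATE_ELFCOREHDR
};

/* Which kdump image component needs refreshing for a given event. */
enum hp_update_sketch component_to_update_sketch(enum hp_action_sketch action)
{
	switch (action) {
	case HP_ADD_CPU:
		/* The new CPU must appear in the FDT or the kdump kernel may not boot. */
		return UPDATE_FDT;
	case HP_REMOVE_CPU:
		/* An offline CPU left in the kdump FDT is harmless. */
		return UPDATE_NONE;
	case HP_ADD_MEMORY:
	case HP_REMOVE_MEMORY:
		/* Memory ranges live in the elfcorehdr. */
		return UPDATE_ELFCOREHDR;
	}
	return UPDATE_NONE;
}
```

The elfcorehdr never appears in the CPU cases because, as noted, it is built with all possible CPUs up front.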
Commit 88a6f8994421 ("crash: memory and CPU hotplug sysfs attributes") added a sysfs interface to indicate to userspace (kdump udev rule) that the kernel will update the kdump image on CPU hotplug events, so kdump reload can be avoided. Implement the arch-specific function `arch_crash_hotplug_cpu_support()` to correctly advertise the kernel's capability to update the kdump image. This feature is advertised to userspace when either of the following conditions is met: 1. Kdump image is loaded using the kexec_file_load system call. 2. Kdump image is loaded using the kexec_load system call and both the KEXEC_UPDATE_ELFCOREHDR and KEXEC_UPDATE_FDT kexec flags are passed to the kernel. The changes related to this feature are kept under the CRASH_HOTPLUG config, and it is enabled by default. Signed-off-by: Sourabh Jain Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Eric DeVolder Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Tho
[PATCH v14 3/6] crash: add a new kexec flag for FDT update
Commit a72bbec70da2 ("crash: hotplug support for kexec_load()") introduced a new kexec flag, `KEXEC_UPDATE_ELFCOREHDR`. The kexec tool uses this flag to tell the kernel that it is safe to modify the elfcorehdr of a kdump image loaded using the kexec_load system call.

Similarly, add a new kexec flag, `KEXEC_UPDATE_FDT`, for another kdump component, the FDT (Flattened Device Tree). Architectures like PowerPC need to update the FDT component of the kdump image on CPU hotplug events. The kexec tool passing `KEXEC_UPDATE_FDT` indicates to the kernel that the FDT segment is not part of the SHA calculation, hence it is safe to update it.

With the `KEXEC_UPDATE_ELFCOREHDR` and `KEXEC_UPDATE_FDT` kexec flags, crash hotplug support can be added to PowerPC for the kexec_load syscall while maintaining backward compatibility with older kexec tools that do not have these newly introduced flags.

Signed-off-by: Sourabh Jain Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Eric DeVolder Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- include/linux/kexec.h | 6 -- include/uapi/linux/kexec.h | 1 + kernel/kexec.c | 2 ++ 3 files changed, 7 insertions(+), 2 deletions(-) diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 0f6ea35879ee..bcedb7625b1f 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -319,6 +319,7 @@ struct kimage { #ifdef CONFIG_CRASH_HOTPLUG /* If set, allow changes to elfcorehdr of kexec_load'd image */ unsigned int update_elfcorehdr:1; + unsigned int update_fdt:1; #endif #ifdef ARCH_HAS_KIMAGE_ARCH @@ -396,9 +397,10 @@ bool kexec_load_permitted(int kexec_image_type); /* List of defined/legal kexec flags */
#ifndef CONFIG_KEXEC_JUMP -#define KEXEC_FLAGS(KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR) +#define KEXEC_FLAGS(KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR | KEXEC_UPDATE_FDT) #else -#define KEXEC_FLAGS(KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_UPDATE_ELFCOREHDR) +#define KEXEC_FLAGS(KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_UPDATE_ELFCOREHDR | \ + KEXEC_UPDATE_FDT) #endif /* List of defined/legal kexec file flags */ diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h index 01766dd839b0..3d5b3d757bed 100644 --- a/include/uapi/linux/kexec.h +++ b/include/uapi/linux/kexec.h @@ -13,6 +13,7 @@ #define KEXEC_ON_CRASH 0x0001 #define KEXEC_PRESERVE_CONTEXT 0x0002 #define KEXEC_UPDATE_ELFCOREHDR0x0004 +#define KEXEC_UPDATE_FDT 0x0008 #define KEXEC_ARCH_MASK0x /* diff --git a/kernel/kexec.c b/kernel/kexec.c index 8f35a5a42af8..97eb151cd931 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -132,6 +132,8 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, #ifdef CONFIG_CRASH_HOTPLUG if (flags & KEXEC_UPDATE_ELFCOREHDR) image->update_elfcorehdr = 1; + if (flags & KEXEC_UPDATE_FDT) + image->update_fdt = 1; #endif ret = machine_kexec_prepare(image); -- 2.41.0
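How the new flag is consumed can be illustrated with a small userspace sketch. The flag values are the ones listed in the `include/uapi/linux/kexec.h` hunk of this patch (the `UL` suffix is added here); `struct load_opts_model` and `decode_kexec_flags()` are hypothetical names that only mimic what the `do_kexec_load()` hunk does with `image->update_elfcorehdr` and `image->update_fdt`.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as listed in the uapi hunk of this patch. */
#define KEXEC_ON_CRASH          0x0001UL
#define KEXEC_PRESERVE_CONTEXT  0x0002UL
#define KEXEC_UPDATE_ELFCOREHDR 0x0004UL
#define KEXEC_UPDATE_FDT        0x0008UL

/* Hypothetical container for the two per-image "updatable" bits. */
struct load_opts_model {
	bool update_elfcorehdr;
	bool update_fdt;
};

/* Each flag sets exactly one bit; absent flags leave the bit cleared. */
struct load_opts_model decode_kexec_flags(unsigned long flags)
{
	struct load_opts_model opts = { false, false };

	/* The two update flags are distinct bits, so they are tested independently. */
	assert((KEXEC_UPDATE_ELFCOREHDR & KEXEC_UPDATE_FDT) == 0);

	if (flags & KEXEC_UPDATE_ELFCOREHDR)
		opts.update_elfcorehdr = true;
	if (flags & KEXEC_UPDATE_FDT)
		opts.update_fdt = true;

	return opts;
}
```

A kexec tool that predates these flags passes neither, so both bits stay cleared and the kernel leaves the loaded segments alone.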
[PATCH v14 0/6] powerpc/crash: Kernel handling of CPU and memory hotplug
load. Refer "[RFC v4 PATCH 4/5] powerpc/crash hp: add crash hotplug support for kexec_file_load" patch to know more about the change.
 - Fix a couple of typos.
 - Replace pr_err with pr_info_once to warn the user about memory hotplug support.
 - In the crash hotplug handler, exit the for loop once the FDT segment is found.

v3
 - Move fdt_index and fdt_index_vaild variables to kimage_arch struct.
 - Rebase patches on top of https://lore.kernel.org/lkml/20220303162725.49640-1-eric.devol...@oracle.com/
 - Fixed warning reported by the checkpatch script

v2:
 - Use generic hotplug handler introduced by https://lore.kernel.org/lkml/20220209195706.51522-1-eric.devol...@oracle.com/ a significant change from v1.

Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Eric DeVolder Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org

Sourabh Jain (6):
  crash: forward memory_notify arg to arch crash hotplug handler
  crash: make CPU and Memory hotplug support reporting flexible
  crash: add a new kexec flag for FDT update
  powerpc/kexec: turn some static helper functions public
  powerpc: add crash CPU hotplug support
  powerpc: add crash memory hotplug support

 arch/powerpc/Kconfig                    |   4 +
 arch/powerpc/include/asm/kexec.h        |  25 ++
 arch/powerpc/include/asm/kexec_ranges.h |   1 +
 arch/powerpc/kexec/Makefile             |   4 +-
 arch/powerpc/kexec/core_64.c            | 369
 arch/powerpc/kexec/elf_64.c             |  12 +-
 arch/powerpc/kexec/file_load_64.c       | 211 +++---
 arch/powerpc/kexec/ranges.c             |  85 ++
 arch/x86/include/asm/kexec.h            |  10 +-
 arch/x86/kernel/crash.c                 |  23 +-
 include/linux/kexec.h                   |  21 +-
 include/uapi/linux/kexec.h              |   1 +
 kernel/crash_core.c                     |  37 ++-
 kernel/kexec.c                          |   2 +
 14 files changed, 605 insertions(+), 200 deletions(-)

--
2.41.0
[PATCH v14 4/6] powerpc/kexec: turn some static helper functions public
Move the functions update_cpus_node and get_crash_memory_ranges from kexec/file_load_64.c to kexec/core_64.c to make these functions usable by other kexec components. get_crash_memory_ranges uses functions defined in ranges.c, so take ranges.c out of CONFIG_KEXEC_FILE. Later in the series, these functions are utilized for in-kernel updates to kdump image during CPU/Memory hotplug or online/offline events for both kexec_load and kexec_file_load syscalls. There is no intended functional change. Signed-off-by: Sourabh Jain Reviewed-by: Laurent Dufour Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Eric DeVolder Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- arch/powerpc/include/asm/kexec.h | 6 ++ arch/powerpc/kexec/Makefile | 4 +- arch/powerpc/kexec/core_64.c | 166 ++ arch/powerpc/kexec/file_load_64.c | 162 - 4 files changed, 174 insertions(+), 164 deletions(-) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index e1b43aa12175..562e1bb689da 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -108,6 +108,12 @@ void crash_free_reserved_phys_range(unsigned long begin, unsigned long end); #endif /* CONFIG_PPC_RTAS */ #endif /* CONFIG_CRASH_DUMP */ +#ifdef CONFIG_PPC64 +struct crash_mem; +int update_cpus_node(void *fdt); +int get_crash_memory_ranges(struct crash_mem **mem_ranges); +#endif /* CONFIG_PPC64 */ + #ifdef CONFIG_KEXEC_FILE extern const struct kexec_file_ops kexec_elf64_ops; diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile index 0c2abe7f9908..f2ed5b85b912 100644 --- a/arch/powerpc/kexec/Makefile +++ b/arch/powerpc/kexec/Makefile @@ 
-3,11 +3,11 @@ # Makefile for the linux kernel. # -obj-y += core.o crash.o core_$(BITS).o +obj-y += core.o crash.o ranges.o core_$(BITS).o obj-$(CONFIG_PPC32)+= relocate_32.o -obj-$(CONFIG_KEXEC_FILE) += file_load.o ranges.o file_load_$(BITS).o elf_$(BITS).o +obj-$(CONFIG_KEXEC_FILE) += file_load.o file_load_$(BITS).o elf_$(BITS).o # Disable GCOV, KCOV & sanitizers in odd or sensitive code GCOV_PROFILE_core_$(BITS).o := n diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c index 0bee7ca9a77c..9966b51d9aa8 100644 --- a/arch/powerpc/kexec/core_64.c +++ b/arch/powerpc/kexec/core_64.c @@ -17,6 +17,8 @@ #include #include #include +#include +#include #include #include @@ -30,6 +32,8 @@ #include #include #include +#include +#include int machine_kexec_prepare(struct kimage *image) { @@ -377,6 +381,168 @@ void default_machine_kexec(struct kimage *image) /* NOTREACHED */ } +/** + * get_crash_memory_ranges - Get crash memory ranges. This list includes + * first/crashing kernel's memory regions that + * would be exported via an elfcore. + * @mem_ranges: Range list to add the memory ranges to. + * + * Returns 0 on success, negative errno on error. 
+ */
+int get_crash_memory_ranges(struct crash_mem **mem_ranges)
+{
+	phys_addr_t base, end;
+	struct crash_mem *tmem;
+	u64 i;
+	int ret;
+
+	for_each_mem_range(i, &base, &end) {
+		u64 size = end - base;
+
+		/* Skip backup memory region, which needs a separate entry */
+		if (base == BACKUP_SRC_START) {
+			if (size > BACKUP_SRC_SIZE) {
+				base = BACKUP_SRC_END + 1;
+				size -= BACKUP_SRC_SIZE;
+			} else
+				continue;
+		}
+
+		ret = add_mem_range(mem_ranges, base, size);
+		if (ret)
+			goto out;
+
+		/* Try merging adjacent ranges before reallocation attempt */
+		if ((*mem_ranges)->nr_ranges == (*mem_ranges)->max_nr_ranges)
+			sort_memory_ranges(*mem_ranges, true);
+	}
+
+	/* Reallocate memory ranges if there is no space to split ranges */
+	tmem = *mem_ranges;
+	if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) {
+		tmem = realloc_mem_ranges(mem_ranges);
+		if (!tmem)
+			goto out;
+	}
+
+	/* Exclude crashkernel region */
+	ret = crash_exclude_mem_range(tmem, crashk_res.start, crashk_res.end);
+	if (ret)
+		goto out;
+
+	/*
+	 * FIXME: For now, stay
[PATCH v14 1/6] crash: forward memory_notify arg to arch crash hotplug handler
In the event of memory hotplug or online/offline events, the crash memory hotplug notifier `crash_memhp_notifier()` receives a `memory_notify` object but doesn't forward that object to the generic and architecture-specific crash hotplug handler. The `memory_notify` object contains the starting PFN (Page Frame Number) and the number of pages in the hot-removed memory. This information is necessary for architectures like PowerPC to update/recreate the kdump image, specifically `elfcorehdr`. So update the function signature of `crash_handle_hotplug_event()` and `arch_crash_handle_hotplug_event()` to accept the `memory_notify` object as an argument from crash memory hotplug notifier. Since no such object is available in the case of CPU hotplug event, the crash CPU hotplug notifier `crash_cpuhp_online()` passes NULL to the crash hotplug handler. Signed-off-by: Sourabh Jain Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Eric DeVolder Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- arch/x86/include/asm/kexec.h | 2 +- arch/x86/kernel/crash.c | 3 ++- include/linux/kexec.h| 2 +- kernel/crash_core.c | 14 +++--- 4 files changed, 11 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index c9f6a6c5de3c..9bb6607e864e 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -208,7 +208,7 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image); extern void kdump_nmi_shootdown_cpus(void); #ifdef CONFIG_CRASH_HOTPLUG -void arch_crash_handle_hotplug_event(struct kimage *image); +void arch_crash_handle_hotplug_event(struct kimage *image, void 
*arg); #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event #ifdef CONFIG_HOTPLUG_CPU diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index c92d88680dbf..0d7b2657beb6 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -428,10 +428,11 @@ unsigned int arch_crash_get_elfcorehdr_size(void) /** * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes * @image: a pointer to kexec_crash_image + * @arg: struct memory_notify handler for memory hotplug case and NULL for CPU hotplug case. * * Prepare the new elfcorehdr and replace the existing elfcorehdr. */ -void arch_crash_handle_hotplug_event(struct kimage *image) +void arch_crash_handle_hotplug_event(struct kimage *image, void *arg) { void *elfbuf = NULL, *old_elfcorehdr; unsigned long nr_mem_ranges; diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 8227455192b7..ee28c09a7fb0 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -483,7 +483,7 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) { #endif #ifndef arch_crash_handle_hotplug_event -static inline void arch_crash_handle_hotplug_event(struct kimage *image) { } +static inline void arch_crash_handle_hotplug_event(struct kimage *image, void *arg) { } #endif int crash_check_update_elfcorehdr(void); diff --git a/kernel/crash_core.c b/kernel/crash_core.c index efe87d501c8c..b9190265fe52 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -935,7 +935,7 @@ int crash_check_update_elfcorehdr(void) * list of segments it checks (since the elfcorehdr changes and thus * would require an update to purgatory itself to update the digest). 
*/ -static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu) +static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu, void *arg) { struct kimage *image; @@ -997,7 +997,7 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu) image->hp_action = hp_action; /* Now invoke arch-specific update handler */ - arch_crash_handle_hotplug_event(image); + arch_crash_handle_hotplug_event(image, arg); /* No longer handling a hotplug event */ image->hp_action = KEXEC_CRASH_HP_NONE; @@ -1013,17 +1013,17 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu) crash_hotplug_unlock(); } -static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v) +static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *arg) { switch (val) { case MEM_ONLINE: crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_MEMORY, - KEXEC_CRASH_HP_INVALID_CPU); + KEXEC_CRASH_HP_INVALID_CPU, arg);
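The calling convention this patch introduces (a `memory_notify` pointer for memory events, NULL for CPU events) can be sketched in plain C. Everything below is a hypothetical userspace model: `struct memory_notify_model` carries only the two fields the commit message mentions, and the function names are invented for illustration, not taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: only the starting PFN and the page count. */
struct memory_notify_model {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/* What the arch handler learned from the event. */
struct hp_summary_model {
	int is_memory;            /* 0 for a CPU event, 1 for a memory event */
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/* Arch handler model: arg is NULL for CPU events, a memory_notify otherwise. */
struct hp_summary_model arch_handle_hotplug_model(void *arg)
{
	struct hp_summary_model s = { 0, 0, 0 };

	if (arg) {
		const struct memory_notify_model *mn = arg;

		s.is_memory = 1;
		s.start_pfn = mn->start_pfn;
		s.nr_pages = mn->nr_pages;
	}
	return s;
}

/* CPU notifier path: no memory_notify exists, so NULL is forwarded. */
struct hp_summary_model on_cpu_event(void)
{
	return arch_handle_hotplug_model(NULL);
}

/* Memory notifier path: the notifier's object is forwarded unchanged. */
struct hp_summary_model on_memory_event(struct memory_notify_model *mn)
{
	assert(mn != NULL);
	return arch_handle_hotplug_model(mn);
}
```

Using an untyped pointer keeps a single handler signature for both event kinds, at the cost of the handler having to know which kind it is looking at.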
[PATCH v14 2/6] crash: make CPU and Memory hotplug support reporting flexible
The architecture-specific functions `arch_crash_hotplug_cpu_support()` and `arch_crash_hotplug_memory_support()` advertise to userspace, via the sysfs interface, the kernel's capability to update the kdump image on CPU and memory hotplug events. These functions need to access various attributes of the `kexec_crash_image` object to determine whether the kernel can update the kdump image and to advertise this information to userspace accordingly.

As the architecture-specific code is not exposed to the APIs required to acquire the lock for accessing the `kexec_crash_image` object, it calls a generic function, `crash_check_update_elfcorehdr()`, to determine whether the kernel can update the kdump image or not.

This lack of access to the `kexec_crash_image` object prevents architectures from performing the additional architecture-specific checks required to determine whether the kdump image is updatable. For instance, on PowerPC, for the `kexec_load` system call the kernel can update the kdump image only if both the elfcorehdr and the FDT are marked as updatable.

So define two generic functions, `crash_hotplug_cpu_support()` and `crash_hotplug_memory_support()`, which are called when userspace attempts to read the crash CPU and memory hotplug support via the sysfs interface. These functions take the locks needed to access the `kexec_crash_image` object and then forward it to the architecture-specific handler to do the rest.
Signed-off-by: Sourabh Jain Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Eric DeVolder Cc: Greg Kroah-Hartman Cc: Hari Bathini Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- arch/x86/include/asm/kexec.h | 8 arch/x86/kernel/crash.c | 20 +++- include/linux/kexec.h| 13 +++-- kernel/crash_core.c | 23 +-- 4 files changed, 43 insertions(+), 21 deletions(-) diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index 9bb6607e864e..5c88d27b086d 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -212,13 +212,13 @@ void arch_crash_handle_hotplug_event(struct kimage *image, void *arg); #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event #ifdef CONFIG_HOTPLUG_CPU -int arch_crash_hotplug_cpu_support(void); -#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support +int arch_crash_hotplug_cpu_support(struct kimage *image); +#define arch_crash_hotplug_cpu_support arch_crash_hotplug_cpu_support #endif #ifdef CONFIG_MEMORY_HOTPLUG -int arch_crash_hotplug_memory_support(void); -#define crash_hotplug_memory_support arch_crash_hotplug_memory_support +int arch_crash_hotplug_memory_support(struct kimage *image); +#define arch_crash_hotplug_memory_support arch_crash_hotplug_memory_support #endif unsigned int arch_crash_get_elfcorehdr_size(void); diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index 0d7b2657beb6..ad5941665589 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -398,18 +398,28 @@ int crash_load_segments(struct kimage *image) #undef pr_fmt #define pr_fmt(fmt) "crash hp: " fmt -/* These functions provide the value for the sysfs crash_hotplug nodes */ 
+#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG) +static int crash_hotplug_support(struct kimage *image) +{ + if (image->file_mode) + return 1; + + return image->update_elfcorehdr; +} +#endif + #ifdef CONFIG_HOTPLUG_CPU -int arch_crash_hotplug_cpu_support(void) +/* These functions provide the value for the sysfs crash_hotplug nodes */ +int arch_crash_hotplug_cpu_support(struct kimage *image) { - return crash_check_update_elfcorehdr(); + return crash_hotplug_support(image); } #endif #ifdef CONFIG_MEMORY_HOTPLUG -int arch_crash_hotplug_memory_support(void) +int arch_crash_hotplug_memory_support(struct kimage *image) { - return crash_check_update_elfcorehdr(); + return crash_hotplug_support(image); } #endif diff --git a/include/linux/kexec.h b/include/linux/kexec.h index ee28c09a7fb0..0f6ea35879ee 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -486,16 +486,17 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) { static inline void arch_crash_handle_hotplug_event(struct kimage *image, void *arg) { } #endif -int crash_check_update_elfcorehdr(void); - -#ifndef crash_hotplug_cpu_support -static inline int crash_hotplug_cpu_support(void) { return 0; } +#ifndef arch_crash_hotplug_cpu_support +s
[PATCH v6 3/3] Documentation/powerpc: update fadump implementation details
The patch titled ("powerpc: make fadump resilient with memory add/remove events") made significant changes to the implementation of fadump, particularly to elfcorehdr creation and the fadump crash info header structure. Therefore, update the fadump implementation documentation to reflect those changes.

The following updates are made to the firmware-assisted dump documentation:

1. The elfcorehdr is no longer stored after the fadump HDR in the reserved dump area. Instead, the second kernel dynamically allocates memory for the elfcorehdr within the address range from 0 to the boot memory size. Therefore, update figures 1 and 2 of Memory Reservation during the first and second kernels to reflect this change.

2. A version field has been added to the fadump header to manage future changes to the fadump crash info header structure without changing the fadump header magic number. Therefore, remove the corresponding TODO from the document.

Signed-off-by: Sourabh Jain Cc: Aditya Gupta Cc: Aneesh Kumar K.V Cc: Hari Bathini Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Naveen N Rao --- .../arch/powerpc/firmware-assisted-dump.rst | 91 +-- 1 file changed, 42 insertions(+), 49 deletions(-) diff --git a/Documentation/arch/powerpc/firmware-assisted-dump.rst b/Documentation/arch/powerpc/firmware-assisted-dump.rst index e363fc48529a..7e37aadd1f77 100644 --- a/Documentation/arch/powerpc/firmware-assisted-dump.rst +++ b/Documentation/arch/powerpc/firmware-assisted-dump.rst @@ -134,12 +134,12 @@ that are run. If there is dump data, then the memory is held. If there is no waiting dump data, then only the memory required to -hold CPU state, HPTE region, boot memory dump, FADump header and -elfcore header, is usually reserved at an offset greater than boot -memory size (see Fig. 1).
This area is *not* released: this region
-will be kept permanently reserved, so that it can act as a receptacle
-for a copy of the boot memory content in addition to CPU state and
-HPTE region, in the case a crash does occur.
+hold CPU state, HPTE region, boot memory dump, and FADump header is
+usually reserved at an offset greater than boot memory size (see Fig. 1).
+This area is *not* released: this region will be kept permanently
+reserved, so that it can act as a receptacle for a copy of the boot
+memory content in addition to CPU state and HPTE region, in the case
+a crash does occur.

 Since this reserved memory area is used only after the system crash,
 there is no point in blocking this significant chunk of memory from
@@ -153,22 +153,22 @@ that were present in CMA region::

   o Memory Reservation during first kernel

[ASCII figure hunk, layout lost in extraction. The removed version shows the reserved dump area holding DUMP, HDR and ELF; the added version shows DUMP and HDR only. Labels in both versions: Low memory, Top of memory, 0, boot memory size, Reserved dump area, Permanent Reservation, CPU, HPTE, "Boot memory content gets transferred to reserved area by firmware at the time of crash.", FADump Header (meta area).]

 Metadata: This area holds a metadata structure whose
@@ -186,13 +186,2
[PATCH v6 2/3] powerpc/fadump: add hotplug_ready sysfs interface
The elfcorehdr describes the CPUs and memory of the crashed kernel to the kernel that captures the dump, known as the second or fadump kernel. The elfcorehdr needs to be updated if the system's memory changes due to memory hotplug or online/offline events. Currently, memory hotplug events are monitored in userspace by udev rules, and fadump is re-registered, which recreates the elfcorehdr with the latest available memory in the system. However, the previous patch ("powerpc: make fadump resilient with memory add/remove events") moved the creation of elfcorehdr to the second or fadump kernel. This eliminates the need to regenerate the elfcorehdr during memory hotplug or online/offline events. Create a sysfs entry at /sys/kernel/fadump/hotplug_ready to let userspace know that fadump re-registration is not required for memory add/remove events. Signed-off-by: Sourabh Jain Cc: Aditya Gupta Cc: Aneesh Kumar K.V Cc: Hari Bathini Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Naveen N Rao --- Documentation/ABI/testing/sysfs-kernel-fadump | 11 +++ arch/powerpc/kernel/fadump.c | 14 ++ 2 files changed, 25 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump b/Documentation/ABI/testing/sysfs-kernel-fadump index 8f7a64a81783..971934b2891e 100644 --- a/Documentation/ABI/testing/sysfs-kernel-fadump +++ b/Documentation/ABI/testing/sysfs-kernel-fadump @@ -38,3 +38,14 @@ Contact: linuxppc-dev@lists.ozlabs.org Description: read only Provide information about the amount of memory reserved by FADump to save the crash dump in bytes. + +What: /sys/kernel/fadump/hotplug_ready +Date: Sep 2023 +Contact: linuxppc-dev@lists.ozlabs.org +Description: read only + Kdump udev rule re-registers fadump on memory add/remove events, + primarily to update the elfcorehdr. This sysfs indicates the + kdump udev rule that fadump re-registration is not required on + memory add/remove events because elfcorehdr is now prepared in + the second/fadump kernel. 
+User: kexec-tools
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index eb9132538268..a55dd9bf754c 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1455,6 +1455,18 @@ static ssize_t enabled_show(struct kobject *kobj,
 	return sprintf(buf, "%d\n", fw_dump.fadump_enabled);
 }
 
+/*
+ * /sys/kernel/fadump/hotplug_ready sysfs node returns 1, which indicates
+ * to userspace that fadump re-registration is not required on memory
+ * hotplug events.
+ */
+static ssize_t hotplug_ready_show(struct kobject *kobj,
+				  struct kobj_attribute *attr,
+				  char *buf)
+{
+	return sprintf(buf, "%d\n", 1);
+}
+
 static ssize_t mem_reserved_show(struct kobject *kobj,
 				 struct kobj_attribute *attr,
 				 char *buf)
@@ -1527,11 +1539,13 @@ static struct kobj_attribute release_attr = __ATTR_WO(release_mem);
 static struct kobj_attribute enable_attr = __ATTR_RO(enabled);
 static struct kobj_attribute register_attr = __ATTR_RW(registered);
 static struct kobj_attribute mem_reserved_attr = __ATTR_RO(mem_reserved);
+static struct kobj_attribute hotplug_ready_attr = __ATTR_RO(hotplug_ready);
 
 static struct attribute *fadump_attrs[] = {
 	&enable_attr.attr,
 	&register_attr.attr,
 	&mem_reserved_attr.attr,
+	&hotplug_ready_attr.attr,
 	NULL,
 };
 
--
2.41.0
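From the userspace side, consuming the new node amounts to reading one integer from a file. A minimal sketch follows, assuming a hypothetical helper named `read_sysfs_int()`; it uses only standard C I/O and treats a missing or unparsable node as -1, which is how an older kernel without `/sys/kernel/fadump/hotplug_ready` would look to the caller.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns the integer stored in the file at path,
 * or -1 if the file cannot be opened or does not start with a number.
 */
int read_sysfs_int(const char *path)
{
	FILE *f;
	int value = -1;

	assert(path != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (fscanf(f, "%d", &value) != 1)
		value = -1;

	fclose(f);
	return value;
}
```

On a kernel carrying this patch, `read_sysfs_int("/sys/kernel/fadump/hotplug_ready")` would be expected to return 1, since the show function in the hunk above always prints 1.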
[PATCH v6 1/3] powerpc: make fadump resilient with memory add/remove events
the dump and prints relevant error message on console. Signed-off-by: Sourabh Jain Cc: Aditya Gupta Cc: Aneesh Kumar K.V Cc: Hari Bathini Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Naveen N Rao --- arch/powerpc/include/asm/fadump-internal.h | 31 +- arch/powerpc/kernel/fadump.c | 355 +++ arch/powerpc/platforms/powernv/opal-fadump.c | 18 +- arch/powerpc/platforms/pseries/rtas-fadump.c | 23 +- 4 files changed, 242 insertions(+), 185 deletions(-) diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h index 27f9e11eda28..a632e9708610 100644 --- a/arch/powerpc/include/asm/fadump-internal.h +++ b/arch/powerpc/include/asm/fadump-internal.h @@ -42,13 +42,40 @@ static inline u64 fadump_str_to_u64(const char *str) #define FADUMP_CPU_UNKNOWN (~((u32)0)) -#define FADUMP_CRASH_INFO_MAGICfadump_str_to_u64("FADMPINF") +/* + * The introduction of new fields in the fadump crash info header has + * led to a change in the magic key from `FADMPINF` to `FADMPSIG` for + * identifying a kernel crash from an old kernel. + * + * To prevent the need for further changes to the magic number in the + * event of future modifications to the fadump crash info header, a + * version field has been introduced to track the fadump crash info + * header version. + * + * Consider a few points before adding new members to the fadump crash info + * header structure: + * + * - Append new members; avoid adding them in between. + * - Non-primitive members should have a size member as well. + * - For every change in the fadump header, increment the + *fadump header version. This helps the updated kernel decide how to + *handle kernel dumps from older kernels. 
+ */ +#define FADUMP_CRASH_INFO_MAGIC_OLDfadump_str_to_u64("FADMPINF") +#define FADUMP_CRASH_INFO_MAGICfadump_str_to_u64("FADMPSIG") +#define FADUMP_HEADER_VERSION 1 /* fadump crash info structure */ struct fadump_crash_info_header { u64 magic_number; - u64 elfcorehdr_addr; + u32 version; u32 crashing_cpu; + u64 elfcorehdr_addr; + u64 elfcorehdr_size; + u64 vmcoreinfo_raddr; + u64 vmcoreinfo_size; + u32 pt_regs_sz; + u32 cpu_mask_sz; struct pt_regs regs; struct cpumask cpu_mask; }; diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index d14eda1e8589..eb9132538268 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -53,8 +53,6 @@ static struct kobject *fadump_kobj; static atomic_t cpus_in_fadump; static DEFINE_MUTEX(fadump_mutex); -static struct fadump_mrange_info crash_mrange_info = { "crash", NULL, 0, 0, 0, false }; - #define RESERVED_RNGS_SZ 16384 /* 16K - 128 entries */ #define RESERVED_RNGS_CNT (RESERVED_RNGS_SZ / \ sizeof(struct fadump_memory_range)) @@ -373,12 +371,6 @@ static unsigned long __init get_fadump_area_size(void) size = PAGE_ALIGN(size); size += fw_dump.boot_memory_size; size += sizeof(struct fadump_crash_info_header); - size += sizeof(struct elfhdr); /* ELF core header.*/ - size += sizeof(struct elf_phdr); /* place holder for cpu notes */ - /* Program headers for crash memory regions. */ - size += sizeof(struct elf_phdr) * (memblock_num_regions(memory) + 2); - - size = PAGE_ALIGN(size); /* This is to hold kernel metadata on platforms that support it */ size += (fw_dump.ops->fadump_get_metadata_size ? 
@@ -931,36 +923,6 @@ static inline int fadump_add_mem_range(struct fadump_mrange_info *mrange_info,
 	return 0;
 }
 
-static int fadump_exclude_reserved_area(u64 start, u64 end)
-{
-	u64 ra_start, ra_end;
-	int ret = 0;
-
-	ra_start = fw_dump.reserve_dump_area_start;
-	ra_end = ra_start + fw_dump.reserve_dump_area_size;
-
-	if ((ra_start < end) && (ra_end > start)) {
-		if ((start < ra_start) && (end > ra_end)) {
-			ret = fadump_add_mem_range(&crash_mrange_info,
-						   start, ra_start);
-			if (ret)
-				return ret;
-
-			ret = fadump_add_mem_range(&crash_mrange_info,
-						   ra_end, end);
-		} else if (start < ra_start) {
-			ret = fadump_add_mem_range(&crash_mrange_info,
-						   start, ra_start);
-		} else if (ra_end < end) {
-			ret = fadump_add_mem_range(&crash_mrange_info,
-						   ra_end, end);
-		}
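The header checks this patch describes (old magic rejected, new magic required, `pt_regs` and `cpumask` sizes compared between the crashed and fadump kernels) can be sketched as follows. This is an illustrative userspace model: `struct hdr_model`, `enum hdr_status` and `check_header()` are hypothetical, `str_to_u64()` is assumed to pack characters the way `fadump_str_to_u64()` does, and the endianness-mismatch diagnostic mentioned in the v6 changelog is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum hdr_status { HDR_OK, HDR_OLD_MAGIC, HDR_BAD_MAGIC, HDR_SIZE_MISMATCH };

/*
 * Pack up to eight characters into a u64, first character in the most
 * significant byte (assumed to mirror fadump_str_to_u64()).
 */
uint64_t str_to_u64(const char *str)
{
	uint64_t val = 0;

	assert(str != NULL);
	for (size_t i = 0; i < sizeof(val); i++)
		val = *str ? (val << 8) | (unsigned char)*str++ : val << 8;
	return val;
}

/* Hypothetical subset of the fadump crash info header. */
struct hdr_model {
	uint64_t magic_number;
	uint32_t version;      /* tracked so later layout changes keep the magic */
	uint32_t pt_regs_sz;   /* sizeof(struct pt_regs) in the crashed kernel */
	uint32_t cpu_mask_sz;  /* sizeof(struct cpumask) in the crashed kernel */
};

/* Decide whether the fadump kernel should process this header. */
enum hdr_status check_header(const struct hdr_model *h,
			     uint32_t my_pt_regs_sz, uint32_t my_cpu_mask_sz)
{
	if (h->magic_number == str_to_u64("FADMPINF"))
		return HDR_OLD_MAGIC;      /* dump written by an older kernel */
	if (h->magic_number != str_to_u64("FADMPSIG"))
		return HDR_BAD_MAGIC;
	if (h->pt_regs_sz != my_pt_regs_sz || h->cpu_mask_sz != my_cpu_mask_sz)
		return HDR_SIZE_MISMATCH;  /* structure layouts differ */
	return HDR_OK;
}
```

Storing the structure sizes in the header lets the fadump kernel refuse a dump whose `regs` and `cpu_mask` members it cannot interpret, instead of reading them at the wrong offsets.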
[PATCH v6 0/3] powerpc: make fadump resilient with memory add/remove events
Problem:
========
Due to changes in memory resources caused by either memory hotplug or online/offline events, the elfcorehdr, which describes the CPUs and memory of the crashed kernel to the kernel that collects the dump (known as the second/fadump kernel), becomes outdated. Consequently, attempting dump collection with an outdated elfcorehdr can lead to failed or inaccurate dump collection.

Memory hotplug or online/offline events are referred to as memory add/remove events in the rest of the patch series.

Existing solution:
==================
Monitor memory add/remove events in userspace using udev rules, and re-register fadump whenever there are changes in memory resources. This leads to the creation of a new elfcorehdr with updated system memory information.

Challenges with existing solution:
==================================
1. Performing bulk memory add/remove with udev-based fadump re-registration can lead to race conditions and, more importantly, it creates a wide window during which fadump is inactive until all memory add/remove events are settled.

2. Re-registering fadump for every memory add/remove event is inefficient.

3. Memory for the elfcorehdr is allocated based on the memblock regions available during early boot of the first kernel and remains fixed thereafter. However, if the elfcorehdr is later recreated with additional memblock regions, its size will increase, potentially leading to memory corruption.

Proposed solution:
==================
Address the aforementioned challenges by shifting the creation of the elfcorehdr from the first kernel (also referred to as the crashed kernel), where it was created and frequently recreated for every memory add/remove event, to the fadump kernel. As a result, the elfcorehdr only needs to be created once, thus eliminating the necessity to re-register fadump during memory add/remove events.

To know more about elfcorehdr creation in the fadump kernel, refer to the first patch in this series.
The second patch includes a new sysfs interface that tells userspace
that fadump re-registration isn't needed for memory add/remove events.
Note that userspace changes do not need to be in sync with kernel
changes; they can roll out independently.

Since there are significant changes in the fadump implementation, the
third patch updates the fadump documentation to reflect the changes made
in this patch series.

Kernel tree rebased on 6.7.0-rc4 with patch series applied:
===========================================================
https://github.com/sourabhjains/linux/tree/fadump-mem-hotplug-v6

Userspace changes:
==================
To realize this feature, one must update the kdump udev rules to prevent
fadump re-registration during memory add/remove events.

On RHEL, apply the following changes to the file
/usr/lib/udev/rules.d/98-kexec.rules:

-RUN+="/bin/sh -c '/usr/bin/systemctl is-active kdump.service || exit 0; /usr/bin/systemd-run --quiet --no-block /usr/lib/udev/kdump-udev-throttler'"
+# don't re-register fadump if the value of the node
+# /sys/kernel/fadump/hotplug_ready is 1.
+
+RUN+="/bin/sh -c '/usr/bin/systemctl is-active kdump.service || exit 0; ! test -f /sys/kernel/fadump_enabled || cat /sys/kernel/fadump_enabled | grep 0 || ! test -f /sys/kernel/fadump/hotplug_ready || cat /sys/kernel/fadump/hotplug_ready | grep 0 || exit 0; /usr/bin/systemd-run --quiet --no-block /usr/lib/udev/kdump-udev-throttler'"

Changelog:
==========
v6: 8 Dec 2023
  - Add size fields for `pt_regs` and `cpumask` in the fadump header
    structure
  - Don't process the dump if the sizes of `pt_regs` and `cpumask` are
    not the same in the crashed and fadump kernel
  - Include an additional check for endianness mismatch when the magic
    number doesn't match, to print the relevant error message
  - Don't process the dump if the fadump header contains an old magic
    number
  - Rebased it to 6.7.0-rc4

v5: 29 Oct 2023
  https://lore.kernel.org/all/20231029124548.12198-1-sourabhj...@linux.ibm.com/
  - Fix a comment on the first patch

v4: 21 Oct 2023
  https://lore.kernel.org/all/20231021181733.204311-1-sourabhj...@linux.ibm.com/
  - Fix a build warning about type casting

v3: 9 Oct 2023
  https://lore.kernel.org/all/20231009041953.36139-1-sourabhj...@linux.ibm.com/
  - Assign physical address of elfcorehdr to fdh->elfcorehdr_addr
  - Rename a variable, boot_mem_dest_addr -> boot_mem_dest_offset

v2: 25 Sep 2023
  https://lore.kernel.org/all/20230925051214.678957-1-sourabhj...@linux.ibm.com/
  - Fixed a few indentation issues reported by the checkpatch script.
  - Rebased it to 6.6.0-rc3

v1: 17 Sep 2023
  https://lore.kernel.org/all/20230917080225.561627-1-sourabhj...@linux.ibm.com/

Cc: Aditya Gupta
Cc: Aneesh Kumar K.V
Cc: Hari Bathini
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Naveen N Rao

Sourabh Jain (3):
  powerpc: make fadump resilient with memory add/remove events
  powerpc/fadump: add hotplug_ready sysfs interface
  Documentat
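
The shell condition in the udev rule from the cover letter above is
dense, so here is the same decision written out as a small C sketch. It
is only an illustration of the rule's logic, not code from the series or
from the kdump tooling: the helper names read_node() and reload_needed()
are hypothetical, the `systemctl is-active kdump.service` check is left
out, and only the first line of each node is inspected on the assumption
that these sysfs nodes hold a single 0/1 value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a sysfs-style node into buf.
 * Returns buf (an empty string if nothing could be read), or NULL if
 * the node cannot be opened, which stands in for "! test -f <path>".
 */
const char *read_node(const char *path, char *buf, size_t len)
{
	FILE *fp;

	assert(buf != NULL && len > 1);

	fp = fopen(path, "r");
	if (!fp)
		return NULL;

	if (!fgets(buf, (int)len, fp))
		buf[0] = '\0';
	fclose(fp);

	return buf;
}

/* True if the node is missing or its value contains '0' (like "grep 0"). */
static bool missing_or_zero(const char *value)
{
	return value == NULL || strchr(value, '0') != NULL;
}

/*
 * Decide whether the reload/re-registration helper should run for a
 * memory add/remove event. Arguments are node contents, or NULL when
 * the node does not exist. The reload is skipped only when fadump is
 * enabled and the kernel reports hotplug_ready as 1.
 */
bool reload_needed(const char *fadump_enabled, const char *hotplug_ready)
{
	return missing_or_zero(fadump_enabled) ||
	       missing_or_zero(hotplug_ready);
}
```

A caller would pass the results of read_node() for
/sys/kernel/fadump_enabled and /sys/kernel/fadump/hotplug_ready. On a
kernel without the new node the second argument is NULL, so the old
behaviour (reload on every event) is kept, which is why userspace and
kernel changes can roll out independently.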
[PATCH v13 6/6] powerpc: add crash memory hotplug support
Extend the arch crash hotplug handler, as introduced by the patch titled
"powerpc: add crash CPU hotplug support", to also support memory
add/remove events.

The elfcorehdr describes the memory of the crashed kernel to the capture
kernel; hence, it needs to be updated if memory resources change due to
memory add/remove events. Therefore, arch_crash_handle_hotplug_event()
is updated to recreate the elfcorehdr and replace the previous one with
it on memory add/remove events.

The memblock list is used to prepare the elfcorehdr. In the case of
memory hot removal, the memblock list is updated after the arch crash
hotplug handler is triggered, as depicted in Figure 1. Thus, the
hot-removed memory is explicitly removed from the crash memory ranges to
ensure that the memory ranges added to elfcorehdr do not include the
hot-removed memory.

    Memory remove
          |
          v
    Offline pages
          |
          v
    Initiate memory notify call <----> crash hotplug handler
    chain for MEM_OFFLINE event
          |
          v
    Update memblock list

                  Figure 1

There are two system calls, `kexec_file_load` and `kexec_load`, used to
load the kdump image. A few changes have been made to ensure that the
kernel can safely update the elfcorehdr component of the kdump image for
both system calls.

For the kexec_file_load syscall, the kdump image is prepared in the
kernel. To support an increasing number of memory regions, the
elfcorehdr is built with extra buffer space to ensure that it can
accommodate additional memory ranges in the future.

For the kexec_load syscall, the elfcorehdr is updated only if both the
KEXEC_UPDATE_FDT and KEXEC_UPDATE_ELFCOREHDR kexec flags are passed to
the kernel by the kexec tool. Passing these flags to the kernel
indicates that the elfcorehdr is built to accommodate additional memory
ranges and the elfcorehdr segment is not considered for SHA calculation,
making it safe to update it.
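
The explicit removal of hot-removed memory from the crash memory ranges
can be sketched as follows. This is an illustrative userspace sketch,
not the remove_mem_range() added by this patch: the struct mem_range
type, the remove_range() name and the fixed-capacity array are
assumptions made to keep the example self-contained, and the input is
assumed to be sorted and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Inclusive [start, end] range (hypothetical type for this sketch). */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Remove [base, base + size - 1] from ranges[0..*nr). A range that
 * only partly overlaps the hole is trimmed, a range fully inside the
 * hole is dropped, and a range that strictly contains the hole is
 * split in two, which needs one free slot (cap is the array capacity).
 * Returns 0 on success, -1 if a split is needed but the array is full.
 */
int remove_range(struct mem_range *ranges, size_t *nr, size_t cap,
		 uint64_t base, uint64_t size)
{
	uint64_t hole_start = base, hole_end;
	size_t i = 0;

	assert(ranges != NULL && nr != NULL);
	assert(size > 0 && base <= UINT64_MAX - (size - 1));
	hole_end = base + size - 1;

	while (i < *nr) {
		struct mem_range *r = &ranges[i];

		if (r->end < hole_start || r->start > hole_end) {
			i++;		/* no overlap, keep as is */
		} else if (r->start >= hole_start && r->end <= hole_end) {
			/* fully covered: delete this entry */
			memmove(r, r + 1, (*nr - i - 1) * sizeof(*r));
			(*nr)--;
		} else if (r->start < hole_start && r->end > hole_end) {
			/* hole in the middle: split into two ranges */
			if (*nr >= cap)
				return -1;
			memmove(r + 2, r + 1, (*nr - i - 1) * sizeof(*r));
			r[1].start = hole_end + 1;
			r[1].end = r->end;
			r->end = hole_start - 1;
			(*nr)++;
			i += 2;
		} else if (r->start < hole_start) {
			r->end = hole_start - 1;   /* hole covers the tail */
			i++;
		} else {
			r->start = hole_end + 1;   /* hole covers the head */
			i++;
		}
	}

	return 0;
}
```

The split case is the reason a range list may grow on memory removal,
which fits with building the elfcorehdr with spare room for additional
program headers.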
Commit 88a6f8994421 ("crash: memory and CPU hotplug sysfs attributes")
added a sysfs interface to indicate to userspace (kdump udev rule) that
the kernel will update the kdump image on memory hotplug events, so
kdump reload can be avoided. Implement the arch-specific function
`arch_crash_hotplug_memory_support()` to correctly advertise the kernel
capability to update the kdump image.

This feature is advertised to userspace when either of the following
conditions is met:

1. Kdump image is loaded using the kexec_file_load system call.
2. Kdump image is loaded using the kexec_load system call and both the
   KEXEC_UPDATE_ELFCOREHDR and KEXEC_UPDATE_FDT kexec flags are passed
   to the kernel.

The changes related to this feature are kept under the CRASH_HOTPLUG
config, and it is enabled by default.

Signed-off-by: Sourabh Jain
Cc: Akhil Raj
Cc: Andrew Morton
Cc: Baoquan He
Cc: Borislav Petkov (AMD)
Cc: Boris Ostrovsky
Cc: Christophe Leroy
Cc: Dave Hansen
Cc: Dave Young
Cc: David Hildenbrand
Cc: Eric DeVolder
Cc: Greg Kroah-Hartman
Cc: Hari Bathini
Cc: Laurent Dufour
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Mimi Zohar
Cc: Oscar Salvador
Cc: Thomas Gleixner
Cc: Valentin Schneider
Cc: Vivek Goyal
Cc: ke...@lists.infradead.org
Cc: x...@kernel.org
---
 arch/powerpc/include/asm/kexec.h        |   8 ++
 arch/powerpc/include/asm/kexec_ranges.h |   1 +
 arch/powerpc/kexec/core_64.c            | 125 ++--
 arch/powerpc/kexec/file_load_64.c       |  34 ++-
 arch/powerpc/kexec/ranges.c             |  85
 5 files changed, 244 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 7823ab10d323..c8d6cfda523c 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -122,6 +122,14 @@ int arch_crash_hotplug_cpu_support(struct kimage *image);
 #define arch_crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
 #endif
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+int arch_crash_hotplug_memory_support(struct kimage *image);
+#define arch_crash_hotplug_memory_support arch_crash_hotplug_memory_support
+#endif
+
+unsigned int arch_crash_get_elfcorehdr_size(void);
+#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
+
 #endif /*CONFIG_CRASH_HOTPLUG */
 
 #endif /* CONFIG_PPC64 */
diff --git a/arch/powerpc/include/asm/kexec_ranges.h b/arch/powerpc/include/asm/kexec_ranges.h
index f83866a19e87..802abf580cf0 100644
--- a/arch/powerpc/include/asm/kexec_ranges.h
+++ b/arch/powerpc/include/asm/kexec_ranges.h
@@ -7,6 +7,7 @@
 void sort_memory_ranges(struct crash_mem *mrngs, bool merge);
 struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges);
 int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
+int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
 int add_tce_mem_ranges(struct crash_mem **mem_ranges);
 int add_initrd_mem_range(struct crash_mem **mem_ranges);
 #ifdef CONFIG_PPC_64S_HASH_MMU
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/co
[PATCH v13 5/6] powerpc: add crash CPU hotplug support
Due to CPU/memory hotplug or online/offline events, the elfcorehdr
(which describes the CPUs and memory of the crashed kernel) and FDT
(Flattened Device Tree) of the kdump image become outdated.
Consequently, attempting dump collection with an outdated elfcorehdr or
FDT can lead to failed or inaccurate dump collection.

Going forward, CPU/memory hotplug or online/offline events are referred
to as CPU/memory add/remove events.

The current solution to address the above issue involves monitoring the
CPU/memory add/remove events in userspace using udev rules, and whenever
there are changes in CPU and memory resources, the entire kdump image is
loaded again. The kdump image includes the kernel, initrd, elfcorehdr,
FDT, and purgatory. Given that only the elfcorehdr and FDT get outdated
due to CPU/memory add/remove events, reloading the entire kdump image is
inefficient. More importantly, kdump remains inactive for a substantial
amount of time until the kdump reload completes.

To address the aforementioned issue, commit 247262756121 ("crash: add
generic infrastructure for crash hotplug support") added a generic
infrastructure that allows architectures to selectively update the kdump
image components during CPU or memory add/remove events within the
kernel itself.

In the event of a CPU or memory add/remove event, the generic crash
hotplug event handler, `crash_handle_hotplug_event()`, is triggered. It
then acquires the necessary locks to update the kdump image and invokes
the architecture-specific crash hotplug handler,
`arch_crash_handle_hotplug_event()`, to update the required kdump image
components.

This patch adds a crash hotplug handler for PowerPC and enables support
to update the kdump image on CPU add/remove events. Support for memory
add/remove events is added in a subsequent patch with the title
"powerpc: add crash memory hotplug support."

As mentioned earlier, only the elfcorehdr and FDT kdump image
components need to be updated in the event of CPU or memory add/remove
events.
However, the PowerPC architecture crash hotplug handler only updates the
FDT to enable crash hotplug support for CPU add/remove events. Here's
why.

The elfcorehdr on PowerPC is built with possible CPUs, and thus, it does
not need an update on CPU add/remove events. On the other hand, the FDT
needs to be updated on CPU add events to include the newly added CPU. If
the FDT is not updated and the kernel crashes on a newly added CPU, the
kdump kernel will fail to boot due to the unavailability of the crashing
CPU in the FDT. During early boot, the boot CPU is expected to be part
of the FDT; otherwise, the kernel will raise a BUG and fail to boot. For
more information, refer to commit 36ae37e3436b0 ("powerpc: Make
boot_cpuid common between 32 and 64-bit"). Since it is okay to have an
offline CPU in the kdump FDT, no action is taken in case of CPU removal.

There are two system calls, `kexec_file_load` and `kexec_load`, used to
load the kdump image. A few changes have been made to ensure the kernel
can safely update the kdump FDT for both system calls.

For the kexec_file_load syscall, the kdump image is prepared in the
kernel. So, to support an increasing number of CPUs, the FDT is
constructed with extra buffer space to ensure it can accommodate all
possible CPU nodes. Additionally, a call to fdt_pack (which trims the
unused space once the FDT is prepared) is avoided for kdump image
loading if this feature is enabled.

For the kexec_load syscall, the FDT is updated only if both the
KEXEC_UPDATE_FDT and KEXEC_UPDATE_ELFCOREHDR kexec flags are passed to
the kernel by the kexec tool. Passing these flags to the kernel
indicates that the FDT is built to accommodate possible CPUs, and the
FDT segment is not considered for SHA calculation, making it safe to
update the FDT.
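
The rules described so far (which component is refreshed for which
event, and when an image is safe to update) can be summarised in a small
decision sketch. This is not the handler from this patch: the types,
function names and the SKETCH_* flag values are placeholders invented
for illustration and do not reflect the real kexec flag values. The
memory cases reflect the follow-up patch "powerpc: add crash memory
hotplug support" rather than this one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder flag values, chosen for this sketch only. */
#define SKETCH_UPDATE_ELFCOREHDR 0x1u
#define SKETCH_UPDATE_FDT        0x2u

/* kdump image components the handler may refresh (bitmask). */
#define COMP_NONE       0x0u
#define COMP_FDT        0x1u
#define COMP_ELFCOREHDR 0x2u

enum hp_event { HP_CPU_ADD, HP_CPU_REMOVE, HP_MEM_ADD, HP_MEM_REMOVE };

/* Hypothetical stand-in for the loaded kdump image state. */
struct kdump_image {
	bool file_mode;           /* loaded via kexec_file_load */
	unsigned int kexec_flags; /* flags passed with kexec_load */
};

/*
 * An image is safe to update in place if the kernel built it
 * (kexec_file_load) or the kexec tool passed both update flags.
 */
bool image_is_updatable(const struct kdump_image *img)
{
	unsigned int both = SKETCH_UPDATE_ELFCOREHDR | SKETCH_UPDATE_FDT;

	assert(img != NULL);
	return img->file_mode || (img->kexec_flags & both) == both;
}

/* Which components to refresh for a given hotplug event. */
unsigned int components_to_update(const struct kdump_image *img,
				  enum hp_event ev)
{
	if (!image_is_updatable(img))
		return COMP_NONE;

	switch (ev) {
	case HP_CPU_ADD:
		return COMP_FDT;        /* new CPU must appear in the FDT */
	case HP_CPU_REMOVE:
		return COMP_NONE;       /* an offline CPU in the FDT is fine */
	case HP_MEM_ADD:
	case HP_MEM_REMOVE:
		return COMP_ELFCOREHDR; /* memory ranges changed */
	}

	return COMP_NONE;
}
```

Keeping the "is the image updatable" check separate from the per-event
decision mirrors the split in the text: the first is what gets
advertised to userspace, the second is what the handler does once an
event arrives.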
Commit 88a6f8994421 ("crash: memory and CPU hotplug sysfs attributes")
added a sysfs interface to indicate to userspace (kdump udev rule) that
the kernel will update the kdump image on CPU hotplug events, so kdump
reload can be avoided. Implement the arch-specific function
`arch_crash_hotplug_cpu_support()` to correctly advertise the kernel
capability to update the kdump image.

This feature is advertised to userspace when either of the following
conditions is met:

1. Kdump image is loaded using the kexec_file_load system call.
2. Kdump image is loaded using the kexec_load system call and both the
   KEXEC_UPDATE_ELFCOREHDR and KEXEC_UPDATE_FDT kexec flags are passed
   to the kernel.

The changes related to this feature are kept under the CRASH_HOTPLUG
config, and it is enabled by default.

Signed-off-by: Sourabh Jain
Cc: Akhil Raj
Cc: Andrew Morton
Cc: Baoquan He
Cc: Borislav Petkov (AMD)
Cc: Boris Ostrovsky
Cc: Christophe Leroy
Cc: Dave Hansen
Cc: Dave Young
Cc: David Hildenbrand
Cc: Eric DeVolder
Cc: Greg Kroah-Hartman
Cc: Hari Bathini
Cc: Laurent Dufour
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Mimi Zohar
Cc: Oscar Salvador
Cc: Thomas Gleixner
Cc: Val