[PATCH 19/57] docs: kdump: convert it to ReST
Convert kdump documentation to ReST and add it to the
user-facing manual, as the documents are mainly focused on
sysadmins that would be enabling kdump.

Note: the vmcoreinfo.rst has one very long title for
sub-sections. I opted to break this one, in order to make it
easier to display in html.

Signed-off-by: Mauro Carvalho Chehab
---
 Documentation/kdump/kdump.txt      | 131 +
 Documentation/kdump/vmcoreinfo.txt |  59 ++---
 2 files changed, 104 insertions(+), 86 deletions(-)

diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index 51814450a7f8..1da2d7b765f6 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -71,9 +71,8 @@ This is a symlink to the latest version.
 
 The latest kexec-tools git tree is available at:
 
-git://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
-and
-http://www.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
+- git://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
+- http://www.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
 
 There is also a gitweb interface available at
 http://www.kernel.org/git/?p=utils/kernel/kexec/kexec-tools.git
@@ -81,25 +80,25 @@ http://www.kernel.org/git/?p=utils/kernel/kexec/kexec-tools.git
 More information about kexec-tools can be found at
 http://horms.net/projects/kexec/
 
-3) Unpack the tarball with the tar command, as follows:
+3) Unpack the tarball with the tar command, as follows::
 
-   tar xvpzf kexec-tools.tar.gz
+	tar xvpzf kexec-tools.tar.gz
 
-4) Change to the kexec-tools directory, as follows:
+4) Change to the kexec-tools directory, as follows::
 
-   cd kexec-tools-VERSION
+	cd kexec-tools-VERSION
 
-5) Configure the package, as follows:
+5) Configure the package, as follows::
 
-   ./configure
+	./configure
 
-6) Compile the package, as follows:
+6) Compile the package, as follows::
 
-   make
+	make
 
-7) Install the package, as follows:
+7) Install the package, as follows::
 
-   make install
+	make install
 
 
 Build the system and dump-capture kernels
@@ -126,25 +125,25 @@ dump-capture kernels for enabling kdump support.
 System kernel config options
 
-1) Enable "kexec system call" in "Processor type and features."
+1) Enable "kexec system call" in "Processor type and features."::
 
-   CONFIG_KEXEC=y
+	CONFIG_KEXEC=y
 
 2) Enable "sysfs file system support" in "Filesystem" -> "Pseudo
-   filesystems." This is usually enabled by default.
+   filesystems." This is usually enabled by default::
 
-   CONFIG_SYSFS=y
+	CONFIG_SYSFS=y
 
    Note that "sysfs file system support" might not appear in the "Pseudo
    filesystems" menu if "Configure standard kernel features (for small
    systems)" is not enabled in "General Setup." In this case, check the
-   .config file itself to ensure that sysfs is turned on, as follows:
+   .config file itself to ensure that sysfs is turned on, as follows::
 
-   grep 'CONFIG_SYSFS' .config
+	grep 'CONFIG_SYSFS' .config
 
-3) Enable "Compile the kernel with debug info" in "Kernel hacking."
+3) Enable "Compile the kernel with debug info" in "Kernel hacking."::
 
-   CONFIG_DEBUG_INFO=Y
+	CONFIG_DEBUG_INFO=Y
 
    This causes the kernel to be built with debug symbols. The dump
    analysis tools require a vmlinux with debug symbols in order to read
@@ -154,29 +153,32 @@ Dump-capture kernel config options (Arch Independent)
 -----------------------------------------------------
 
 1) Enable "kernel crash dumps" support under "Processor type and
-   features":
+   features"::
 
-   CONFIG_CRASH_DUMP=y
+	CONFIG_CRASH_DUMP=y
 
-2) Enable "/proc/vmcore support" under "Filesystems" -> "Pseudo filesystems".
+2) Enable "/proc/vmcore support" under "Filesystems" -> "Pseudo filesystems"::
+
+	CONFIG_PROC_VMCORE=y
 
-   CONFIG_PROC_VMCORE=y
    (CONFIG_PROC_VMCORE is set by default when CONFIG_CRASH_DUMP is selected.)
 
 Dump-capture kernel config options (Arch Dependent, i386 and x86_64)
 
 1) On i386, enable high memory support under "Processor type and
-   features":
+   features"::
 
-   CONFIG_HIGHMEM64G=y
-   or
-   CONFIG_HIGHMEM4G
+	CONFIG_HIGHMEM64G=y
+
+   or::
+
+	CONFIG_HIGHMEM4G
 
 2) On i386 and x86_64, disable symmetric multi-processing support
-   under "Processor type and features":
+   under "Processor type and features"::
 
-   CONFIG_SMP=n
+	CONFIG_SMP=n
 
    (If CONFIG_SMP=y, then specify maxcpus=1 on the kernel command line
    when loading the dump-capture kernel, see section "Load the Dump-capture
@@ -184,9 +186,9 @@ Dump-capture kernel config options (Arch Dependent, i386 and x86_64)
 3) If one wants to build and use a relocatable kernel,
    Enable "Build a relocatable kernel" support under "Processor type and
-   features"
+   features"::
 
-   CONFIG_RELOCATABLE=y
+	CONFIG
[PATCH 00/57] Convert files to ReST
This series converts lots of files to be properly parsed by Sphinx
as ReST files.

As it touches a lot of stuff, the series is based on linux-next.

I have a separate patch series which does the actual rename and
adjustment of references. I opted to submit this first, as it sounds
easier to merge this way, as each subsystem maintainer can apply the
conversion directly on their trees (or at docs tree), avoiding merge
conflicts.

Both this series and the next steps are on my devel git tree, at:

	https://git.linuxtv.org/mchehab/experimental.git/log/?h=all_with_indexes-v3

The final output in html can be seen at:

	https://www.infradead.org/~mchehab/rst_conversion/

Mauro Carvalho Chehab (57):
  docs: trace: fix some Sphinx warnings
  docs: acpi: convert text files to ReST
  docs: aoe: convert text files to ReST
  docs: arm64: convert documentation to ReST format
  docs: cdrom/cdrom-standard.tex: convert from LaTeX to ReST
  docs: cdrom: convert remaining files to ReST
  docs: cgroup-v1: convert to ReST file format
  docs: cgroup-v1/blkio-controller.rst: add a note about CFQ scheduler
  docs: cpu-freq: convert files to ReST
  docs: device-mapper: convert it to ReST format
  docs: extcon: move it to acpi dir and convert it to ReST
  docs: fault-injection: convert it to ReST format
  docs: fb: convert documentation to ReST format
  docs: fpga: convert it to ReST
  docs: gpio: convert it to ReST
  docs: ide: convert it to ReST format
  docs: infiniband: convert it to ReST format
  docs: kbuild: convert it to ReST output
  docs: kdump: convert it to ReST
  docs: livepatch: convert it to ReST format
  docs: locking: convert docs to ReST format
  docs: mic: convert it to ReST format
  docs: netlabel: convert it to ReST
  docs: pcmcia: convert it to ReST format
  docs: power: convert docs to ReST
  docs: powerpc: convert docs to ReST
  docs: pps/pps.txt convert it to ReST and move to API book
  docs: ptp.txt: convert to ReST and move to driver-api
  docs: riscv: convert it to ReST format
  docs: s390: Debugging390.txt: convert table to ascii artwork
  docs: s390: convert text files to ReST format
  s390: include/asm/debug.h add kerneldoc markups
  docs: serial: convert it to ReST format
  docs: target: convert it to ReST format
  docs: timers: convert documentation to ReST
  docs: usb: convert documents to ReST
  docs: watchdog: convert documents to ReST format
  docs: x86: convert text files to ReST
  docs: xilinx: convert eemi.txt to ReST
  docs: scheduler: convert files to ReST
  docs: EDID/HOWTO.txt: convert to ReST and move to kernel-API
  docs: connector.txt: convert to ReST
  docs: lcd-panel-cgram.txt convert it to ReST and move to admin-guide
  docs: lp855x-driver.txt: convert to ReST and move to kernel-api
  docs: m68k: convert it to ReST file format and add to arch bookset
  docs: cma/debugfs.txt: convert to ReST and move to admin-guide/mm
  docs: console.txt: convert to ReST format
  docs: pti_intel_mid.txt: convert to ReST
  docs: early-userspace: convert docs to ReST
  docs: driver-model: convert it to ReST format
  docs: arm: convert text files to ReST format
  docs: memory-devices: convert ti-emif.txt to ReST format
  docs: xen-tpmfront.txt: convert the file to ReST format
  docs: bus-devices: ti-gpmc.txt: convert it to ReST
  docs: nvmem: convert file to ReST format
  docs: phy: convert samsung-usb2.txt to ReST format
  docs: Prepare files to be renamed to *.rst

 Documentation/EDID/HOWTO.txt                 |  29 +-
 Documentation/acpi/DSD-properties-rules.txt  |   4 +-
 Documentation/acpi/acpi-lid.txt              |  37 +-
 Documentation/acpi/aml-debugger.txt          |  31 +-
 Documentation/acpi/apei/einj.txt             |  59 +-
 Documentation/acpi/apei/output_format.txt    | 247 +-
 Documentation/acpi/cppc_sysfs.txt            |  52 +-
 Documentation/acpi/debug.txt                 |  20 +-
 .../drivers/extcon-intel-int3496.txt}        |  14 +-
 .../acpi/dsd/data-node-references.txt        |  11 +-
 Documentation/acpi/dsd/graph.txt             |  24 +-
 Documentation/acpi/dsd/leds.txt              |  18 +-
 Documentation/acpi/dsdt-override.txt         |   4 +-
 Documentation/acpi/enumeration.txt           |  42 +-
 Documentation/acpi/gpio-properties.txt       |  42 +-
 Documentation/acpi/i2c-muxes.txt             |  21 +-
 Documentation/acpi/initrd_table_override.txt |  90 +-
 Documentation/acpi/linuxized-acpica.txt      |  58 +-
 Documentation/acpi/lpit.txt                  |   8 +-
 Documentation/acpi/method-customizing.txt    |  48 +-
 Documentation/acpi/method-tracing.txt        | 132 +-
 Documentation/acpi/namespace.txt             | 323 +-
 Documentation/acpi/osi.txt                   |   3 +-
 Documentation/acpi/scan_handlers.txt         |   9 +-
 Documentation/acpi/ssdt-overlays.txt         | 128 +-
 Documentation/acpi/video_extension.txt       |  16 +-
 Documentation/aoe/aoe.txt                    |  63 +-
 Documentation/a
Re: [PATCH v4 3/5] memblock: add memblock_cap_memory_ranges for multiple ranges
Hi Mike,

On 2019/4/16 3:09, Mike Rapoport wrote:
> Hi,
>
> On Mon, Apr 15, 2019 at 06:57:23PM +0800, Chen Zhou wrote:
>> The memblock_cap_memory_range() removes all the memory except the
>> range passed to it. Extend this function to receive memblock_type
>> with the regions that should be kept.
>>
>> Enable this function in arm64 for reservation of multiple regions
>> for the crash kernel.
>>
>> Signed-off-by: Chen Zhou
>> Signed-off-by: Mike Rapoport
>
> I didn't work on this version, please drop the signed-off.

Sorry about this. I should have asked you first before doing it this way.
I will drop it.

>> +				remove_size);
>> +	}
>> +
>> +	memblock_remove_range(&memblock.reserved,
>> +			regs[nr - 1].base + regs[nr - 1].size, PHYS_ADDR_MAX);
>> +}
>> +
>
> I've double-checked and I see no problem with using
> for_each_mem_range_rev() iterators for removing some ranges. And with them
> this function becomes much clearer and more efficient.
>
> Can you please check if the below patch works for you?
>
> From e25e6c9cd94a01abac124deacc66e5d258fdbf7c Mon Sep 17 00:00:00 2001
> From: Mike Rapoport
> Date: Wed, 10 Apr 2019 16:02:32 +0300
> Subject: [PATCH] memblock: extend memblock_cap_memory_range to multiple ranges
>
> The memblock_cap_memory_range() removes all the memory except the range
> passed to it. Extend this function to receive an array of memblock_regions
> that should be kept. This allows switching to simple iteration over
> memblock arrays with 'for_each_mem_range_rev' to remove the unneeded memory.
>
> Enable use of this function in arm64 for reservation of multiple regions for
> the crash kernel.
>
> Signed-off-by: Mike Rapoport
> ---
>  arch/arm64/mm/init.c     | 34 --
>  include/linux/memblock.h |  2 +-
>  mm/memblock.c            | 44
>  3 files changed, 45 insertions(+), 35 deletions(-)
>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 6bc1350..8665d29 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -64,6 +64,10 @@ EXPORT_SYMBOL(memstart_addr);
>  phys_addr_t arm64_dma_phys_limit __ro_after_init;
>
>  #ifdef CONFIG_KEXEC_CORE
> +
> +/* at most two crash kernel regions, low_region and high_region */
> +#define CRASH_MAX_USABLE_RANGES 2
> +
>  /*
>   * reserve_crashkernel() - reserves memory for crash kernel
>   *
> @@ -280,9 +284,9 @@ early_param("mem", early_mem);
>  static int __init early_init_dt_scan_usablemem(unsigned long node,
>  		const char *uname, int depth, void *data)
>  {
> -	struct memblock_region *usablemem = data;
> -	const __be32 *reg;
> -	int len;
> +	struct memblock_type *usablemem = data;
> +	const __be32 *reg, *endp;
> +	int len, nr = 0;
>
>  	if (depth != 1 || strcmp(uname, "chosen") != 0)
>  		return 0;
> @@ -291,22 +295,32 @@ static int __init early_init_dt_scan_usablemem(unsigned long node,
>  	if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
>  		return 1;
>
> -	usablemem->base = dt_mem_next_cell(dt_root_addr_cells, &reg);
> -	usablemem->size = dt_mem_next_cell(dt_root_size_cells, &reg);
> +	endp = reg + (len / sizeof(__be32));
> +	while ((endp - reg) >= (dt_root_addr_cells + dt_root_size_cells)) {
> +		unsigned long base = dt_mem_next_cell(dt_root_addr_cells, &reg);
> +		unsigned long size = dt_mem_next_cell(dt_root_size_cells, &reg);
>
> +		if (memblock_add_range(usablemem, base, size, NUMA_NO_NODE,
> +				       MEMBLOCK_NONE))
> +			return 0;
> +		if (++nr >= CRASH_MAX_USABLE_RANGES)
> +			break;
> +	}
>  	return 1;
>  }
>
>  static void __init fdt_enforce_memory_region(void)
>  {
> -	struct memblock_region reg = {
> -		.size = 0,
> +	struct memblock_region usable_regions[CRASH_MAX_USABLE_RANGES];
> +	struct memblock_type usablemem = {
> +		.max = CRASH_MAX_USABLE_RANGES,
> +		.regions = usable_regions,
>  	};
>
> -	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> +	of_scan_flat_dt(early_init_dt_scan_usablemem, &usablemem);
>
> -	if (reg.size)
> -		memblock_cap_memory_range(reg.base, reg.size);
> +	if (usablemem.cnt)
> +		memblock_cap_memory_ranges(usablemem.regions, usablemem.cnt);
>  }
>
>  void __init arm64_memblock_init(void)
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 294d5d8..f5c029b 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -404,7 +404,7 @@ phys_addr_t memblock_mem_size(unsigned long limit_pfn);
>  phys_addr_t memblock_start_of_DRAM(void);
>  phys_addr_t memblock_end_of_DRAM(void);
>  void memblock_enforce_memory_limit(phys_addr_t memory_limit);
> -void memblock_cap_memory_range(phys_addr_t base, phys_addr_t size);
> +void membl
Re: [PATCH v4] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernel
On 4/16/19 8:00 AM, Junichi Nomura wrote:
> On 4/15/19 7:25 PM, Borislav Petkov wrote:
>> On Mon, Apr 15, 2019 at 11:07:17AM +0200, Borislav Petkov wrote:
>>> On Mon, Apr 15, 2019 at 07:01:54AM +, Junichi Nomura wrote:
>>>> OK. Then I'll go back to v3 and make sure to hang when
>>>> something is wrong during kexec boot on EFI system.
>>>
>>> No need - I have it here locally. I'll clean it up and post it for
>>> review.
>>
>> Here it is. Ok, not ok?
>
> Thank you. Basically ok.
> I put some comments below about whether to hang or return.
>
>> +static acpi_physical_address kexec_get_rsdp_addr(void)
>> +{
>> +	efi_system_table_64_t *systab;
>> +	struct efi_setup_data *esd;
>> +	struct efi_info *ei;
>> +	char *sig;
>> +
>> +	esd = (struct efi_setup_data *)get_kexec_setup_data_addr();
>> +	if (!esd)
>> +		return 0;
>> +
>> +	if (!esd->tables) {
>> +		debug_putstr("Wrong kexec SETUP_EFI data.\n");
>> +		return 0;
>> +	}
>
> I thought we should hang here instead of return so that we
> don't run into efi_get_rsdp_addr() in case of kexec.
>
>> +	ei = &boot_params->efi_info;
>> +	sig = (char *)&ei->efi_loader_signature;
>> +	if (strncmp(sig, EFI64_LOADER_SIGNATURE, 4)) {
>> +		debug_putstr("Wrong kexec EFI loader signature.\n");
>> +		return 0;
>> +	}
>
> Same here.

One more question just for clarification.

I see kexec is only supported on 64bit kernel. But are we sure
we don't need to support kexec on EFI32 + 64bit kernel?

I don't have such an environment and as far as I tried with OVMF i386
and KVM guest, that combination doesn't work reliably even with v5.0.
So I suppose people don't care.

>> +	/* Get systab from boot params. */
>> +	systab = (efi_system_table_64_t *) (ei->efi_systab | ((__u64)ei->efi_systab_hi << 32));
>> +	if (!systab)
>> +		error("EFI system table not found in kexec boot_params.");
>> +
>> +	return __efi_get_rsdp_addr((unsigned long)esd->tables, systab->nr_tables, true);
>
> Same here when __efi_get_rsdp_addr() returns 0.
>
> I'm fine with either way, though.
-- Jun'ichi Nomura, NEC Corporation / NEC Solution Innovators, Ltd. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
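For readers following the systab lookup in the quoted function, here is a minimal user-space sketch of the two checks it performs: joining the low and high 32-bit halves into one 64-bit address, and comparing the first four bytes of the loader signature. The struct and helper names (efi_info_halves, systab_address, loader_signature_ok) are made up for this illustration and are not kernel symbols; only the field names efi_systab and efi_systab_hi come from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the two 32-bit fields used in the patch. */
struct efi_info_halves {
	uint32_t efi_systab;	/* low 32 bits of the system table address */
	uint32_t efi_systab_hi;	/* high 32 bits of the system table address */
};

/* Join the halves; widen to 64 bits before shifting so no bits are lost. */
static uint64_t systab_address(const struct efi_info_halves *ei)
{
	return (uint64_t)ei->efi_systab | ((uint64_t)ei->efi_systab_hi << 32);
}

/* Returns 1 when the first four bytes of sig match the expected value. */
static int loader_signature_ok(const char *sig, const char *expected)
{
	return strncmp(sig, expected, 4) == 0;
}
```

Note that strncmp() returns 0 on a match, which is why the quoted code treats a nonzero result as the "wrong signature" case.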
Re: [PATCH v4] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernel
On 4/15/19 7:25 PM, Borislav Petkov wrote:
> On Mon, Apr 15, 2019 at 11:07:17AM +0200, Borislav Petkov wrote:
>> On Mon, Apr 15, 2019 at 07:01:54AM +, Junichi Nomura wrote:
>>> OK. Then I'll go back to v3 and make sure to hang when
>>> something is wrong during kexec boot on EFI system.
>>
>> No need - I have it here locally. I'll clean it up and post it for
>> review.
>
> Here it is. Ok, not ok?

Thank you. Basically ok.
I put some comments below about whether to hang or return.

> +static acpi_physical_address kexec_get_rsdp_addr(void)
> +{
> +	efi_system_table_64_t *systab;
> +	struct efi_setup_data *esd;
> +	struct efi_info *ei;
> +	char *sig;
> +
> +	esd = (struct efi_setup_data *)get_kexec_setup_data_addr();
> +	if (!esd)
> +		return 0;
> +
> +	if (!esd->tables) {
> +		debug_putstr("Wrong kexec SETUP_EFI data.\n");
> +		return 0;
> +	}

I thought we should hang here instead of return so that we
don't run into efi_get_rsdp_addr() in case of kexec.

> +	ei = &boot_params->efi_info;
> +	sig = (char *)&ei->efi_loader_signature;
> +	if (strncmp(sig, EFI64_LOADER_SIGNATURE, 4)) {
> +		debug_putstr("Wrong kexec EFI loader signature.\n");
> +		return 0;
> +	}

Same here.

> +	/* Get systab from boot params. */
> +	systab = (efi_system_table_64_t *) (ei->efi_systab | ((__u64)ei->efi_systab_hi << 32));
> +	if (!systab)
> +		error("EFI system table not found in kexec boot_params.");
> +
> +	return __efi_get_rsdp_addr((unsigned long)esd->tables, systab->nr_tables, true);

Same here when __efi_get_rsdp_addr() returns 0.

I'm fine with either way, though.

-- 
Jun'ichi Nomura, NEC Corporation / NEC Solution Innovators, Ltd.
Re: [PATCH v4 3/5] memblock: add memblock_cap_memory_ranges for multiple ranges
Hi,

On Mon, Apr 15, 2019 at 06:57:23PM +0800, Chen Zhou wrote:
> The memblock_cap_memory_range() removes all the memory except the
> range passed to it. Extend this function to receive memblock_type
> with the regions that should be kept.
>
> Enable this function in arm64 for reservation of multiple regions
> for the crash kernel.
>
> Signed-off-by: Chen Zhou
> Signed-off-by: Mike Rapoport

I didn't work on this version, please drop the signed-off.

> ---
>  include/linux/memblock.h |  1 +
>  mm/memblock.c            | 45 +
>  2 files changed, 46 insertions(+)
>
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 47e3c06..180877c 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -446,6 +446,7 @@ phys_addr_t memblock_start_of_DRAM(void);
>  phys_addr_t memblock_end_of_DRAM(void);
>  void memblock_enforce_memory_limit(phys_addr_t memory_limit);
>  void memblock_cap_memory_range(phys_addr_t base, phys_addr_t size);
> +void memblock_cap_memory_ranges(struct memblock_type *regions_to_keep);
>  void memblock_mem_limit_remove_map(phys_addr_t limit);
>  bool memblock_is_memory(phys_addr_t addr);
>  bool memblock_is_map_memory(phys_addr_t addr);
> diff --git a/mm/memblock.c b/mm/memblock.c
> index f315eca..9661807 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1697,6 +1697,51 @@ void __init memblock_cap_memory_range(phys_addr_t base, phys_addr_t size)
>  			base + size, PHYS_ADDR_MAX);
>  }
>
> +void __init memblock_cap_memory_ranges(struct memblock_type *regions_to_keep)
> +{
> +	int start_rgn[INIT_MEMBLOCK_REGIONS], end_rgn[INIT_MEMBLOCK_REGIONS];
> +	int i, j, ret, nr = 0;
> +	struct memblock_region *regs = regions_to_keep->regions;
> +
> +	for (i = 0; i < regions_to_keep->cnt; i++) {
> +		ret = memblock_isolate_range(&memblock.memory, regs[i].base,
> +				regs[i].size, &start_rgn[i], &end_rgn[i]);
> +		if (ret)
> +			break;
> +		nr++;
> +	}
> +	if (!nr)
> +		return;
> +
> +	/* remove all the MAP regions */
> +	for (i = memblock.memory.cnt - 1; i >= end_rgn[nr - 1]; i--)
> +		if (!memblock_is_nomap(&memblock.memory.regions[i]))
> +			memblock_remove_region(&memblock.memory, i);
> +
> +	for (i = nr - 1; i > 0; i--)
> +		for (j = start_rgn[i] - 1; j >= end_rgn[i - 1]; j--)
> +			if (!memblock_is_nomap(&memblock.memory.regions[j]))
> +				memblock_remove_region(&memblock.memory, j);
> +
> +	for (i = start_rgn[0] - 1; i >= 0; i--)
> +		if (!memblock_is_nomap(&memblock.memory.regions[i]))
> +			memblock_remove_region(&memblock.memory, i);
> +
> +	/* truncate the reserved regions */
> +	memblock_remove_range(&memblock.reserved, 0, regs[0].base);
> +
> +	for (i = nr - 1; i > 0; i--) {
> +		phys_addr_t remove_base = regs[i - 1].base + regs[i - 1].size;
> +		phys_addr_t remove_size = regs[i].base - remove_base;
> +
> +		memblock_remove_range(&memblock.reserved, remove_base,
> +				remove_size);
> +	}
> +
> +	memblock_remove_range(&memblock.reserved,
> +			regs[nr - 1].base + regs[nr - 1].size, PHYS_ADDR_MAX);
> +}
> +

I've double-checked and I see no problem with using
for_each_mem_range_rev() iterators for removing some ranges. And with them
this function becomes much clearer and more efficient.

Can you please check if the below patch works for you?

From e25e6c9cd94a01abac124deacc66e5d258fdbf7c Mon Sep 17 00:00:00 2001
From: Mike Rapoport
Date: Wed, 10 Apr 2019 16:02:32 +0300
Subject: [PATCH] memblock: extend memblock_cap_memory_range to multiple ranges

The memblock_cap_memory_range() removes all the memory except the range
passed to it. Extend this function to receive an array of memblock_regions
that should be kept. This allows switching to simple iteration over
memblock arrays with 'for_each_mem_range_rev' to remove the unneeded memory.

Enable use of this function in arm64 for reservation of multiple regions for
the crash kernel.

Signed-off-by: Mike Rapoport
---
 arch/arm64/mm/init.c     | 34 --
 include/linux/memblock.h |  2 +-
 mm/memblock.c            | 44
 3 files changed, 45 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 6bc1350..8665d29 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -64,6 +64,10 @@ EXPORT_SYMBOL(memstart_addr);
 phys_addr_t arm64_dma_phys_limit __ro_after_init;

 #ifdef CONFIG_KEXEC_CORE
+
+/* at most two crash kernel regions, low_region and high_region */
+#define CRASH_MAX_USABLE_RANGES	2
+
 /*
  * reserve_crashkernel() - reserves memory for crash kernel
  *
@@ -280,9 +284,9
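To make the intent of "keep only these ranges" concrete, here is a rough user-space sketch. It is not the memblock implementation and uses none of its helpers; it swaps in a plain two-pointer interval intersection over sorted, non-overlapping arrays, which yields the same surviving memory for the simple case without NOMAP regions. The names struct region and cap_to_ranges are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct region {
	uint64_t base;
	uint64_t size;
};

static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/*
 * Write to out[] the parts of mem[] that fall inside any range of keep[].
 * Both inputs must be sorted by base and free of overlaps.
 * Returns the number of regions written (never more than out_cap).
 */
static size_t cap_to_ranges(const struct region *mem, size_t nmem,
			    const struct region *keep, size_t nkeep,
			    struct region *out, size_t out_cap)
{
	size_t i = 0, j = 0, n = 0;

	while (i < nmem && j < nkeep && n < out_cap) {
		uint64_t mem_end = mem[i].base + mem[i].size;
		uint64_t keep_end = keep[j].base + keep[j].size;
		uint64_t lo = max_u64(mem[i].base, keep[j].base);
		uint64_t hi = min_u64(mem_end, keep_end);

		if (lo < hi) {
			out[n].base = lo;
			out[n].size = hi - lo;
			n++;
		}
		/* Advance whichever interval ends first. */
		if (mem_end < keep_end)
			i++;
		else
			j++;
	}
	return n;
}
```

With two memory banks and two ranges to keep (a low and a high crash kernel region, say), everything outside the kept ranges disappears and a bank that only partly overlaps a kept range is trimmed to the overlap.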
Re: [PATCH 1/2 RESEND v10] x86/mm, resource: add a new I/O resource descriptor 'IORES_DESC_RESERVED'
On Mon, Apr 15, 2019 at 08:22:22PM +0800, lijiang wrote:
> They are different problems.

Aha, so we're getting closer. You should've led with that!

> The first problem is that passes the e820 reserved ranges to the second
> kernel,

Passes or *doesn't* pass?

Because from all the staring, it wants to pass the reserved ranges.

> for this case, it is good enough to use the IORES_DESC_RESERVED, which
> can ensure that exactly matches the reserved resource ranges when
> walking through iomem resources.

Ok.

> The second problem is about the SEV case. Now, the IORES_DESC_RESERVED has been
> created for the reserved areas, therefore the check needs to be expanded so that
> these areas are not mapped encrypted when using ioremap().
>
> +static int __ioremap_check_desc_none_and_reserved(struct resource *res)

That name is crap. If you need to add another desc type, it becomes
wrong again. And that whole code around flags->desc_other is just silly:

Make that machinery around it something like this:

struct ioremap_desc {
	u64 flags;
};

instead of "struct ioremap_mem_flags" and that struct ioremap_desc is an
ioremap descriptor which will carry all kinds of settings. system_ram
can then be a simple flag too.

__ioremap_caller() will hand it down to __ioremap_check_mem() etc and
there it will set flags like IOREMAP_DESC_MAP_ENCRYPTED or
IOREMAP_DESC_MAP_DECRYPTED and this way you'll have it explicit and
clear in __ioremap_caller():

	if ((sev_active() && (io_desc.flags & IOREMAP_DESC_MAP_ENCRYPTED)) ||
	    encrypted)
		prot = pgprot_encrypted(prot);

But that would need a pre-patch which does that conversion.

> Maybe i should split it into two patches. The change of
> __ioremap_check_desc_none_and_reserved() should be a separate patch.
> Any idea?

See above and yes, definitely separate patches.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
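The flag-based descriptor suggested above can be tried out in user space. The sketch below only models the decision logic. The bit positions given to IOREMAP_DESC_MAP_ENCRYPTED and IOREMAP_DESC_MAP_DECRYPTED are assumptions (the mail names the flags but assigns no values), and want_encrypted() is a made-up helper standing in for the condition that would sit in __ioremap_caller().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values, chosen for illustration only. */
#define IOREMAP_DESC_MAP_ENCRYPTED	(1ULL << 0)
#define IOREMAP_DESC_MAP_DECRYPTED	(1ULL << 1)

struct ioremap_desc {
	uint64_t flags;
};

/*
 * Same shape as the condition in the mail: map encrypted when SEV is
 * active and the descriptor asks for it, or when the caller explicitly
 * requested an encrypted mapping.
 */
static bool want_encrypted(bool sev_active, const struct ioremap_desc *d,
			   bool encrypted)
{
	return (sev_active && (d->flags & IOREMAP_DESC_MAP_ENCRYPTED)) ||
	       encrypted;
}
```

One possible benefit of this layout is that adding another resource type later means setting a flag where the resource is classified, rather than renaming a check function each time.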
Re: [PATCH 1/2 RESEND v10] x86/mm, resource: add a new I/O resource descriptor 'IORES_DESC_RESERVED'
On 2019-04-02 20:43, Borislav Petkov wrote:
> On Tue, Apr 02, 2019 at 08:02:04PM +0800, lijiang wrote:
>> These regions(E820_TYPE_{RESERVED_KERN,RAM,UNUSABLE}) are still marked as
>> IORES_DESC_NONE and should not be mapped encrypted when using ioremap().
>
> Seems to me like we're going in circles. You said here:
>
> https://lkml.kernel.org/r/9eb61523-7a08-24c4-ac15-050537bd9...@redhat.com
>
> that the kernel doesn't pass the e820 reserved ranges to the second
> kernel.
>
> I suggested to use a special IORES descriptor for them -
> IORES_DESC_RESERVED.
>
> Now you say that that is not enough and some of those you want passed,
> are still marked as IORES_DESC_NONE.
>

Sorry for the delay.

They are different problems.

The first problem is about passing the e820 reserved ranges to the second
kernel. For this case, it is good enough to use IORES_DESC_RESERVED, which
ensures that the reserved resource ranges are matched exactly when
walking through the iomem resources.

The second problem is about the SEV case. Now that IORES_DESC_RESERVED has
been created for the reserved areas, the check needs to be expanded so that
these areas are not mapped encrypted when using ioremap().

+static int __ioremap_check_desc_none_and_reserved(struct resource *res)
 {
-	return (res->desc != IORES_DESC_NONE);
+	return ((res->desc != IORES_DESC_NONE) &&
+		(res->desc != IORES_DESC_RESERVED));
 }

Maybe I should split it into two patches. The change of
__ioremap_check_desc_none_and_reserved() should be a separate patch.
Any idea?

Thanks.
Lianbo

> Sounds to me like you need try again.
>
[PATCH v4 4/5] arm64: kdump: support more than one crash kernel regions
After commit (arm64: kdump: support reserving crashkernel above 4G),
there may be two crash kernel regions, one below 4G and the other
above 4G.

Use memblock_cap_memory_ranges() to support multiple crash kernel
regions. The crash dump kernel reads the crash kernel regions
via a dtb property under node /chosen,
	linux,usable-memory-range = .

Besides, replace memblock_cap_memory_range() with
memblock_cap_memory_ranges().

Signed-off-by: Chen Zhou
Signed-off-by: Mike Rapoport
---
 arch/arm64/mm/init.c     | 34 --
 include/linux/memblock.h |  1 -
 mm/memblock.c            | 41 -
 3 files changed, 36 insertions(+), 40 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index f5dde73..921953d 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -52,6 +52,9 @@
 #include
 #include

+/* at most two crash kernel regions, low_region and high_region */
+#define CRASH_MAX_USABLE_RANGES	2
+
 /*
  * We need to be able to catch inadvertent references to memstart_addr
  * that occur (potentially in generic code) before arm64_memblock_init()
@@ -295,9 +298,9 @@ early_param("mem", early_mem);
 static int __init early_init_dt_scan_usablemem(unsigned long node,
 		const char *uname, int depth, void *data)
 {
-	struct memblock_region *usablemem = data;
-	const __be32 *reg;
-	int len;
+	struct memblock_type *usablemem = data;
+	const __be32 *reg, *endp;
+	int len, nr = 0;

 	if (depth != 1 || strcmp(uname, "chosen") != 0)
 		return 0;
@@ -306,22 +309,33 @@ static int __init early_init_dt_scan_usablemem(unsigned long node,
 	if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
 		return 1;

-	usablemem->base = dt_mem_next_cell(dt_root_addr_cells, &reg);
-	usablemem->size = dt_mem_next_cell(dt_root_size_cells, &reg);
+	endp = reg + (len / sizeof(__be32));
+	while ((endp - reg) >= (dt_root_addr_cells + dt_root_size_cells)) {
+		unsigned long base = dt_mem_next_cell(dt_root_addr_cells, &reg);
+		unsigned long size = dt_mem_next_cell(dt_root_size_cells, &reg);
+
+		if (memblock_add_range(usablemem, base, size, NUMA_NO_NODE,
+				       MEMBLOCK_NONE))
+			return 0;
+		if (++nr >= CRASH_MAX_USABLE_RANGES)
+			break;
+	}

 	return 1;
 }

 static void __init fdt_enforce_memory_region(void)
 {
-	struct memblock_region reg = {
-		.size = 0,
+	struct memblock_region usable_regions[CRASH_MAX_USABLE_RANGES];
+	struct memblock_type usablemem = {
+		.max = CRASH_MAX_USABLE_RANGES,
+		.regions = usable_regions,
 	};

-	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
+	of_scan_flat_dt(early_init_dt_scan_usablemem, &usablemem);

-	if (reg.size)
-		memblock_cap_memory_range(reg.base, reg.size);
+	if (usablemem.cnt)
+		memblock_cap_memory_ranges(&usablemem);
 }

 void __init arm64_memblock_init(void)
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 180877c..f04dfc1 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -445,7 +445,6 @@ phys_addr_t memblock_mem_size(unsigned long limit_pfn);
 phys_addr_t memblock_start_of_DRAM(void);
 phys_addr_t memblock_end_of_DRAM(void);
 void memblock_enforce_memory_limit(phys_addr_t memory_limit);
-void memblock_cap_memory_range(phys_addr_t base, phys_addr_t size);
 void memblock_cap_memory_ranges(struct memblock_type *regions_to_keep);
 void memblock_mem_limit_remove_map(phys_addr_t limit);
 bool memblock_is_memory(phys_addr_t addr);
diff --git a/mm/memblock.c b/mm/memblock.c
index 9661807..9b5cef4 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1669,34 +1669,6 @@ void __init memblock_enforce_memory_limit(phys_addr_t limit)
 			      PHYS_ADDR_MAX);
 }

-void __init memblock_cap_memory_range(phys_addr_t base, phys_addr_t size)
-{
-	int start_rgn, end_rgn;
-	int i, ret;
-
-	if (!size)
-		return;
-
-	ret = memblock_isolate_range(&memblock.memory, base, size,
-						&start_rgn, &end_rgn);
-	if (ret)
-		return;
-
-	/* remove all the MAP regions */
-	for (i = memblock.memory.cnt - 1; i >= end_rgn; i--)
-		if (!memblock_is_nomap(&memblock.memory.regions[i]))
-			memblock_remove_region(&memblock.memory, i);
-
-	for (i = start_rgn - 1; i >= 0; i--)
-		if (!memblock_is_nomap(&memblock.memory.regions[i]))
-			memblock_remove_region(&memblock.memory, i);
-
-	/* truncate the reserved regions */
-	memblock_remove_range(&memblock.reserved, 0, base);
-	memblock_remove_range(&memblock.reser
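The parsing loop in early_init_dt_scan_usablemem() above walks a property made of big-endian 32-bit cells and pulls out (base, size) pairs until it runs out of data or hits the range limit. A self-contained approximation is sketched below. It works on a raw byte buffer rather than a flattened device tree, and be32_at(), next_cells() and parse_usable_ranges() are names invented here; next_cells() only imitates the behaviour of dt_mem_next_cell() as used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RANGES 2	/* same cap as CRASH_MAX_USABLE_RANGES in the patch */

struct range {
	uint64_t base;
	uint64_t size;
};

/* Read one big-endian 32-bit cell from raw bytes. */
static uint32_t be32_at(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Consume `cells` cells (1 or 2), most significant first, and advance *p. */
static uint64_t next_cells(int cells, const uint8_t **p)
{
	uint64_t v = 0;

	while (cells-- > 0) {
		v = (v << 32) | be32_at(*p);
		*p += 4;
	}
	return v;
}

/* Parse up to MAX_RANGES (base, size) pairs from a property of len bytes. */
static int parse_usable_ranges(const uint8_t *prop, size_t len,
			       int addr_cells, int size_cells,
			       struct range out[MAX_RANGES])
{
	const uint8_t *end = prop + len;
	size_t stride = 4 * (size_t)(addr_cells + size_cells);
	int nr = 0;

	/* Stop on a short tail, exactly like the (endp - reg) check. */
	while (stride && (size_t)(end - prop) >= stride && nr < MAX_RANGES) {
		out[nr].base = next_cells(addr_cells, &prop);
		out[nr].size = next_cells(size_cells, &prop);
		nr++;
	}
	return nr;
}
```

A property that carries three pairs is truncated to the first two, which matches the "at most two crash kernel regions" comment in the patch.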
[PATCH v4 5/5] kdump: update Documentation about crashkernel on arm64
Now that we support crashkernel=X,[high,low] on arm64, update the
documentation.

Signed-off-by: Chen Zhou
---
 Documentation/admin-guide/kernel-parameters.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 308af3b..a055983 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -715,14 +715,14 @@
 			Documentation/kdump/kdump.txt for an example.

 	crashkernel=size[KMG],high
-			[KNL, x86_64] range could be above 4G. Allow kernel
+			[KNL, x86_64, arm64] range could be above 4G. Allow kernel
 			to allocate physical memory region from top, so could
 			be above 4G if system have more than 4G ram installed.
 			Otherwise memory region will be allocated below 4G, if
 			available.
 			It will be ignored if crashkernel=X is specified.
 	crashkernel=size[KMG],low
-			[KNL, x86_64] range under 4G. When crashkernel=X,high
+			[KNL, x86_64, arm64] range under 4G. When crashkernel=X,high
 			is passed, kernel could allocate physical memory region
 			above 4G, that cause second kernel crash on system
 			that require some amount of low memory, e.g. swiotlb
-- 
2.7.4
[PATCH v4 2/5] arm64: kdump: support reserving crashkernel above 4G
When crashkernel is reserved above 4G in memory, kernel should reserve
some amount of low memory for swiotlb and some DMA buffers.

Kernel would try to allocate at least 256M below 4G automatically
as x86_64 if crashkernel is above 4G. Meanwhile, support
crashkernel=X,[high,low] in arm64.

Signed-off-by: Chen Zhou
---
 arch/arm64/include/asm/kexec.h |  3 +++
 arch/arm64/kernel/setup.c      |  3 +++
 arch/arm64/mm/init.c           | 25 -
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 67e4cb7..32949bf 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -28,6 +28,9 @@
 
 #define KEXEC_ARCH KEXEC_ARCH_AARCH64
 
+/* 2M alignment for crash kernel regions */
+#define CRASH_ALIGN	SZ_2M
+
 #ifndef __ASSEMBLY__
 
 /**
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 413d566..82cd9a0 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -243,6 +243,9 @@ static void __init request_standard_resources(void)
 			request_resource(res, &kernel_data);
 #ifdef CONFIG_KEXEC_CORE
 		/* Userspace will find "Crash kernel" region in /proc/iomem. */
+		if (crashk_low_res.end && crashk_low_res.start >= res->start &&
+		    crashk_low_res.end <= res->end)
+			request_resource(res, &crashk_low_res);
 		if (crashk_res.end && crashk_res.start >= res->start &&
 		    crashk_res.end <= res->end)
 			request_resource(res, &crashk_res);
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 972bf43..f5dde73 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -74,20 +74,30 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
 static void __init reserve_crashkernel(void)
 {
 	unsigned long long crash_base, crash_size;
+	bool high = false;
 	int ret;
 
 	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
 				&crash_size, &crash_base);
 	/* no crashkernel= or invalid value specified */
-	if (ret || !crash_size)
-		return;
+	if (ret || !crash_size) {
+		/* crashkernel=X,high */
+		ret = parse_crashkernel_high(boot_command_line,
+				memblock_phys_mem_size(),
+				&crash_size, &crash_base);
+		if (ret || !crash_size)
+			return;
+		high = true;
+	}
 
 	crash_size = PAGE_ALIGN(crash_size);
 
 	if (crash_base == 0) {
 		/* Current arm64 boot protocol requires 2MB alignment */
-		crash_base = memblock_find_in_range(0, ARCH_LOW_ADDRESS_LIMIT,
-				crash_size, SZ_2M);
+		crash_base = memblock_find_in_range(0,
+				high ? memblock_end_of_DRAM()
+				: ARCH_LOW_ADDRESS_LIMIT,
+				crash_size, CRASH_ALIGN);
 		if (crash_base == 0) {
 			pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
 				crash_size);
@@ -105,13 +115,18 @@ static void __init reserve_crashkernel(void)
 			return;
 		}
 
-		if (!IS_ALIGNED(crash_base, SZ_2M)) {
+		if (!IS_ALIGNED(crash_base, CRASH_ALIGN)) {
 			pr_warn("cannot reserve crashkernel: base address is not 2MB aligned\n");
 			return;
 		}
 	}
 	memblock_reserve(crash_base, crash_size);
 
+	if (crash_base >= SZ_4G && reserve_crashkernel_low()) {
+		memblock_free(crash_base, crash_size);
+		return;
+	}
+
 	pr_info("crashkernel reserved: 0x%016llx - 0x%016llx (%lld MB)\n",
 		crash_base, crash_base + crash_size, crash_size >> 20);
 
--
2.7.4
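The three decisions this patch adds to the arm64 reserve_crashkernel() can be modelled outside the kernel. The following is only a user-space sketch: the helper names crash_search_limit(), crash_base_aligned() and needs_low_reservation() are hypothetical and do not exist in the kernel, and the DRAM end and low-address limit are passed in as plain numbers instead of coming from memblock.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_2M (UINT64_C(2) << 20)
#define SZ_4G (UINT64_C(4) << 30)

/*
 * Upper bound of the window searched for the crash kernel base:
 * the end of DRAM for crashkernel=X,high, the low-address limit otherwise.
 * Both limits are caller-supplied assumptions in this sketch.
 */
static uint64_t crash_search_limit(bool high, uint64_t end_of_dram,
				   uint64_t low_limit)
{
	return high ? end_of_dram : low_limit;
}

/* The arm64 boot protocol requirement checked by the patch: 2MB alignment. */
static bool crash_base_aligned(uint64_t base)
{
	return (base & (SZ_2M - 1)) == 0;
}

/* A base at or above 4G is what triggers the extra low-memory reservation. */
static bool needs_low_reservation(uint64_t base)
{
	return base >= SZ_4G;
}
```

With a 16G machine and crashkernel=X,high, crash_search_limit(true, 16ULL << 30, 4ULL << 30) gives the 16G bound, and any base found at or above 4G makes needs_low_reservation() return true.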
[PATCH v4 3/5] memblock: add memblock_cap_memory_ranges for multiple ranges
The memblock_cap_memory_range() removes all the memory except the
range passed to it. Extend this function to receive memblock_type
with the regions that should be kept.

Enable this function in arm64 for reservation of multiple regions
for the crash kernel.

Signed-off-by: Chen Zhou
Signed-off-by: Mike Rapoport
---
 include/linux/memblock.h |  1 +
 mm/memblock.c            | 45 +
 2 files changed, 46 insertions(+)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 47e3c06..180877c 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -446,6 +446,7 @@ phys_addr_t memblock_start_of_DRAM(void);
 phys_addr_t memblock_end_of_DRAM(void);
 void memblock_enforce_memory_limit(phys_addr_t memory_limit);
 void memblock_cap_memory_range(phys_addr_t base, phys_addr_t size);
+void memblock_cap_memory_ranges(struct memblock_type *regions_to_keep);
 void memblock_mem_limit_remove_map(phys_addr_t limit);
 bool memblock_is_memory(phys_addr_t addr);
 bool memblock_is_map_memory(phys_addr_t addr);
diff --git a/mm/memblock.c b/mm/memblock.c
index f315eca..9661807 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1697,6 +1697,51 @@ void __init memblock_cap_memory_range(phys_addr_t base, phys_addr_t size)
 			base + size, PHYS_ADDR_MAX);
 }
 
+void __init memblock_cap_memory_ranges(struct memblock_type *regions_to_keep)
+{
+	int start_rgn[INIT_MEMBLOCK_REGIONS], end_rgn[INIT_MEMBLOCK_REGIONS];
+	int i, j, ret, nr = 0;
+	struct memblock_region *regs = regions_to_keep->regions;
+
+	for (i = 0; i < regions_to_keep->cnt; i++) {
+		ret = memblock_isolate_range(&memblock.memory, regs[i].base,
+				regs[i].size, &start_rgn[i], &end_rgn[i]);
+		if (ret)
+			break;
+		nr++;
+	}
+	if (!nr)
+		return;
+
+	/* remove all the MAP regions */
+	for (i = memblock.memory.cnt - 1; i >= end_rgn[nr - 1]; i--)
+		if (!memblock_is_nomap(&memblock.memory.regions[i]))
+			memblock_remove_region(&memblock.memory, i);
+
+	for (i = nr - 1; i > 0; i--)
+		for (j = start_rgn[i] - 1; j >= end_rgn[i - 1]; j--)
+			if (!memblock_is_nomap(&memblock.memory.regions[j]))
+				memblock_remove_region(&memblock.memory, j);
+
+	for (i = start_rgn[0] - 1; i >= 0; i--)
+		if (!memblock_is_nomap(&memblock.memory.regions[i]))
+			memblock_remove_region(&memblock.memory, i);
+
+	/* truncate the reserved regions */
+	memblock_remove_range(&memblock.reserved, 0, regs[0].base);
+
+	for (i = nr - 1; i > 0; i--) {
+		phys_addr_t remove_base = regs[i - 1].base + regs[i - 1].size;
+		phys_addr_t remove_size = regs[i].base - remove_base;
+
+		memblock_remove_range(&memblock.reserved, remove_base,
+				remove_size);
+	}
+
+	memblock_remove_range(&memblock.reserved,
+			regs[nr - 1].base + regs[nr - 1].size, PHYS_ADDR_MAX);
+}
+
 void __init memblock_mem_limit_remove_map(phys_addr_t limit)
 {
 	phys_addr_t max_addr;
--
2.7.4
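The effect of capping memory to several kept ranges can be illustrated in user space. This is not the kernel implementation: the patch isolates and removes regions in place and leaves NOMAP regions untouched, while the sketch below simply builds the intersection of a memory map with the ranges to keep. struct range and cap_to_ranges() are hypothetical names introduced for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memblock region: [base, base + size) */
struct range {
	uint64_t base;
	uint64_t size;
};

static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/*
 * Write to out[] every part of mem[] that overlaps one of the keep[]
 * ranges and return the number of entries written (at most out_max).
 * Everything outside the keep[] ranges is dropped, which is the net
 * effect the patch aims for on MAP regions.
 */
static size_t cap_to_ranges(const struct range *mem, size_t nmem,
			    const struct range *keep, size_t nkeep,
			    struct range *out, size_t out_max)
{
	size_t i, j, n = 0;

	for (i = 0; i < nmem; i++) {
		for (j = 0; j < nkeep; j++) {
			uint64_t lo = max_u64(mem[i].base, keep[j].base);
			uint64_t hi = min_u64(mem[i].base + mem[i].size,
					      keep[j].base + keep[j].size);

			if (lo < hi && n < out_max) {
				out[n].base = lo;
				out[n].size = hi - lo;
				n++;
			}
		}
	}
	return n;
}
```

For a single 8G memory region and two kept ranges (256M at 1G, 1G at 5G), the result is exactly those two ranges, which matches the two crash kernel regions case the series is after.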
[PATCH v4 1/5] x86: kdump: move reserve_crashkernel_low() into kexec_core.c
In preparation for supporting more than one crash kernel regions
in arm64 as x86_64 does, move reserve_crashkernel_low() into
kexec/kexec_core.c.

Signed-off-by: Chen Zhou
---
 arch/x86/include/asm/kexec.h |  3 ++
 arch/x86/kernel/setup.c      | 66 +---
 include/linux/kexec.h        |  5
 kernel/kexec_core.c          | 56 +
 4 files changed, 71 insertions(+), 59 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 003f2da..485a514 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -18,6 +18,9 @@
 
 # define KEXEC_CONTROL_CODE_MAX_SIZE 2048
 
+/* 16M alignment for crash kernel regions */
+#define CRASH_ALIGN	(16 << 20)
+
 #ifndef __ASSEMBLY__
 
 #include
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3773905..4182035 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -447,9 +447,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 
 #ifdef CONFIG_KEXEC_CORE
 
-/* 16M alignment for crash kernel regions */
-#define CRASH_ALIGN	(16 << 20)
-
 /*
  * Keep the crash kernel below this limit. On 32 bits earlier kernels
  * would limit the kernel to the low 512 MiB due to mapping restrictions.
@@ -463,59 +460,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 # define CRASH_ADDR_HIGH_MAX MAXMEM
 #endif
 
-static int __init reserve_crashkernel_low(void)
-{
-#ifdef CONFIG_X86_64
-	unsigned long long base, low_base = 0, low_size = 0;
-	unsigned long total_low_mem;
-	int ret;
-
-	total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
-
-	/* crashkernel=Y,low */
-	ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size, &base);
-	if (ret) {
-		/*
-		 * two parts from lib/swiotlb.c:
-		 * -swiotlb size: user-specified with swiotlb= or default.
-		 *
-		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
-		 * to 8M for other buffers that may need to stay low too. Also
-		 * make sure we allocate enough extra low memory so that we
-		 * don't run out of DMA buffers for 32-bit devices.
-		 */
-		low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
-	} else {
-		/* passed with crashkernel=0,low ? */
-		if (!low_size)
-			return 0;
-	}
-
-	low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
-	if (!low_base) {
-		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
-		       (unsigned long)(low_size >> 20));
-		return -ENOMEM;
-	}
-
-	ret = memblock_reserve(low_base, low_size);
-	if (ret) {
-		pr_err("%s: Error reserving crashkernel low memblock.\n", __func__);
-		return ret;
-	}
-
-	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System low RAM: %ldMB)\n",
-		(unsigned long)(low_size >> 20),
-		(unsigned long)(low_base >> 20),
-		(unsigned long)(total_low_mem >> 20));
-
-	crashk_low_res.start = low_base;
-	crashk_low_res.end = low_base + low_size - 1;
-	insert_resource(&iomem_resource, &crashk_low_res);
-#endif
-	return 0;
-}
-
 static void __init reserve_crashkernel(void)
 {
 	unsigned long long crash_size, crash_base, total_mem;
@@ -573,9 +517,13 @@ static void __init reserve_crashkernel(void)
 		return;
 	}
 
-	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
-		memblock_free(crash_base, crash_size);
-		return;
+	if (crash_base >= (1ULL << 32)) {
+		if (reserve_crashkernel_low()) {
+			memblock_free(crash_base, crash_size);
+			return;
+		}
+
+		insert_resource(&iomem_resource, &crashk_low_res);
 	}
 
 	pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index b9b1bc5..096ad63 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -63,6 +63,10 @@
 
 #define KEXEC_CORE_NOTE_NAME CRASH_CORE_NOTE_NAME
 
+#ifndef CRASH_ALIGN
+#define CRASH_ALIGN SZ_128M
+#endif
+
 /*
  * This structure is used to hold the arguments that are used when loading
  * kernel binaries.
@@ -281,6 +285,7 @@ extern void __crash_kexec(struct pt_regs *);
 extern void crash_kexec(struct pt_regs *);
 int kexec_should_crash(struct task_struct *);
 int kexec_crash_loaded(void);
+int __init reserve_crashkernel_low(void);
 void crash_save_cpu(struct pt_regs *regs, int cpu);
 extern int kimage_crash_copy_vmcoreinfo(struct kimage
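The default size picked by reserve_crashkernel_low() when no crashkernel=Y,low is given is max(swiotlb size + 8M, 256M). A small user-space sketch of that arithmetic follows; default_crash_low_size() and its swiotlb_bytes parameter are hypothetical names, with the parameter standing in for what swiotlb_size_or_default() would return.

```c
#include <stdint.h>

#define MiB(x) ((uint64_t)(x) << 20)

/*
 * Fallback low-memory size when crashkernel=Y,low is not specified:
 * the swiotlb size plus 8M of slack for other low buffers, but never
 * less than 256M. swiotlb_bytes is supplied by the caller here; in the
 * kernel it comes from swiotlb_size_or_default().
 */
static uint64_t default_crash_low_size(uint64_t swiotlb_bytes)
{
	uint64_t want = swiotlb_bytes + MiB(8);
	uint64_t minimum = MiB(256);

	return want > minimum ? want : minimum;
}
```

Assuming a 64M swiotlb, the 256M floor wins; only a swiotlb larger than 248M pushes the reservation above that floor.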
[PATCH v4 0/5] support reserving crashkernel above 4G on arm64 kdump
When crashkernel is reserved above 4G in memory, kernel should reserve
some amount of low memory for swiotlb and some DMA buffers. So there may
be two crash kernel regions, one is below 4G, the other is above 4G.

Crash dump kernel reads more than one crash kernel regions via a dtb
property under node /chosen,
linux,usable-memory-range = .

Besides, we need to modify kexec-tools:
arm64: support more than one crash kernel regions(see [1])

Changes since [v3]
- Add memblock_cap_memory_ranges for multiple ranges.
- Split patch "arm64: kdump: support more than one crash kernel regions"
as two. One is above "Add memblock_cap_memory_ranges", the other is using
memblock_cap_memory_ranges to support multiple crash kernel regions.
- Fix some compiling warnings.

Changes since [v2]
- Split patch "arm64: kdump: support reserving crashkernel above 4G" as
two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
patch.

Changes since [v1]:
- Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
- Remove memblock_cap_memory_ranges() i added in v1 and implement that
in fdt_enforce_memory_region().
There are at most two crash kernel regions, for two crash kernel regions
case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
and then remove the memory range in the middle.

[1]: http://lists.infradead.org/pipermail/kexec/2019-April/022792.html
[v1]: https://lkml.org/lkml/2019/4/8/628
[v2]: https://lkml.org/lkml/2019/4/9/86
[V3]: https://lkml.org/lkml/2019/4/15/6

Chen Zhou (5):
  x86: kdump: move reserve_crashkernel_low() into kexec_core.c
  arm64: kdump: support reserving crashkernel above 4G
  memblock: add memblock_cap_memory_ranges for multiple ranges
  arm64: kdump: support more than one crash kernel regions
  kdump: update Documentation about crashkernel on arm64

 Documentation/admin-guide/kernel-parameters.txt |  4 +-
 arch/arm64/include/asm/kexec.h                  |  3 ++
 arch/arm64/kernel/setup.c                       |  3 ++
 arch/arm64/mm/init.c                            | 59 --
 arch/x86/include/asm/kexec.h                    |  3 ++
 arch/x86/kernel/setup.c                         | 66 +++--
 include/linux/kexec.h                           |  5 ++
 include/linux/memblock.h                        |  2 +-
 kernel/kexec_core.c                             | 56 +
 mm/memblock.c                                   | 56 +++--
 10 files changed, 166 insertions(+), 91 deletions(-)

--
2.7.4
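The "cap to [min(start), max(end)], then remove the middle" approach described in the changelog for the two-region case can be sketched in user space as below. struct region and cap_and_hole() are hypothetical names, end addresses are treated as exclusive, and the two regions are assumed not to overlap.

```c
#include <stdint.h>

/* Hypothetical region type: [start, end), end exclusive */
struct region {
	uint64_t start;
	uint64_t end;
};

/*
 * For two non-overlapping crash kernel regions, compute the enclosing
 * range that memory is capped to and the hole between the two regions
 * that has to be removed afterwards.
 */
static void cap_and_hole(struct region a, struct region b,
			 struct region *cap, struct region *hole)
{
	struct region lo = a.start <= b.start ? a : b;
	struct region hi = a.start <= b.start ? b : a;

	cap->start = lo.start;	/* min(regs[*].start) */
	cap->end = hi.end;	/* max(regs[*].end) */
	hole->start = lo.end;	/* memory range in the middle */
	hole->end = hi.start;
}
```

For a low region at [1G, 1G + 256M) and a high region at [5G, 6G), the cap is [1G, 6G) and the hole to remove is [1G + 256M, 5G).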
Re: [PATCH v4] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernel
On Mon, Apr 15, 2019 at 11:07:17AM +0200, Borislav Petkov wrote:
> On Mon, Apr 15, 2019 at 07:01:54AM +, Junichi Nomura wrote:
> > OK. Then I'll go back to v3 and make sure to hang when
> > something is wrong during kexec boot on EFI system.
>
> No need - I have it here locally. I'll clean it up and post it for
> review.

Here it is. Ok, not ok?

---
diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
index 0ef4ad55b29b..089639a8a384 100644
--- a/arch/x86/boot/compressed/acpi.c
+++ b/arch/x86/boot/compressed/acpi.c
@@ -44,17 +44,109 @@ static acpi_physical_address get_acpi_rsdp(void)
 	return addr;
 }
 
-/* Search EFI system tables for RSDP. */
-static acpi_physical_address efi_get_rsdp_addr(void)
+/*
+ * Search EFI system tables for RSDP. If both ACPI_20_TABLE_GUID and
+ * ACPI_TABLE_GUID are found, take the former, which has more features.
+ */
+static acpi_physical_address
+__efi_get_rsdp_addr(unsigned long config_tables, unsigned int nr_tables,
+		    bool efi_64)
 {
 	acpi_physical_address rsdp_addr = 0;
 
 #ifdef CONFIG_EFI
-	unsigned long systab, systab_tables, config_tables;
+	int i;
+
+	/* Get EFI tables from systab. */
+	for (i = 0; i < nr_tables; i++) {
+		acpi_physical_address table;
+		efi_guid_t guid;
+
+		if (efi_64) {
+			efi_config_table_64_t *tbl = (efi_config_table_64_t *) config_tables + i;
+
+			guid = tbl->guid;
+			table = tbl->table;
+
+			if (!IS_ENABLED(CONFIG_X86_64) && table >> 32) {
+				debug_putstr("Error getting RSDP address: EFI config table located above 4GB.\n");
+				return 0;
+			}
+		} else {
+			efi_config_table_32_t *tbl = (efi_config_table_32_t *) config_tables + i;
+
+			guid = tbl->guid;
+			table = tbl->table;
+		}
+
+		if (!(efi_guidcmp(guid, ACPI_TABLE_GUID)))
+			rsdp_addr = table;
+		else if (!(efi_guidcmp(guid, ACPI_20_TABLE_GUID)))
+			return table;
+	}
+#endif
+	return rsdp_addr;
+}
+
+/* EFI/kexec support is 64-bit only. */
+#ifdef CONFIG_X86_64
+static struct efi_setup_data * get_kexec_setup_data_addr(void)
+{
+	struct setup_data *data;
+	u64 pa_data;
+
+	pa_data = boot_params->hdr.setup_data;
+	while (pa_data) {
+		data = (struct setup_data *)pa_data;
+		if (data->type == SETUP_EFI)
+			return (struct efi_setup_data *)(pa_data + sizeof(struct setup_data));
+
+		pa_data = data->next;
+	}
+	return NULL;
+}
+
+static acpi_physical_address kexec_get_rsdp_addr(void)
+{
+	efi_system_table_64_t *systab;
+	struct efi_setup_data *esd;
+	struct efi_info *ei;
+	char *sig;
+
+	esd = (struct efi_setup_data *)get_kexec_setup_data_addr();
+	if (!esd)
+		return 0;
+
+	if (!esd->tables) {
+		debug_putstr("Wrong kexec SETUP_EFI data.\n");
+		return 0;
+	}
+
+	ei = &boot_params->efi_info;
+	sig = (char *)&ei->efi_loader_signature;
+	if (strncmp(sig, EFI64_LOADER_SIGNATURE, 4)) {
+		debug_putstr("Wrong kexec EFI loader signature.\n");
+		return 0;
+	}
+
+	/* Get systab from boot params. */
+	systab = (efi_system_table_64_t *) (ei->efi_systab | ((__u64)ei->efi_systab_hi << 32));
+	if (!systab)
+		error("EFI system table not found in kexec boot_params.");
+
+	return __efi_get_rsdp_addr((unsigned long)esd->tables, systab->nr_tables, true);
+}
+#else
+static acpi_physical_address kexec_get_rsdp_addr(void) { return 0; }
+#endif /* CONFIG_X86_64 */
+
+static acpi_physical_address efi_get_rsdp_addr(void)
+{
+#ifdef CONFIG_EFI
+	unsigned long systab, config_tables;
 	unsigned int nr_tables;
 	struct efi_info *ei;
 	bool efi_64;
-	int size, i;
 	char *sig;
 
 	ei = &boot_params->efi_info;
@@ -88,49 +180,20 @@ static acpi_physical_address efi_get_rsdp_addr(void)
 
 		config_tables = stbl->tables;
 		nr_tables = stbl->nr_tables;
-		size = sizeof(efi_config_table_64_t);
 	} else {
 		efi_system_table_32_t *stbl = (efi_system_table_32_t *)systab;
 
 		config_tables = stbl->tables;
 		nr_tables = stbl->nr_tables;
-		size = sizeof(efi_config_table_32_t);
 	}
 
 	if (!config_tables)
 		error("EFI config tables not found.");
 
-	/* Get EFI tables from systab. */
-	for (i = 0; i < nr_tables; i++) {
-		acpi_physical_address table;
-		efi_guid_t guid;
-
-
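The setup_data walk done by get_kexec_setup_data_addr() in the patch above is a plain linked-list traversal keyed on the node type. A user-space sketch of the same loop follows. The struct layout is a simplified copy of the boot protocol's setup_data header, SETUP_EFI is assumed to be 4 as in the x86 uapi bootparam.h, and find_setup_payload() is a hypothetical name; "physical addresses" are ordinary pointers cast to 64-bit integers here.

```c
#include <stddef.h>
#include <stdint.h>

#define SETUP_EFI 4	/* assumed value, taken from the x86 boot protocol */

/* Simplified header of a setup_data node; the payload follows it directly. */
struct setup_data {
	uint64_t next;	/* address of the next node, 0 terminates the list */
	uint32_t type;
	uint32_t len;
};

/*
 * Walk the list starting at pa_data and return a pointer to the payload
 * of the first node with the requested type, or NULL if there is none.
 */
static const void *find_setup_payload(uint64_t pa_data, uint32_t type)
{
	while (pa_data) {
		const struct setup_data *data =
			(const struct setup_data *)(uintptr_t)pa_data;

		if (data->type == type)
			return (const unsigned char *)data + sizeof(*data);

		pa_data = data->next;
	}
	return NULL;
}
```

A NULL result corresponds to the case where kexec_get_rsdp_addr() returns 0 because no SETUP_EFI node was passed by the loader.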
Re: [PATCH v4] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernel
On Mon, Apr 15, 2019 at 07:01:54AM +, Junichi Nomura wrote:
> OK. Then I'll go back to v3 and make sure to hang when
> something is wrong during kexec boot on EFI system.

No need - I have it here locally. I'll clean it up and post it for
review.

--
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [PATCH v4] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernel
On 04/12/19 at 08:23am, Baoquan He wrote:
> On 04/11/19 at 09:14am, Junichi Nomura wrote:
> > On 4/11/19 5:42 PM, Baoquan He wrote:
> > > On 04/11/19 at 08:16am, Junichi Nomura wrote:
> > >> kexec_get_rsdp_addr() might fail on kexec-booted kernel, e.g. if the
> > >> setup_data was invalid. In such a case, falling back to
> > >> efi_get_rsdp_addr()
> > >> will hit the problem of accessing invalid table pointer again.
> > >
> > > Seems you are trying to address Dave Young's comment in
> > > http://lkml.kernel.org/r/20190404073233.gc5...@dhcp-128-65.nay.redhat.com
> >
> > Right. His "In case kexec_get_rsdp_addr failed.." comment.
> >
> > > We may need discuss and make clear if those are doable. E.g the first
> > > comment, if not hang by below line of code, returning 0 for what? Can
> > > kexec still be saved, or just reset to firmware?
> > >
> > > error("EFI system table not found in kexec boot_params.")
> >
> > If we return 0 and also don't hang in the rest of get_rsdp_addr(),
> > it just work as the same way as v5.0 and earlier kernel do.
> >
> > Failure cases in kexec_get_rsdp_addr() are followings:
> > 1. efi_setup_data is invalid
> > 2. loader signature is invalid
> > 3. EFI systab is not found in boot_params
> > 4. RSDP is not found by parsing tables pointed to by efi_setup_data
> >
> > I think all of them are critical for EFI boot, so one option could be
> > we never return failure in kexec_get_rsdp_addr() and just hang.
> > But hanging in this very early stage of boot may make the problem
> > harder to investigate once happens. Even earlyprintk is not working yet.
> > So the other option is returning 0 to defer the crash for later stage.
>
> OK, I got the point, thanks. So it is deferred to the late stage, KASLR
> may not avoid those memory region which is marked as hotpluggable in
> SRAT. Kernel can boot up, but doesn't function well on hotplug stuff.
> In this case, people don't know why it happened. We are still blind.
>
> Seems early console in efi is the problem, but not kexec or hotplug. I
> am fine to hang, or make it continue booting for now.
>
> Hi Dave,
>
> Is it possible to fix the efi early console issue? I mean the
> feasibility, I believe it won't be easy. Ask this because not only this
> issue encountered, any other issue could be triggered during boot
> decompressing stage. If efi has this problem, we can't debug them
> either.

For normal boot, it maybe doable to use some boot services eg. some
graphic protocols efi firmware provided. But for kexec, it is different
because it become virtual mode, boot services are not available, and
kernel takes over the mode setting etc. the early framebuffer maybe
usable, maybe not, it is not reliable.

Thanks
Dave
Re: [PATCH v4] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernel
On 4/12/19 10:35 PM, Borislav Petkov wrote:
> On Fri, Apr 12, 2019 at 10:49:56AM +0200, Borislav Petkov wrote:
>> Now I need to go figure out whether there's a reliable way to know in
>> the kexec kernel that it *is* a kexec kernel.
>
> Actually, thinking about this more, we don't need to know whether the
> kernel was kexeced or not. Why?
>
> Because if it is kexec'ed, kexec(1) passes the required info in
> setup_data. Now, if for whatever reason the kexec'ed kernel fails to
> parse that EFI info and get the systab to figure out the RDSP, then it
> doesn't have any other choice but fail booting.
>
> Because there's no way it can figure out where the EFI runtime has been
> mapped and recover by finding the RDSP from there.
>
> So I think we're perfectly fine with the old approach:
>
> 	if (!pa)
> 		pa = kexec_get_rsdp_addr();
>
> 	if (!pa)
> 		pa = efi_get_rsdp_addr();

OK. Then I'll go back to v3 and make sure to hang when
something is wrong during kexec boot on EFI system.

--
Jun'ichi Nomura, NEC Corporation / NEC Solution Innovators, Ltd.
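The "old approach" quoted above is a first-non-zero-wins chain of lookups. A user-space sketch of that pattern, with stub lookups in place of the real functions, might look as follows. first_rsdp(), lookup_none() and lookup_fixed() are hypothetical names, and the address returned by the stub is an arbitrary illustrative value.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t acpi_physical_address;
typedef acpi_physical_address (*rsdp_lookup_fn)(void);

/* Stub standing in for a lookup that finds nothing (returns 0). */
static acpi_physical_address lookup_none(void)
{
	return 0;
}

/* Stub standing in for a lookup that succeeds; the address is made up. */
static acpi_physical_address lookup_fixed(void)
{
	return 0xE0000;
}

/*
 * Try each lookup in order and stop at the first one that returns a
 * non-zero address, mirroring the chained "if (!pa) pa = ...;" tests.
 */
static acpi_physical_address first_rsdp(const rsdp_lookup_fn *lookups,
					size_t n)
{
	acpi_physical_address pa = 0;
	size_t i;

	for (i = 0; i < n && !pa; i++)
		pa = lookups[i]();

	return pa;
}
```

The ordering matters: putting the kexec lookup ahead of the EFI one means the EFI path is only taken when the kexec path returned 0, which is the behaviour discussed in the thread.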