Re: [PATCH] arm64/defconfig: Enable CONFIG_KEXEC_FILE
On Tue, Apr 28, 2020 at 3:37 PM Catalin Marinas wrote:
> On Tue, Apr 28, 2020 at 01:55:58PM +0530, Bhupesh Sharma wrote:
> > On Wed, Apr 8, 2020 at 4:17 PM Mark Rutland wrote:
> > > On Tue, Apr 07, 2020 at 04:01:40AM +0530, Bhupesh Sharma wrote:
> > > >  arch/arm64/configs/defconfig | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > >
> > > > diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> > > > index 24e534d85045..fa122f4341a2 100644
> > > > --- a/arch/arm64/configs/defconfig
> > > > +++ b/arch/arm64/configs/defconfig
> > > > @@ -66,6 +66,7 @@ CONFIG_SCHED_SMT=y
> > > >  CONFIG_NUMA=y
> > > >  CONFIG_SECCOMP=y
> > > >  CONFIG_KEXEC=y
> > > > +CONFIG_KEXEC_FILE=y
> > > >  CONFIG_CRASH_DUMP=y
> > > >  CONFIG_XEN=y
> > > >  CONFIG_COMPAT=y
> > > > --
> > > > 2.7.4
> >
> > Thanks a lot Mark.
> >
> > Hi Catalin, Will,
> >
> > Can you please help pick this patch in the arm tree? We have an
> > increasing number of use-cases from distro users who want to use
> > kexec_file_load() as the default interface for kexec/kdump on arm64.
>
> We could pick it up if it doesn't conflict with the arm-soc tree. They
> tend to pick most of the defconfig changes these days (and could as well
> pick this one).

Thanks Catalin. (+Cc Arnd)

Hi Arnd,

Can you please help pick this change via the arm-soc tree?

Thanks,
Bhupesh

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
Re: Re: [RESEND PATCH v5 2/5] arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo
Hi Bhupesh,

On 2020-02-23 10:25 p.m., Bhupesh Sharma wrote:

Hi Amit,

On Fri, Feb 21, 2020 at 2:36 PM Amit Kachhap wrote:

Hi Bhupesh,

On 1/13/20 5:44 PM, Bhupesh Sharma wrote:

Hi James,

On 01/11/2020 12:30 AM, Dave Anderson wrote:

- Original Message -

Hi Bhupesh,

On 25/12/2019 19:01, Bhupesh Sharma wrote:

On 12/12/2019 04:02 PM, James Morse wrote:

On 29/11/2019 19:59, Bhupesh Sharma wrote:

The vabits_actual variable on arm64 indicates the actual VA space size,
and allows a single binary to support both 48-bit and 52-bit VA spaces.
If the ARMv8.2-LVA optional feature is present, and we are running with
a 64KB page size, then it is possible to use 52 bits of address space
for both userspace and kernel addresses. However, any kernel binary
that supports 52-bit must also be able to fall back to 48-bit at early
boot time if the hardware feature is not present.

Since TCR_EL1.T1SZ indicates the size offset of the memory region
addressed by TTBR1_EL1 (and hence can be used for determining the
vabits_actual value), it makes more sense to export it in vmcoreinfo
rather than the vabits_actual variable, as the name of the variable can
change in future kernel versions, but architectural constructs like
TCR_EL1.T1SZ can better indicate the intended fields to user-space.

User-space utilities like makedumpfile and crash-utility need to
read/write this value from/to vmcoreinfo

(write?)

Yes, also write, so that the vmcoreinfo from a (crashing) arm64 system
can be used for analysis of the root cause of a panic/crash on, say, an
x86_64 host using utilities like crash-utility/gdb.

I read this as "User-space [...] needs to write to vmcoreinfo".

That's correct. But for writing to the vmcore dump in the kdump kernel,
we need to read the symbols from the vmcoreinfo in the primary kernel.

for determining if a virtual address lies in the linear map range.

I think this is a fragile example. The debugger shouldn't need to know
this.

Well, that is the current user-space utility design, so I am not sure we
can tweak that too much. The user-space computation for determining
whether an address lies in the linear map range is the same as we have
in kernel-space:

  #define __is_lm_address(addr)	(!(((u64)addr) & BIT(vabits_actual - 1)))

This was changed with 14c127c957c1 ("arm64: mm: Flip kernel VA space").
If user-space tools rely on 'knowing' the kernel memory layout, they
must have to constantly be fixed and updated. This is a poor argument
for adding this to something that ends up as ABI.

See above. The user-space has to rely on some ABI/guaranteed
hardware-symbols which can be used for 'determining' the kernel memory
layout.

I disagree. Everything and anything in the kernel will change. The ABI
rules apply to stuff exposed via syscalls and kernel filesystems. It
does not apply to kernel internals, like the memory layout we used
yesterday. 14c127c957c1 is a case in point. A debugger trying to rely on
this sort of thing would have to play catch-up whenever it changes.

Exactly. That's the whole point. The crash utility and makedumpfile are
not in the same league as other user-space tools. They have always had
to "play catchup" precisely because they depend upon kernel internals,
which constantly change.

I agree with you and DaveA here. Software user-space debuggers are
dependent on kernel internals (which can change from time to time) and
will have to play catch-up (which has been the case since the very
start). Unfortunately we don't have any clear ABI for software debugging
tools - maybe something to look for in future.

A case in point is gdb/kgdb, which still needs to run with KASLR turned
off (nokaslr) for debugging, as KASLR confuses gdb, which resolves
kernel symbol addresses from the symbol table of vmlinux. But we can
work around the same in makedumpfile/crash by reading the
'kaslr_offset' value.

And I have several users telling me now that they cannot use gdb on a
KASLR-enabled kernel to debug panics, but can use the makedumpfile +
crash combination to achieve the same. So, we should be looking to fix
these utilities, which are broken since the 52-bit changes for arm64.
Accordingly, I will try to send the v6 soon while incorporating the
comments posted on the v5.

Any update on the next v6 version? Since this patch series is fixing the
currently broken kdump, we need this series in order to add some more
fields to vmcoreinfo for the Pointer Authentication work.

Sorry for the delay. I was caught up in some other urgent arm64
user-space issues. I am preparing the v6 now and hopefully will be able
to post it out for review later today.

Did v6 get sent out?

Thanks,
Bhupesh

Regards,
Scott
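The T1SZ-to-linear-map computation being argued about above can be sketched numerically. The following Python sketch is my own illustration (the function names are not from the patch): it derives vabits_actual from the TCR_EL1.T1SZ value exported in vmcoreinfo and then applies the kernel's __is_lm_address() test, as quoted in the thread.

```python
# Sketch of the user-space computation discussed above: derive
# vabits_actual from TCR_EL1.T1SZ (as exported in vmcoreinfo), then
# apply the kernel's __is_lm_address() test. Illustrative only.

def vabits_actual(t1sz: int) -> int:
    # TCR_EL1.T1SZ is the size offset of the TTBR1_EL1 region:
    # the region covers 64 - T1SZ bits of virtual address space.
    return 64 - t1sz

def is_linear_map_address(addr: int, t1sz: int) -> bool:
    # Mirrors: #define __is_lm_address(addr) \
    #              (!(((u64)addr) & BIT(vabits_actual - 1)))
    return (addr & (1 << (vabits_actual(t1sz) - 1))) == 0

# 48-bit VA (T1SZ = 16): bit 47 decides linear map vs. the other half.
print(vabits_actual(16))                                # 48
print(is_linear_map_address(0xFFFF_0000_0000_1000, 16)) # True
print(is_linear_map_address(0xFFFF_8000_0000_1000, 16)) # False
```

With T1SZ = 12 the same code yields a 52-bit VA space, which is exactly why exporting the architectural register field is more stable than exporting the vabits_actual variable name.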
[RFC PATCH 1/1] printk: add support for lockless ringbuffer
Linux is moving to a new lockless ringbuffer. The new ringbuffer is
structured completely differently to the previous iterations. Add
support for retrieving the ringbuffer from debug information and/or
using vmcoreinfo. The new ringbuffer is detected based on the
availability of the "prb" symbol.

Signed-off-by: John Ogness
---
 Makefile       |   2 +-
 dwarf_info.c   |  36 +-
 makedumpfile.c | 101 ++--
 makedumpfile.h |  25 +++
 printk.c       | 177 +
 5 files changed, 333 insertions(+), 8 deletions(-)
 create mode 100644 printk.c

diff --git a/Makefile b/Makefile
index ef20672..dc4ae3e 100644
--- a/Makefile
+++ b/Makefile
@@ -45,7 +45,7 @@ CFLAGS_ARCH += -m32
 endif

 SRC_BASE = makedumpfile.c makedumpfile.h diskdump_mod.h sadump_mod.h sadump_info.h
-SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c
+SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c
 OBJ_PART=$(patsubst %.c,%.o,$(SRC_PART))
 SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c
 OBJ_ARCH=$(patsubst %.c,%.o,$(SRC_ARCH))

diff --git a/dwarf_info.c b/dwarf_info.c
index e42a9f5..543588b 100644
--- a/dwarf_info.c
+++ b/dwarf_info.c
@@ -614,6 +614,7 @@ search_structure(Dwarf_Die *die, int *found)
 {
 	int tag;
 	const char *name;
+	Dwarf_Die die_type;

 	/*
 	 * If we get to here then we don't have any more
@@ -622,9 +623,31 @@ search_structure(Dwarf_Die *die, int *found)
 	do {
 		tag = dwarf_tag(die);
 		name = dwarf_diename(die);
-		if ((tag != DW_TAG_structure_type) || (!name)
-		    || strcmp(name, dwarf_info.struct_name))
+		if ((!name) || strcmp(name, dwarf_info.struct_name))
 			continue;
+
+		if (tag == DW_TAG_typedef) {
+			if (!get_die_type(die, &die_type)) {
+				ERRMSG("Can't get CU die of DW_AT_type.\n");
+				break;
+			}
+
+			/* Resolve typedefs of typedefs. */
+			while ((tag = dwarf_tag(&die_type)) == DW_TAG_typedef) {
+				if (!get_die_type(&die_type, &die_type)) {
+					ERRMSG("Can't get CU die of DW_AT_type.\n");
+					return;
+				}
+			}
+
+			if (tag != DW_TAG_structure_type)
+				continue;
+			die = &die_type;
+
+		} else if (tag != DW_TAG_structure_type) {
 			continue;
+		}
+
 		/*
 		 * Skip if DW_AT_byte_size is not included.
 		 */
@@ -740,6 +763,15 @@ search_typedef(Dwarf_Die *die, int *found)
 			ERRMSG("Can't get CU die of DW_AT_type.\n");
 			break;
 		}
+
+		/* Resolve typedefs of typedefs. */
+		while ((tag = dwarf_tag(&die_type)) == DW_TAG_typedef) {
+			if (!get_die_type(&die_type, &die_type)) {
+				ERRMSG("Can't get CU die of DW_AT_type.\n");
+				return;
+			}
+		}
+
 		dwarf_info.struct_size = dwarf_bytesize(&die_type);
 		if (dwarf_info.struct_size <= 0)
 			continue;

diff --git a/makedumpfile.c b/makedumpfile.c
index f5860a1..6192ee7 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -1555,6 +1555,7 @@ get_symbol_info(void)
 	SYMBOL_INIT(node_data, "node_data");
 	SYMBOL_INIT(pgdat_list, "pgdat_list");
 	SYMBOL_INIT(contig_page_data, "contig_page_data");
+	SYMBOL_INIT(prb, "prb");
 	SYMBOL_INIT(log_buf, "log_buf");
 	SYMBOL_INIT(log_buf_len, "log_buf_len");
 	SYMBOL_INIT(log_end, "log_end");
@@ -1971,16 +1972,46 @@ get_structure_info(void)
 	OFFSET_INIT(elf64_phdr.p_memsz, "elf64_phdr", "p_memsz");

 	SIZE_INIT(printk_log, "printk_log");
-	if (SIZE(printk_log) != NOT_FOUND_STRUCTURE) {
+	SIZE_INIT(printk_ringbuffer, "printk_ringbuffer");
+	if ((SIZE(printk_ringbuffer) != NOT_FOUND_STRUCTURE)) {
+		info->flag_use_printk_ringbuffer = TRUE;
+		info->flag_use_printk_log = FALSE;
+
+		OFFSET_INIT(printk_log.desc_ring, "printk_ringbuffer", "desc_ring");
+		OFFSET_INIT(printk_log.text_data_ring, "printk_ringbuffer", "text_data_ring");
+
+		OFFSET_INIT(printk_log.count_bits, "prb_desc_ring", "count_bits");
+		OFFSET_INIT(printk_log.descs, "prb_desc_ring",
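The "resolve typedefs of typedefs" loop added to dwarf_info.c above can be modeled without libdw. The following Python mock is purely illustrative (the dict-based DIE layout and the atomic_long_t chain are my own stand-ins for real DWARF data); it mirrors the C loop that follows DW_AT_type links until something other than a typedef is reached.

```python
# Mock of the "resolve typedefs of typedefs" loop added to dwarf_info.c.
# DIEs are modeled as dicts with 'tag', 'name', and a 'type' link; the
# real code walks DW_AT_type with get_die_type(). Illustrative only.

def resolve_typedefs(die):
    # Follow DW_AT_type links while the target is still a typedef,
    # e.g. atomic_long_t -> atomic64_t -> struct { ... counter ... }.
    while die["tag"] == "DW_TAG_typedef":
        die = die["type"]
    return die

# Hypothetical chain resembling atomic_long_t on 64-bit:
struct_die = {"tag": "DW_TAG_structure_type", "name": "atomic64", "size": 8}
inner = {"tag": "DW_TAG_typedef", "name": "atomic64_t", "type": struct_die}
outer = {"tag": "DW_TAG_typedef", "name": "atomic_long_t", "type": inner}

print(resolve_typedefs(outer)["name"])   # atomic64
```

This is why the dwarf_info.c change was needed: without the inner loop, looking up the size of atomic_long_t's "counter" member would stop at the first typedef and never reach the structure DIE that carries DW_AT_byte_size.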
[RFC PATCH 0/1] support lockless printk ringbuffer
Hi Kazu,

Here is a patch adding full support for the new lockless printk
ringbuffer as it is currently being proposed. Note that the latest
version has not yet been submitted to LKML. I was waiting until I
finished tests with crash(8) and makedumpfile(8).

The new ringbuffer will export all the necessary symbols, sizes, and
offsets in VMCOREINFO.

Note that I created a separate printk.c for the iteration logic. Also
note that I modified dwarf_info.c to support resolving typedefs of
typedefs. This was necessary in order to support atomic_long_t and its
"counter" member.

I don't expect you to take the patch as-is, but I hope it can provide
some positive ground work for moving forward.

John Ogness (1):
  printk: add support for lockless ringbuffer

 Makefile       |   2 +-
 dwarf_info.c   |  36 +-
 makedumpfile.c | 101 ++--
 makedumpfile.h |  25 +++
 printk.c       | 177 +
 5 files changed, 333 insertions(+), 8 deletions(-)
 create mode 100644 printk.c

--
2.20.1
Re: [PATCH v9 15/18] arm64: kexec: kexec EL2 vectors
On 2020-03-26 03:24, Pavel Tatashin wrote:

If we have an EL2 mode without VHE, the EL2 vectors are needed in order
to switch to EL2 and jump to the new world with hypervisor privileges.

Signed-off-by: Pavel Tatashin
---
 arch/arm64/include/asm/kexec.h      |  5 +
 arch/arm64/kernel/asm-offsets.c     |  1 +
 arch/arm64/kernel/machine_kexec.c   |  5 +
 arch/arm64/kernel/relocate_kernel.S | 35 +
 4 files changed, 46 insertions(+)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index d944c2e289b2..0f758fd51518 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -95,6 +95,7 @@ static inline void crash_post_resume(void) {}
 extern const unsigned long kexec_relocate_code_size;
 extern const unsigned char kexec_relocate_code_start[];
 extern const unsigned long kexec_kern_reloc_offset;
+extern const unsigned long kexec_el2_vectors_offset;
 #endif

 /*
@@ -104,6 +105,9 @@ extern const unsigned long kexec_kern_reloc_offset;
  *		kernel, or purgatory entry address).
  * kern_arg0	first argument to kernel is its dtb address. The other
  *		arguments are currently unused, and must be set to 0
+ * el2_vector	If present means that relocation routine will go to EL1
+ *		from EL2 to do the copy, and then back to EL2 to do the jump
+ *		to new world.
  */
 struct kern_reloc_arg {
 	phys_addr_t head;
@@ -112,6 +116,7 @@ struct kern_reloc_arg {
 	phys_addr_t kern_arg1;
 	phys_addr_t kern_arg2;
 	phys_addr_t kern_arg3;
+	phys_addr_t el2_vector;
 };

 #define ARCH_HAS_KIMAGE_ARCH

diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 448230684749..ff974b648347 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -136,6 +136,7 @@ int main(void)
   DEFINE(KEXEC_KRELOC_KERN_ARG1,	offsetof(struct kern_reloc_arg, kern_arg1));
   DEFINE(KEXEC_KRELOC_KERN_ARG2,	offsetof(struct kern_reloc_arg, kern_arg2));
   DEFINE(KEXEC_KRELOC_KERN_ARG3,	offsetof(struct kern_reloc_arg, kern_arg3));
+  DEFINE(KEXEC_KRELOC_EL2_VECTOR,	offsetof(struct kern_reloc_arg, el2_vector));
 #endif
   return 0;
 }

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index ab571fca9bd1..bd398def7627 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -84,6 +84,11 @@ int machine_kexec_post_load(struct kimage *kimage)
 	kern_reloc_arg->head = kimage->head;
 	kern_reloc_arg->entry_addr = kimage->start;
 	kern_reloc_arg->kern_arg0 = kimage->arch.dtb_mem;
+	/* Setup vector table only when EL2 is available, but no VHE */
+	if (is_hyp_mode_available() && !is_kernel_in_hyp_mode()) {
+		kern_reloc_arg->el2_vector = __pa(reloc_code)
+						+ kexec_el2_vectors_offset;
+	}
 	kexec_image_info(kimage);

 	return 0;

diff --git a/arch/arm64/kernel/relocate_kernel.S b/arch/arm64/kernel/relocate_kernel.S
index aa9f2b2cd77c..6fd2fc0ef373 100644
--- a/arch/arm64/kernel/relocate_kernel.S
+++ b/arch/arm64/kernel/relocate_kernel.S
@@ -89,6 +89,38 @@ ENTRY(arm64_relocate_new_kernel)
 .ltorg
 END(arm64_relocate_new_kernel)

+.macro el1_sync_64
+	br	x4			/* Jump to new world from el2 */
+	.fill	31, 4, 0		/* Set other 31 instr to zeroes */
+.endm

The common idiom to write this is to align the beginning of the macro,
and not to bother about what follows:

.macro whatever
	.align 7
	br x4
.endm

Especially given that 0 is an undefined instruction, and I really hate
to see those in the actual text. On the contrary, .align generates NOPs.

+
+.macro invalid_vector	label
+\label:
+	b \label
+	.fill	31, 4, 0		/* Set other 31 instr to zeroes */
+.endm
+
+/* el2 vectors - switch el2 here while we restore the memory image. */
+	.align 11
+ENTRY(kexec_el2_vectors)

Please see commit 617a2f392c92 ("arm64: kvm: Annotate assembly using
modern annoations"), and follow the same pattern.

+	invalid_vector	el2_sync_invalid_sp0	/* Synchronous EL2t */
+	invalid_vector	el2_irq_invalid_sp0	/* IRQ EL2t */
+	invalid_vector	el2_fiq_invalid_sp0	/* FIQ EL2t */
+	invalid_vector	el2_error_invalid_sp0	/* Error EL2t */
+	invalid_vector	el2_sync_invalid_spx	/* Synchronous EL2h */
+	invalid_vector	el2_irq_invalid_spx	/* IRQ EL2h */
+	invalid_vector	el2_fiq_invalid_spx	/* FIQ EL2h */
+	invalid_vector	el2_error_invalid_spx	/* Error EL2h */
+	el1_sync_64				/* Synchronous 64-bit EL1 */
+	invalid_vector	el1_irq_invalid_64	/* IRQ 64-bit EL1 */
+	invalid_vector	el1_fiq_invalid_64	/* FIQ 64-bit EL1 */
+	invalid_vector	el1_error_invalid_64	/* Error 64-bit EL1 */
+
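The geometry behind the `.align 7` suggestion above is that each arm64 vector table entry is 32 instruction slots of 4 bytes (128 bytes), with the table itself 2KB-aligned (`.align 11`). A small Python sketch of my own (the helper name is hypothetical) computes entry offsets, showing why the "Synchronous 64-bit EL1" slot used by el1_sync_64 lands at offset 0x400:

```python
# arm64 exception vector geometry: 16 entries of 32 4-byte instruction
# slots each. Padding one real instruction to 128 bytes is what either
# ".fill 31, 4, 0" or ".align 7" achieves; .align pads with NOPs.

ENTRY_BYTES = 32 * 4        # 128 bytes per vector entry

def vector_entry_offset(index: int) -> int:
    # index 0..15; "Synchronous 64-bit EL1" is the 9th entry (index 8).
    assert 0 <= index < 16
    return index * ENTRY_BYTES

print(hex(vector_entry_offset(8)))    # 0x400, the el1_sync_64 slot
print(hex(16 * ENTRY_BYTES))          # 0x800, total table size (2KB)
```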
Re: [PATCH v9 10/18] arm64: kexec: cpu_soft_restart change argument types
Hi Pavel,

On 26/03/2020 03:24, Pavel Tatashin wrote:
> Change argument types from unsigned long to a more descriptive
> phys_addr_t.

For 'entry', which is a physical address, sure...

> diff --git a/arch/arm64/kernel/cpu-reset.h b/arch/arm64/kernel/cpu-reset.h
> index ed50e9587ad8..38cbd4068019 100644
> --- a/arch/arm64/kernel/cpu-reset.h
> +++ b/arch/arm64/kernel/cpu-reset.h
> @@ -10,17 +10,17 @@
>
>  #include
>
> -void __cpu_soft_restart(unsigned long el2_switch, unsigned long entry,
> -	unsigned long arg0, unsigned long arg1, unsigned long arg2);
> +void __cpu_soft_restart(phys_addr_t el2_switch, phys_addr_t entry,
> +	phys_addr_t arg0, phys_addr_t arg1, phys_addr_t arg2);

This looks weird because it's re-using the hyp-stub API, because it
might call the hyp-stub from the idmap. entry is passed in, so this
isn't tied to kexec. Without tying it to kexec, how do you know arg2 is
a physical address?

I think it tried to be re-usable because 32bit has more users for this.
arg0-2 are unsigned long because the hyp-stub is just moving general
purpose registers around.

Is this to avoid casting? Sure, it's only got one caller. This thing
evolved because the platform-has-EL2 and kdump-while-KVM-was-running
code was bolted on as they were discovered.

> -static inline void __noreturn cpu_soft_restart(unsigned long entry,
> -	unsigned long arg0,
> -	unsigned long arg1,
> -	unsigned long arg2)
> +static inline void __noreturn cpu_soft_restart(phys_addr_t entry,
> +	phys_addr_t arg0,
> +	phys_addr_t arg1,
> +	phys_addr_t arg2)
> {
> 	typeof(__cpu_soft_restart) *restart;
>
> -	unsigned long el2_switch = !is_kernel_in_hyp_mode() &&
> +	phys_addr_t el2_switch = !is_kernel_in_hyp_mode() &&
> 		is_hyp_mode_available();

What on earth happened here!?

> 	restart = (void *)__pa_symbol(__cpu_soft_restart);

Thanks,

James
Re: [PATCH v9 09/18] arm64: kexec: call kexec_image_info only once
Hi Pavel,

On 26/03/2020 03:24, Pavel Tatashin wrote:
> Currently, kexec_image_info() is called during load time, and
> right before kernel is being kexec'ed. There is no need to do both.

I think the original logic was: if debugging, you'd see the load-time
value in dmesg, and the kexec-time values when the machine panic()d and
you hadn't made a note of the previous set. But I'm not sure why you'd
have these debug messages enabled unless you were debugging kexec.

> So, call it only once when segments are loaded and the physical
> location of page with copy of arm64_relocate_new_kernel is known.

Sure, keep the earlier version that is more likely to be seen.

Acked-by: James Morse

Thanks,

James
Re: [PATCH v9 07/18] arm64: trans_pgd: hibernate: idmap the single page that holds the copy page routines
Hi Pavel,

On 26/03/2020 03:24, Pavel Tatashin wrote:
> From: James Morse
>
> To resume from hibernate, the contents of memory are restored from
> the swap image. This may overwrite any page, including the running
> kernel and its page tables.
>
> Hibernate copies the code it uses to do the restore into a single
> page that it knows won't be overwritten, and maps it with page tables
> built from pages that won't be overwritten.
>
> Today the address it uses for this mapping is arbitrary, but to allow
> kexec to reuse this code, it needs to be idmapped. To idmap the page
> we must avoid the kernel helpers that have VA_BITS baked in.
>
> Convert create_single_mapping() to take a single PA, and idmap it.
> The page tables are built in the reverse order to normal using
> pfn_pte() to stir in any bits between 52:48. T0SZ is always increased
> to cover 48bits, or 52 if the copy code has bits 52:48 in its PA.
>
> Pasha: The original patch from James
> inux-arm-kernel/20200115143322.214247-4-james.mo...@arm.com

-EBROKENLINK

The convention is to use a 'Link:' tag in the signed-off area.
e.g. 5a3577039cbe

> Adopted it to trans_pgd, so it can be commonly used by both Kexec
> and Hibernate. Some minor clean-ups.

Please describe your changes just before your SoB. This means each
author signs off on the stuff above their SoB, and it's obvious who made
which changes. Search for 'Lucky K Maintainer' in
process/submitting-patches.rst for an example.

> diff --git a/arch/arm64/include/asm/trans_pgd.h
> b/arch/arm64/include/asm/trans_pgd.h
> index 97a7ea73b289..4912d3caf0ca 100644
> --- a/arch/arm64/include/asm/trans_pgd.h
> +++ b/arch/arm64/include/asm/trans_pgd.h
> @@ -32,4 +32,7 @@ int trans_pgd_create_copy(struct trans_pgd_info *info, pgd_t **trans_pgd,
>  int trans_pgd_map_page(struct trans_pgd_info *info, pgd_t *trans_pgd,
>  		       void *page, unsigned long dst_addr, pgprot_t pgprot);

This trans_pgd_map_page() used to be create_single_mapping(), which is
where the original patch made its changes. You should only need one of
these, not both.

> +int trans_pgd_idmap_page(struct trans_pgd_info *info, phys_addr_t *trans_ttbr0,
> +			 unsigned long *t0sz, void *page);
> +
>  #endif /* _ASM_TRANS_TABLE_H */

> diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
> index 37d7d1c60f65..c2517d1af2af 100644
> --- a/arch/arm64/mm/trans_pgd.c
> +++ b/arch/arm64/mm/trans_pgd.c
> @@ -242,3 +242,52 @@ int trans_pgd_map_page(struct trans_pgd_info *info, pgd_t *trans_pgd,
>
>  	return 0;
>  }
> +
> +/*
> + * The page we want to idmap may be outside the range covered by VA_BITS that
> + * can be built using the kernel's p?d_populate() helpers. As a one off, for a
> + * single page, we build these page tables bottom up and just assume that will
> + * need the maximum T0SZ.
> + *
> + * Returns 0 on success, and -ENOMEM on failure.
> + * On success trans_ttbr0 contains page table with idmapped page, t0sz is set to
> + * maxumum T0SZ for this page.

maxumum

> + */

Thanks,

James
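The T0SZ adjustment described in the quoted commit message ("T0SZ is always increased to cover 48bits, or 52 if the copy code has bits 52:48 in its PA") can be sketched arithmetically. This Python sketch is my own illustration, not code from the patch:

```python
# Sketch of the T0SZ choice described above: the idmap's TTBR0 region
# must cover the physical address of the copy-code page, so cover 48
# bits normally, or 52 bits when the PA has bits set in 52:48.

def t0sz_for_idmap(pa: int) -> int:
    va_bits_needed = 52 if (pa >> 48) else 48
    return 64 - va_bits_needed

print(t0sz_for_idmap(0x0000_8000_0000_0000))  # 16 (PA fits in 48 bits)
print(t0sz_for_idmap(0x000F_0000_0000_0000))  # 12 (PA uses bits 52:48)
```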
Re: [PATCH v9 08/18] arm64: kexec: move relocation function setup
Hi Pavel,

On 26/03/2020 03:24, Pavel Tatashin wrote:
> Currently, kernel relocation function is configured in machine_kexec()
> at the time of kexec reboot by using control_code_page.
>
> This operation, however, is more logical to be done during kexec_load,
> and thus remove from reboot time. Move, setup of this function to
> newly added machine_kexec_post_load().

This would avoid the need to special-case the cache maintenance, so it's
a good cleanup...

> Because once MMU is enabled, kexec control page will contain more than
> relocation kernel, but also vector table, add pointer to the actual
> function within this page arch.kern_reloc. Currently, it equals to the
> beginning of page, we will add offsets later, when vector table is
> added.

If the vector table always comes second, wouldn't this be extra work to
hold the value 0? You can control the layout of this relocation code, as
it has to be written in assembly. I don't get why this would be
necessary.

> diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
> index ae1bad0156cd..ec71a153cc2d 100644
> --- a/arch/arm64/kernel/machine_kexec.c
> +++ b/arch/arm64/kernel/machine_kexec.c
> @@ -58,6 +59,17 @@ void machine_kexec_cleanup(struct kimage *kimage)
>  	/* Empty routine needed to avoid build errors. */
>  }
>
> +int machine_kexec_post_load(struct kimage *kimage)
> +{
> +	void *reloc_code = page_to_virt(kimage->control_code_page);
> +
> +	memcpy(reloc_code, arm64_relocate_new_kernel,
> +	       arm64_relocate_new_kernel_size);
> +	kimage->arch.kern_reloc = __pa(reloc_code);

Could we move the two cache maintenance calls for this area in here too?
Keeping them next to the modification makes it clearer why they are
required. In this case we can use flush_icache_range() instead of its
__variant because this now happens much earlier.

> +	return 0;
> +}

Regardless,

Reviewed-by: James Morse

Thanks,

James
Re: [PATCH v9 05/18] arm64: trans_pgd: pass NULL instead of init_mm to *_populate functions
Hi Pavel,

On 26/03/2020 03:24, Pavel Tatashin wrote:
> trans_pgd_* should be independent from mm context because the tables
> that are created by this code are used when there are no mm context
> around, as it is between kernels. Simply replace init_mm's with NULL.

arm64's p?d_populate() helpers don't use the mm parameter, so it doesn't
make any difference. This was originally done so that if we ever needed
anything from the mm, we didn't get a NULL dereference or EL0 behaviour
due to a future '!= &init_mm'.

If you think it matters,

Acked-by: James Morse

Thanks,

James
Re: [PATCH v9 03/18] arm64: trans_pgd: make trans_pgd_map_page generic
Hi Pavel,

On 26/03/2020 03:24, Pavel Tatashin wrote:
> kexec is going to use a different allocator, so make
> trans_pgd_map_page to accept allocator as an argument, and also
> kexec is going to use a different map protection, so also pass
> it via argument.

This trans_pgd_map_page() used to be create_single_mapping(). It creates
page tables that map one page: the relocation code. Why do you need a
different pgprot? Surely PAGE_KERNEL_EXEC is exactly what you want.

> diff --git a/arch/arm64/include/asm/trans_pgd.h
> b/arch/arm64/include/asm/trans_pgd.h
> index 23153c13d1ce..ad5194ad178d 100644
> --- a/arch/arm64/include/asm/trans_pgd.h
> +++ b/arch/arm64/include/asm/trans_pgd.h
> @@ -12,10 +12,24 @@
>  #include
>  #include
>
> +/*
> + * trans_alloc_page
> + *	- Allocator that should return exactly one zeroed page, if this
> + *	  allocator fails, trans_pgd returns -ENOMEM error.

trans_pgd is what you pass in to trans_pgd_map_page() or
trans_pgd_create_copy(). Do you mean what those functions return?

> + *
> + * trans_alloc_arg
> + *	- Passed to trans_alloc_page as an argument
> + */

> diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
> index 3d6f0fd73591..607bb1fbc349 100644
> --- a/arch/arm64/kernel/hibernate.c
> +++ b/arch/arm64/kernel/hibernate.c
> @@ -195,6 +200,11 @@ static int create_safe_exec_page(void *src_start, size_t length,
>  				 unsigned long dst_addr,
>  				 phys_addr_t *phys_dst_addr)
>  {
> +	struct trans_pgd_info trans_info = {
> +		.trans_alloc_page	= hibernate_page_alloc,
> +		.trans_alloc_arg	= (void *)GFP_ATOMIC,
> +	};

As you need another copy of this in the next patch, is it worth
declaring this globally and making it const?

> diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
> index d20e48520cef..275a79935d7e 100644
> --- a/arch/arm64/mm/trans_pgd.c
> +++ b/arch/arm64/mm/trans_pgd.c
> @@ -180,8 +185,18 @@ int trans_pgd_create_copy(pgd_t **dst_pgdp, unsigned long start,
>  	return rc;
>  }
>
> -int trans_pgd_map_page(pgd_t *trans_pgd, void *page, unsigned long dst_addr,
> -		       pgprot_t pgprot)
> +/*
> + * Add map entry to trans_pgd for a base-size page at PTE level.
> + * info:	contains allocator and its argument
> + * trans_pgd:	page table in which new map is added.
> + * page:	page to be mapped.
> + * dst_addr:	new VA address for the pages

s/pages/page/ This thing only maps one page.

> + * pgprot:	protection for the page.
> + *
> + * Returns 0 on success, and -ENOMEM on failure.
> + */
> +int trans_pgd_map_page(struct trans_pgd_info *info, pgd_t *trans_pgd,
> +		       void *page, unsigned long dst_addr, pgprot_t pgprot)
> {
> 	pgd_t *pgdp;
> 	pud_t *pudp;

Thanks,

James
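The allocator-as-argument pattern under review here (a `trans_pgd_info` carrying a page allocator plus an opaque argument such as GFP flags) is a common callback idiom. A minimal Python analogue of my own, with hypothetical names, shows the shape of the design:

```python
# Minimal analogue of the trans_pgd_info pattern: the caller supplies a
# page allocator plus an opaque argument, and the table-building code
# calls it whenever it needs a fresh page. Names are illustrative only.

def make_alloc(pool):
    def alloc(arg):
        # 'arg' plays the role of trans_alloc_arg (e.g. GFP_ATOMIC).
        if not pool:
            return None            # corresponds to -ENOMEM in the C code
        return pool.pop()
    return alloc

def map_page(info, page):
    # 'page' would be the page to map; this sketch only exercises the
    # allocator callback, as trans_pgd_map_page does for table pages.
    new_table_page = info["trans_alloc_page"](info["trans_alloc_arg"])
    if new_table_page is None:
        return -12                 # -ENOMEM
    return 0

info = {"trans_alloc_page": make_alloc(["pg0"]),
        "trans_alloc_arg": "GFP_ATOMIC"}
print(map_page(info, "copy_code_page"))   # 0
```

The point of the indirection is exactly what the patch needs: hibernate allocates with GFP_ATOMIC at runtime, while kexec can pre-allocate pages at load time, and the table-walking code stays identical.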
Re: [PATCH v9 02/18] arm64: hibernate: move page handling function to new trans_pgd.c
Hi Pavel,

On 26/03/2020 03:24, Pavel Tatashin wrote:
> Now that we have abstracted the required functions, move them to a new
> home. Later, we will generalize these functions in order to be useful
> outside of hibernation.

Reviewed-by: James Morse

Thanks,

James
Re: [PATCH v9 04/18] arm64: trans_pgd: pass allocator trans_pgd_create_copy
Hi Pavel,

On 26/03/2020 03:24, Pavel Tatashin wrote:
> Make trans_pgd_create_copy and its subroutines use the allocator that
> is passed as an argument.

Reviewed-by: James Morse

Thanks,

James
Re: [PATCH v9 01/18] arm64: kexec: make dtb_mem always enabled
Hi Pavel,

On 26/03/2020 03:24, Pavel Tatashin wrote:
> Currently, dtb_mem is enabled only when CONFIG_KEXEC_FILE is
> enabled. This adds ugly ifdefs to c files.

s/dtb_mem/ARCH_HAS_KIMAGE_ARCH/ ? dtb_mem is just one member of struct
kimage_arch.

> Always enable dtb_mem; when it is not used, it is NULL.
> Change the dtb_mem to phys_addr_t, as it is a physical address.

Regardless,

Reviewed-by: James Morse

Thanks,

James
RE: [PATCH] makedumpfile: cope with not-present mem section
Hi Pingfan,

> -----Original Message-----
> Hi Kazu and Cascardo,
>
> I encounter a weird problem when running makedumpfile on a s390
> machine.
>
> Our production kernel uses the extreme sparse memory model, and has
> the following in mm/sparse.c:
>
> #ifdef CONFIG_SPARSEMEM_EXTREME
> struct mem_section **mem_section;
> #else
> struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]
>	____cacheline_internodealigned_in_smp;
> #endif
>
> So in makedumpfile.c, get_mem_section(), it got a failed result at the
> first call site to validate_mem_section(); then it should succeed at
> the second call site to validate_mem_section(), which is inside the
> if (is_sparsemem_extreme()) condition.

I think your production kernel should have kernel commit a0b1280368d1
("kdump: write correct address of mem_section into vmcoreinfo"), so the
first call should return TRUE and the second one should return FALSE.

> But the actual result is not as expected.
>
> After introducing commit e113f1c974c820f9633dc0073eda525d7575f365
> ("[PATCH] cope with not-present mem section"), I got two successful
> calls to validate_mem_section(), and finally failed at the condition
>
>	ret = symbol_valid ^ pointer_valid;
>	if (!ret) {
>		ERRMSG("Could not validate mem_section.\n");
>	}
>
> Do you have any idea?

Presumably this will be what I expected, that it might be possible.
I can apply the patch below this time, what about this?

https://github.com/k-hagio/makedumpfile-old/commit/ce883df3864a5744ac0f1eff47de06b5074edb5f.patch

or, we can also investigate why the second call returns TRUE, and fix
the conditions in validate_mem_section().

Thanks,

Kazu
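The `symbol_valid ^ pointer_valid` check being debugged above reduces to a small truth table: exactly one interpretation of the mem_section symbol (a direct symbol address vs. a pointer to the root array) is expected to validate. A Python sketch of my own, purely illustrative:

```python
# Sketch of makedumpfile's mem_section validation: exactly one of the
# two interpretations (symbol address vs. pointer value) must validate,
# hence the XOR. Two successes, as seen on the s390 machine, fail it.

def validate(symbol_valid: bool, pointer_valid: bool) -> bool:
    # Mirrors: ret = symbol_valid ^ pointer_valid;
    return bool(symbol_valid) ^ bool(pointer_valid)

print(validate(True, False))  # True  - classic SPARSEMEM layout
print(validate(False, True))  # True  - SPARSEMEM_EXTREME pointer layout
print(validate(True, True))   # False - "Could not validate mem_section."
```

The XOR encodes the assumption that the two interpretations are mutually exclusive; the bug report shows a kernel where both happen to look valid, which is why the suggested fix adjusts the validation conditions rather than the XOR itself.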
Re: [PATCH] makedumpfile: cope with not-present mem section
Hi Kazu and Cascardo,

I encounter a weird problem when running makedumpfile on a s390 machine.

Our production kernel uses the extreme sparse memory model, and has the
following in mm/sparse.c:

#ifdef CONFIG_SPARSEMEM_EXTREME
struct mem_section **mem_section;
#else
struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]
	____cacheline_internodealigned_in_smp;
#endif

So in makedumpfile.c, get_mem_section(), it got a failed result at the
first call site to validate_mem_section(); then it should succeed at the
second call site to validate_mem_section(), which is inside the
if (is_sparsemem_extreme()) condition.

But the actual result is not as expected.

After introducing commit e113f1c974c820f9633dc0073eda525d7575f365
("[PATCH] cope with not-present mem section"), I got two successful
calls to validate_mem_section(), and finally failed at the condition

	ret = symbol_valid ^ pointer_valid;
	if (!ret) {
		ERRMSG("Could not validate mem_section.\n");
	}

Do you have any idea?

Thanks,

Pingfan