Re: [PATCH v4 14/15] kprobes: remove dependency on CONFIG_MODULES
On 19/04/2024 at 17:49, Mike Rapoport wrote:
> Hi Masami,
>
> On Thu, Apr 18, 2024 at 06:16:15AM +0900, Masami Hiramatsu wrote:
>> Hi Mike,
>>
>> On Thu, 11 Apr 2024 19:00:50 +0300 Mike Rapoport wrote:
>>
>>> From: "Mike Rapoport (IBM)"
>>>
>>> kprobes depended on CONFIG_MODULES because it has to allocate memory for
>>> code.
>>>
>>> Since code allocations are now implemented with execmem, kprobes can be
>>> enabled in non-modular kernels.
>>>
>>> Add #ifdef CONFIG_MODULE guards for the code dealing with kprobes inside
>>> modules, make CONFIG_KPROBES select CONFIG_EXECMEM and drop the
>>> dependency of CONFIG_KPROBES on CONFIG_MODULES.
>>
>> Thanks for this work, but this conflicts with the latest fix in v6.9-rc4.
>> Also, can you use IS_ENABLED(CONFIG_MODULES) instead of #ifdefs in the
>> function body? We have enough dummy functions for that, so it should
>> not be a problem.
>
> The code in check_kprobe_address_safe() that gets the module and checks for
> __init functions does not compile with IS_ENABLED(CONFIG_MODULES).
> I can pull it out to a helper or leave the #ifdef in the function body,
> whichever you prefer.

As far as I can see, the only problem is MODULE_STATE_COMING. Can we move 'enum module_state' out of #ifdef CONFIG_MODULES in module.h?

>
>> --
>> Masami Hiramatsu
Re: [RFC PATCH 2/7] mm: vmalloc: don't account for number of nodes for HUGE_VMAP allocations
On 11/04/2024 at 18:05, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)"
>
> vmalloc allocations with VM_ALLOW_HUGE_VMAP that do not explicitly
> specify a node ID will use huge pages only if size_per_node is larger than
> PMD_SIZE.
> Still, the actual allocated memory is not distributed between nodes and
> there is no advantage in such an approach.
> On the contrary, BPF allocates PMD_SIZE * num_possible_nodes() for each
> new bpf_prog_pack, while it could do with PMD_SIZE'ed packs.
>
> Don't account for the number of nodes for VM_ALLOW_HUGE_VMAP with
> NUMA_NO_NODE and use huge pages whenever the requested allocation size
> is larger than PMD_SIZE.

The patch looks OK, but the commit message is confusing: we also use huge pages at the PTE level, for instance 512k pages or 16k pages on powerpc 8xx, where PMD_SIZE is 4M.

Christophe

>
> Signed-off-by: Mike Rapoport (IBM)
> ---
>  mm/vmalloc.c | 9 ++---
>  1 file changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 22aa63f4ef63..5fc8b514e457 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3737,8 +3737,6 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
>  	}
>
>  	if (vmap_allow_huge && (vm_flags & VM_ALLOW_HUGE_VMAP)) {
> -		unsigned long size_per_node;
> -
>  		/*
>  		 * Try huge pages. Only try for PAGE_KERNEL allocations,
>  		 * others like modules don't yet expect huge pages in
> @@ -3746,13 +3744,10 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
>  		 * supporting them.
>  		 */
>
> -		size_per_node = size;
> -		if (node == NUMA_NO_NODE)
> -			size_per_node /= num_online_nodes();
> -		if (arch_vmap_pmd_supported(prot) && size_per_node >= PMD_SIZE)
> +		if (arch_vmap_pmd_supported(prot) && size >= PMD_SIZE)
>  			shift = PMD_SHIFT;
>  		else
> -			shift = arch_vmap_pte_supported_shift(size_per_node);
> +			shift = arch_vmap_pte_supported_shift(size);
>
>  		align = max(real_align, 1UL << shift);
>  		size = ALIGN(real_size, 1UL << shift);
Re: [PATCH v7 2/2] arch/riscv: Enable kprobes when CONFIG_MODULES=n
On 26/03/2024 at 14:46, Jarkko Sakkinen wrote:
> Tracing with kprobes while running a monolithic kernel is currently
> impossible due to the kernel module allocator dependency.
>
> Address the issue by implementing textmem API for RISC-V.
>
> Link: https://www.sochub.fi # for power on testing new SoC's with a minimal stack
> Link: https://lore.kernel.org/all/2022060814.3054333-1-jar...@profian.com/ # continuation
> Signed-off-by: Jarkko Sakkinen
> ---
> v5-v7:
> - No changes.
> v4:
> - Include linux/execmem.h.
> v3:
> - Architecture independent parts have been split to separate patches.
> - Do not change arch/riscv/kernel/module.c as it is out of scope for
>   this patch set now.
> v2:
> - Better late than never right? :-)
> - Focus only on RISC-V for now to make the patch more digestible. This
>   is the arch where I use the patch on a daily basis to help with QA.
> - Introduce HAVE_KPROBES_ALLOC flag to help with more gradual migration.
> ---
>  arch/riscv/Kconfig          |  1 +
>  arch/riscv/kernel/Makefile  |  3 +++
>  arch/riscv/kernel/execmem.c | 22 ++
>  3 files changed, 26 insertions(+)
>  create mode 100644 arch/riscv/kernel/execmem.c
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index e3142ce531a0..499512fb17ff 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -132,6 +132,7 @@ config RISCV
>  	select HAVE_KPROBES if !XIP_KERNEL
>  	select HAVE_KPROBES_ON_FTRACE if !XIP_KERNEL
>  	select HAVE_KRETPROBES if !XIP_KERNEL
> +	select HAVE_ALLOC_EXECMEM if !XIP_KERNEL
>  	# https://github.com/ClangBuiltLinux/linux/issues/1881
>  	select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD
>  	select HAVE_MOVE_PMD
> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> index 604d6bf7e476..337797f10d3e 100644
> --- a/arch/riscv/kernel/Makefile
> +++ b/arch/riscv/kernel/Makefile
> @@ -73,6 +73,9 @@ obj-$(CONFIG_SMP)	+= cpu_ops.o
>
>  obj-$(CONFIG_RISCV_BOOT_SPINWAIT) += cpu_ops_spinwait.o
>  obj-$(CONFIG_MODULES)	+= module.o
> +ifeq ($(CONFIG_ALLOC_EXECMEM),y)
> +obj-y	+= execmem.o

Why not just:

	obj-$(CONFIG_ALLOC_EXECMEM) += execmem.o

> +endif
>  obj-$(CONFIG_MODULE_SECTIONS)	+= module-sections.o
>
>  obj-$(CONFIG_CPU_PM)	+= suspend_entry.o suspend.o
> diff --git a/arch/riscv/kernel/execmem.c b/arch/riscv/kernel/execmem.c
> new file mode 100644
> index ..3e52522ead32
> --- /dev/null
> +++ b/arch/riscv/kernel/execmem.c
> @@ -0,0 +1,22 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +
> +#include
> +#include
> +#include
> +#include
> +
> +void *alloc_execmem(unsigned long size, gfp_t /* gfp */)
> +{
> +	return __vmalloc_node_range(size, 1, MODULES_VADDR,
> +				    MODULES_END, GFP_KERNEL,

Why not use the gfp argument?

> +				    PAGE_KERNEL, 0, NUMA_NO_NODE,
> +				    __builtin_return_address(0));
> +}
> +
> +void free_execmem(void *region)
> +{
> +	if (in_interrupt())
> +		pr_warn("In interrupt context: vmalloc may not work.\n");

Do you expect that to happen? module_memfree() has a WARN_ON(), meaning this should never happen, and if it really does, a mere dmesg warning is not enough.

> +
> +	vfree(region);
> +}
Re: [PATCH v7 1/2] kprobes: Implement trampoline memory allocator for tracing
On 26/03/2024 at 14:46, Jarkko Sakkinen wrote:
> Tracing with kprobes while running a monolithic kernel is currently
> impossible because CONFIG_KPROBES depends on CONFIG_MODULES.
>
> Introduce alloc_execmem() and free_execmem() for allocating executable
> memory. If an arch implements these functions, it can mark this up with
> the HAVE_ALLOC_EXECMEM kconfig flag.
>
> The second new kconfig flag is ALLOC_EXECMEM, which can be selected if
> either MODULES is selected or HAVE_ALLOC_EXECMEM is supported by the arch. If
> HAVE_ALLOC_EXECMEM is not supported by an arch, module_alloc() and
> module_memfree() are used as a fallback, thus retaining backwards
> compatibility with earlier kernel versions.
>
> This will allow architectures to enable kprobe tracing without requiring
> module support.
>
> The support can be implemented with four easy steps:
>
> 1. Implement alloc_execmem().
> 2. Implement free_execmem().
> 3. Edit arch/<arch>/Makefile.
> 4. Set HAVE_ALLOC_EXECMEM in arch/<arch>/Kconfig.
>
> Link: https://lore.kernel.org/all/20240325115632.04e37297491cadfbbf382...@kernel.org/
> Suggested-by: Masami Hiramatsu
> Signed-off-by: Jarkko Sakkinen
> ---
> v7:
> - Use "depends on" for ALLOC_EXECMEM instead of "select".
> - Reduced and narrowed CONFIG_MODULES checks further in kprobes.c.
> v6:
> - Use a null pointer for notifiers and register the module notifier only if
>   IS_ENABLED(CONFIG_MODULES) is set.
> - Fixed a typo in the commit message and wrote a more verbose description
>   of the feature.
> v5:
> - alloc_execmem() was missing the GFP_KERNEL parameter. The patch set did
>   compile because 2/2 had the fixup (leaked there when rebasing the
>   patch set).
> v4:
> - Squashed a couple of unrequired CONFIG_MODULES checks.
> - See https://lore.kernel.org/all/d034m18d63ec.2y11d954ys...@kernel.org/
> v3:
> - A new patch added.
> - For IS_DEFINED() I need advice as I could not really find that many
>   locations where it would be applicable.
> ---
>  arch/Kconfig                | 17 +++-
>  include/linux/execmem.h     | 13 +
>  kernel/kprobes.c            | 53 ++---
>  kernel/trace/trace_kprobe.c | 15 +--
>  4 files changed, 73 insertions(+), 25 deletions(-)
>  create mode 100644 include/linux/execmem.h
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index a5af0edd3eb8..5e9735f60f3c 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -52,8 +52,8 @@ config GENERIC_ENTRY
>
>  config KPROBES
>  	bool "Kprobes"
> -	depends on MODULES
>  	depends on HAVE_KPROBES
> +	depends on ALLOC_EXECMEM
>  	select KALLSYMS
>  	select TASKS_RCU if PREEMPTION
>  	help
> @@ -215,6 +215,21 @@ config HAVE_OPTPROBES
>  config HAVE_KPROBES_ON_FTRACE
>  	bool
>
> +config HAVE_ALLOC_EXECMEM
> +	bool
> +	help
> +	  Architectures that select this option are capable of allocating trampoline
> +	  executable memory for tracing subsystems, independently of the kernel module
> +	  subsystem.
> +
> +config ALLOC_EXECMEM
> +	bool "Executable (trampoline) memory allocation"

Why make it user-selectable? Previously I was able to select KPROBES as soon as MODULES was selected. Now I will have to first select ALLOC_EXECMEM in addition? What is the added value of allowing the user to disable it?

> +	default y
> +	depends on MODULES || HAVE_ALLOC_EXECMEM
> +	help
> +	  Select this for executable (trampoline) memory. Can be enabled when either
> +	  module allocator or arch-specific allocator is available.
> +
>  config ARCH_CORRECT_STACKTRACE_ON_KRETPROBE
>  	bool
>  	help
> diff --git a/include/linux/execmem.h b/include/linux/execmem.h
> new file mode 100644
> index ..ae2ff151523a
> --- /dev/null
> +++ b/include/linux/execmem.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_EXECMEM_H
> +#define _LINUX_EXECMEM_H
> +

It should include moduleloader.h, otherwise the user of alloc_execmem() must include both this header and moduleloader.h to use alloc_execmem().

> +#ifdef CONFIG_HAVE_ALLOC_EXECMEM
> +void *alloc_execmem(unsigned long size, gfp_t gfp);
> +void free_execmem(void *region);
> +#else
> +#define alloc_execmem(size, gfp) module_alloc(size)

Then gfp is silently ignored in that case. Is that expected?

> +#define free_execmem(region) module_memfree(region)
> +#endif
> +
> +#endif /* _LINUX_EXECMEM_H */
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 9d9095e81792..13bef5de315c 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -44,6 +44,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #define KPROBE_HASH_BITS 6
>  #define KPROBE_TABLE_SIZE (1 << KPROBE_HASH_BITS)
> @@ -113,17 +114,17 @@ enum kprobe_slot_state {
>  void __weak *alloc_insn_page(void)
>  {
>  	/*
> -	 * Use module_alloc() so this page is within +/- 2GB of where the
> +	 * Use alloc_execmem() so this
Re: [RFC][PATCH 3/4] kprobes: Allow kprobes with CONFIG_MODULES=n
On 06/03/2024 at 21:05, Calvin Owens wrote:
> If something like this is merged down the road, it can go in at leisure
> once the module_alloc change is in: it's a one-way dependency.

Too many #ifdefs, please reorganise things to avoid that, and avoid changing prototypes based on CONFIG_MODULES. A few other comments below.

>
> Signed-off-by: Calvin Owens
> ---
>  arch/Kconfig                |  2 +-
>  kernel/kprobes.c            | 22 ++
>  kernel/trace/trace_kprobe.c | 11 +++
>  3 files changed, 34 insertions(+), 1 deletion(-)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index cfc24ced16dd..e60ce984d095 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -52,8 +52,8 @@ config GENERIC_ENTRY
>
>  config KPROBES
>  	bool "Kprobes"
> -	depends on MODULES
>  	depends on HAVE_KPROBES
> +	select MODULE_ALLOC
>  	select KALLSYMS
>  	select TASKS_RCU if PREEMPTION
>  	help
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 9d9095e81792..194270e17d57 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1556,8 +1556,12 @@ static bool is_cfi_preamble_symbol(unsigned long addr)
>  		str_has_prefix("__pfx_", symbuf);
>  }
>
> +#if IS_ENABLED(CONFIG_MODULES)
>  static int check_kprobe_address_safe(struct kprobe *p,
>  				     struct module **probed_mod)
> +#else
> +static int check_kprobe_address_safe(struct kprobe *p)
> +#endif

A bit ugly to have to change the prototype; why not just keep probed_mod at all times? When CONFIG_MODULES is not selected, __module_text_address() returns NULL, so it should work without that many #ifdefs.

>  {
>  	int ret;
>
> @@ -1580,6 +1584,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
>  		goto out;
>  	}
>
> +#if IS_ENABLED(CONFIG_MODULES)
>  	/* Check if 'p' is probing a module.
 */
>  	*probed_mod = __module_text_address((unsigned long) p->addr);
>  	if (*probed_mod) {
> @@ -1603,6 +1608,8 @@ static int check_kprobe_address_safe(struct kprobe *p,
>  			ret = -ENOENT;
>  		}
>  	}
> +#endif
> +
>  out:
>  	preempt_enable();
>  	jump_label_unlock();
> @@ -1614,7 +1621,9 @@ int register_kprobe(struct kprobe *p)
>  {
>  	int ret;
>  	struct kprobe *old_p;
> +#if IS_ENABLED(CONFIG_MODULES)
>  	struct module *probed_mod;
> +#endif
>  	kprobe_opcode_t *addr;
>  	bool on_func_entry;
>
> @@ -1633,7 +1642,11 @@ int register_kprobe(struct kprobe *p)
>  	p->nmissed = 0;
>  	INIT_LIST_HEAD(&p->list);
>
> +#if IS_ENABLED(CONFIG_MODULES)
>  	ret = check_kprobe_address_safe(p, &probed_mod);
> +#else
> +	ret = check_kprobe_address_safe(p);
> +#endif
>  	if (ret)
>  		return ret;
>
> @@ -1676,8 +1689,10 @@ int register_kprobe(struct kprobe *p)
>  out:
>  	mutex_unlock(&kprobe_mutex);
>
> +#if IS_ENABLED(CONFIG_MODULES)
>  	if (probed_mod)
>  		module_put(probed_mod);
> +#endif
>
>  	return ret;
>  }
> @@ -2482,6 +2497,7 @@ int kprobe_add_area_blacklist(unsigned long start, unsigned long end)
>  	return 0;
>  }
>
> +#if IS_ENABLED(CONFIG_MODULES)
>  /* Remove all symbols in given area from kprobe blacklist */
>  static void kprobe_remove_area_blacklist(unsigned long start, unsigned long end)
>  {
> @@ -2499,6 +2515,7 @@ static void kprobe_remove_ksym_blacklist(unsigned long entry)
>  {
>  	kprobe_remove_area_blacklist(entry, entry + 1);
>  }
> +#endif
>
>  int __weak arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
>  				   char *type, char *sym)
> @@ -2564,6 +2581,7 @@ static int __init populate_kprobe_blacklist(unsigned long *start,
>  	return ret ?
: arch_populate_kprobe_blacklist();
>  }
>
> +#if IS_ENABLED(CONFIG_MODULES)
>  static void add_module_kprobe_blacklist(struct module *mod)
>  {
>  	unsigned long start, end;
> @@ -2665,6 +2683,7 @@ static struct notifier_block kprobe_module_nb = {
>  	.notifier_call = kprobes_module_callback,
>  	.priority = 0
>  };
> +#endif /* IS_ENABLED(CONFIG_MODULES) */
>
>  void kprobe_free_init_mem(void)
>  {
> @@ -2724,8 +2743,11 @@ static int __init init_kprobes(void)
>  	err = arch_init_kprobes();
>  	if (!err)
>  		err = register_die_notifier(&kprobe_exceptions_nb);
> +
> +#if IS_ENABLED(CONFIG_MODULES)
>  	if (!err)
>  		err = register_module_notifier(&kprobe_module_nb);
> +#endif
>
>  	kprobes_initialized = (err == 0);
>  	kprobe_sysctls_init();
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index
Re: [RFC][PATCH 2/4] bpf: Allow BPF_JIT with CONFIG_MODULES=n
On 06/03/2024 at 21:05, Calvin Owens wrote:
> No BPF code has to change, except in struct_ops (for module refs).
>
> This conflicts with bpf-next because of this (relevant) series:
>
> https://lore.kernel.org/all/20240119225005.668602-1-thinker...@gmail.com/
>
> If something like this is merged down the road, it can go through
> bpf-next at leisure once the module_alloc change is in: it's a one-way
> dependency.
>
> Signed-off-by: Calvin Owens
> ---
>  kernel/bpf/Kconfig          |  2 +-
>  kernel/bpf/bpf_struct_ops.c | 28 
>  2 files changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
> index 6a906ff93006..77df483a8925 100644
> --- a/kernel/bpf/Kconfig
> +++ b/kernel/bpf/Kconfig
> @@ -42,7 +42,7 @@ config BPF_JIT
>  	bool "Enable BPF Just In Time compiler"
>  	depends on BPF
>  	depends on HAVE_CBPF_JIT || HAVE_EBPF_JIT
> -	depends on MODULES
> +	select MODULE_ALLOC
>  	help
>  	  BPF programs are normally handled by a BPF interpreter. This option
>  	  allows the kernel to generate native code when a program is loaded
> diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
> index 02068bd0e4d9..fbf08a1bb00c 100644
> --- a/kernel/bpf/bpf_struct_ops.c
> +++ b/kernel/bpf/bpf_struct_ops.c
> @@ -108,11 +108,30 @@ const struct bpf_prog_ops bpf_struct_ops_prog_ops = {
>  #endif
>  };
>
> +#if IS_ENABLED(CONFIG_MODULES)

Can you avoid ifdefs as much as possible?
> static const struct btf_type *module_type; > > +static int bpf_struct_module_type_init(struct btf *btf) > +{ > + s32 module_id; Could be: if (!IS_ENABLED(CONFIG_MODULES)) return 0; > + > + module_id = btf_find_by_name_kind(btf, "module", BTF_KIND_STRUCT); > + if (module_id < 0) > + return 1; > + > + module_type = btf_type_by_id(btf, module_id); > + return 0; > +} > +#else > +static int bpf_struct_module_type_init(struct btf *btf) > +{ > + return 0; > +} > +#endif > + > void bpf_struct_ops_init(struct btf *btf, struct bpf_verifier_log *log) > { > - s32 type_id, value_id, module_id; > + s32 type_id, value_id; > const struct btf_member *member; > struct bpf_struct_ops *st_ops; > const struct btf_type *t; > @@ -125,12 +144,10 @@ void bpf_struct_ops_init(struct btf *btf, struct > bpf_verifier_log *log) > #include "bpf_struct_ops_types.h" > #undef BPF_STRUCT_OPS_TYPE > > - module_id = btf_find_by_name_kind(btf, "module", BTF_KIND_STRUCT); > - if (module_id < 0) { > + if (bpf_struct_module_type_init(btf)) { > pr_warn("Cannot find struct module in btf_vmlinux\n"); > return; > } > - module_type = btf_type_by_id(btf, module_id); > > for (i = 0; i < ARRAY_SIZE(bpf_struct_ops); i++) { > st_ops = bpf_struct_ops[i]; > @@ -433,12 +450,15 @@ static long bpf_struct_ops_map_update_elem(struct > bpf_map *map, void *key, > > moff = __btf_member_bit_offset(t, member) / 8; > ptype = btf_type_resolve_ptr(btf_vmlinux, member->type, > NULL); > + > +#if IS_ENABLED(CONFIG_MODULES) Can't see anything depending on CONFIG_MODULES here, can you instead do: if (IS_ENABLED(CONFIG_MODULES) && ptype == module_type) { > if (ptype == module_type) { > if (*(void **)(udata + moff)) > goto reset_unlock; > *(void **)(kdata + moff) = BPF_MODULE_OWNER; > continue; > } > +#endif > > err = st_ops->init_member(t, member, kdata, udata); > if (err < 0) > -- > 2.43.0 > >
Re: [RFC][PATCH 1/4] module: mm: Make module_alloc() generally available
Hi Calvin,

On 06/03/2024 at 21:05, Calvin Owens wrote:
> Both BPF_JIT and KPROBES depend on CONFIG_MODULES, but only require
> module_alloc() itself, which can be easily separated into a standalone
> allocator for executable kernel memory.

Easily maybe, but not as easily as you think, see below.

>
> Thomas Gleixner sent a patch to do that for x86 as part of a larger
> series a couple years ago:
>
> https://lore.kernel.org/all/20220716230953.442937...@linutronix.de/
>
> I've simply extended that approach to the whole kernel.
>
> Signed-off-by: Calvin Owens
> ---
>  arch/Kconfig                     |   2 +-
>  arch/arm/kernel/module.c         |  35 -
>  arch/arm/mm/Makefile             |   2 +
>  arch/arm/mm/module_alloc.c       |  40 ++
>  arch/arm64/kernel/module.c       | 127 --
>  arch/arm64/mm/Makefile           |   1 +
>  arch/arm64/mm/module_alloc.c     | 130 +++
>  arch/loongarch/kernel/module.c   |   6 --
>  arch/loongarch/mm/Makefile       |   2 +
>  arch/loongarch/mm/module_alloc.c |  10 +++
>  arch/mips/kernel/module.c        |  10 ---
>  arch/mips/mm/Makefile            |   2 +
>  arch/mips/mm/module_alloc.c      |  13 
>  arch/nios2/kernel/module.c       |  20 -
>  arch/nios2/mm/Makefile           |   2 +
>  arch/nios2/mm/module_alloc.c     |  22 ++
>  arch/parisc/kernel/module.c      |  12 ---
>  arch/parisc/mm/Makefile          |   1 +
>  arch/parisc/mm/module_alloc.c    |  15 
>  arch/powerpc/kernel/module.c     |  36 -
>  arch/powerpc/mm/Makefile         |   1 +
>  arch/powerpc/mm/module_alloc.c   |  41 ++

Missing several powerpc changes to make it work. You must audit every use of CONFIG_MODULES inside powerpc. Here are a few examples:

Function get_patch_pfn() to enable text code patching.
arch/powerpc/Kconfig:
	select KASAN_VMALLOC if KASAN && MODULES

arch/powerpc/include/asm/kasan.h:
	#if defined(CONFIG_MODULES) && defined(CONFIG_PPC32)
	#define KASAN_KERN_START	ALIGN_DOWN(PAGE_OFFSET - SZ_256M, SZ_256M)
	#else
	#define KASAN_KERN_START	PAGE_OFFSET
	#endif

arch/powerpc/kernel/head_8xx.S and arch/powerpc/kernel/head_book3s_32.S: the InstructionTLBMiss interrupt handler must know that there is executable kernel text outside the kernel core.

Function is_module_segment(), to identify segments used for module text and set the NX (no-exec) MMU flag on non-module segments.

>  arch/riscv/kernel/module.c       |  11 ---
>  arch/riscv/mm/Makefile           |   1 +
>  arch/riscv/mm/module_alloc.c     |  17 
>  arch/s390/kernel/module.c        |  37 -
>  arch/s390/mm/Makefile            |   1 +
>  arch/s390/mm/module_alloc.c      |  42 ++
>  arch/sparc/kernel/module.c       |  31 
>  arch/sparc/mm/Makefile           |   2 +
>  arch/sparc/mm/module_alloc.c     |  31 
>  arch/x86/kernel/ftrace.c         |   2 +-
>  arch/x86/kernel/module.c         |  56 -
>  arch/x86/mm/Makefile             |   2 +
>  arch/x86/mm/module_alloc.c       |  59 ++
>  fs/proc/kcore.c                  |   2 +-
>  kernel/module/Kconfig            |   1 +
>  kernel/module/main.c             |  17 
>  mm/Kconfig                       |   3 +
>  mm/Makefile                      |   1 +
>  mm/module_alloc.c                |  21 +
>  mm/vmalloc.c                     |   2 +-
>  42 files changed, 467 insertions(+), 402 deletions(-)

...

> diff --git a/mm/Kconfig b/mm/Kconfig
> index ffc3a2ba3a8c..92bfb5ae2e95 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -1261,6 +1261,9 @@ config LOCK_MM_AND_FIND_VMA
>  config IOMMU_MM_DATA
>  	bool
>
> +config MODULE_ALLOC
> +	def_bool n
> +

I'd call it something other than CONFIG_MODULE_ALLOC, as you want to use it when CONFIG_MODULES is not selected. Something like CONFIG_EXECMEM_ALLOC or CONFIG_DYNAMIC_EXECMEM?

Christophe
Re: [PATCH 1/3] init: Declare rodata_enabled and mark_rodata_ro() at all time
On 31/01/2024 at 16:17, Marek Szyprowski wrote:
> Hi Christophe,
>
> On 31.01.2024 12:58, Christophe Leroy wrote:
>> On 30/01/2024 at 18:48, Marek Szyprowski wrote:
>>> On 30.01.2024 12:03, Christophe Leroy wrote:
>>>> On 30/01/2024 at 10:16, Chen-Yu Tsai wrote:
>>>>> On Mon, Jan 29, 2024 at 12:09:50PM -0800, Luis Chamberlain wrote:
>>>>>> On Thu, Dec 21, 2023 at 10:02:46AM +0100, Christophe Leroy wrote:
>>>>>>> Declaring rodata_enabled and mark_rodata_ro() at all time
>>>>>>> helps removing related #ifdefery in C files.
>>>>>>>
>>>>>>> Signed-off-by: Christophe Leroy
>>>>>> Very nice cleanup, thanks!, applied and pushed
>>>>>>
>>>>>>   Luis
>>>>> On next-20240130, which has your modules-next branch, and thus this
>>>>> series and the other "module: Use set_memory_rox()" series applied,
>>>>> my kernel crashes in some very weird way. Reverting your branch
>>>>> makes the crash go away.
>>>>>
>>>>> I thought I'd report it right away. Maybe you folks would know what's
>>>>> happening here? This is on arm64.
>>>> That's strange, it seems to crash in module_bug_finalize(), which is
>>>> _before_ the calls to module_enable_ro() and such.
>>>>
>>>> Can you try to revert the 6 patches one by one to see which one
>>>> introduces the problem?
>>>>
>>>> In reality, only patch 677bfb9db8a3 really changes things. The other
>>>> ones are more or less only cleanup.
>>> I've also run into this issue with today's (20240130) linux-next on my
>>> test farm.
>>> The issue is not fully reproducible, so it was a bit hard to
>>> bisect it automatically. I've spent some time on manual testing and it
>>> looks like reverting the following 2 commits on top of linux-next fixes
>>> the problem:
>>>
>>> 65929884f868 ("modules: Remove #ifdef CONFIG_STRICT_MODULE_RWX around
>>> rodata_enabled")
>>> 677bfb9db8a3 ("module: Don't ignore errors from set_memory_XX()")
>>>
>>> This in fact means that commit 677bfb9db8a3 is responsible for this
>>> regression, as 65929884f868 has to be reverted only because the latter
>>> depends on it. Let me know what I can do to help debugging this issue.
>>>
>> Thanks for the bisect. I suspect you hit one of the errors and something
>> goes wrong in the error path.
>>
>> To confirm this assumption, could you try with the following change on
>> top of everything?
>
> Yes, this is the problem. I've added printing of mod->name to the log.
> Here is a log from a kernel built from next-20240130 (sometimes it even
> boots to shell):
>
> # dmesg | grep module_set_memory
> [    8.061525] module_set_memory(6, , 0) name ipv6 returned -22
> [    8.067543] WARNING: CPU: 3 PID: 1 at kernel/module/strict_rwx.c:22 module_set_memory+0x9c/0xb8

It would be good if you could show the backtrace too, so that we know who the caller is. I guess what you show here is what you get on the screen? The backtrace should be available through 'dmesg'.

I guess we will now seek help from ARM64 people to understand why module_set_memory_something() fails with -EINVAL when loading modules.
> [ 8.097821] pc : module_set_memory+0x9c/0xb8 > [ 8.102068] lr : module_set_memory+0x9c/0xb8 > [ 8.183101] module_set_memory+0x9c/0xb8 > [ 8.472862] module_set_memory(6, , 0) name x_tables > returned -22 > [ 8.479215] WARNING: CPU: 2 PID: 1 at kernel/module/strict_rwx.c:22 > module_set_memory+0x9c/0xb8 > [ 8.510978] pc : module_set_memory+0x9c/0xb8 > [ 8.515225] lr : module_set_memory+0x9c/0xb8 > [ 8.596259] module_set_memory+0x9c/0xb8 > [ 10.529879] module_set_memory(6, , 0) name dm_mod > returned -22 > [ 10.536087] WARNING: CPU: 3 PID: 127 at kernel/module/strict_rwx.c:22 > mod
Re: [PATCH 1/3] init: Declare rodata_enabled and mark_rodata_ro() at all time
Hi,

On 30/01/2024 at 18:48, Marek Szyprowski wrote:
> Dear All,
>
> On 30.01.2024 12:03, Christophe Leroy wrote:
>> On 30/01/2024 at 10:16, Chen-Yu Tsai wrote:
>>> On Mon, Jan 29, 2024 at 12:09:50PM -0800, Luis Chamberlain wrote:
>>>> On Thu, Dec 21, 2023 at 10:02:46AM +0100, Christophe Leroy wrote:
>>>>> Declaring rodata_enabled and mark_rodata_ro() at all time
>>>>> helps removing related #ifdefery in C files.
>>>>>
>>>>> Signed-off-by: Christophe Leroy
>>>> Very nice cleanup, thanks!, applied and pushed
>>>>
>>>> Luis
>>> On next-20240130, which has your modules-next branch, and thus this
>>> series and the other "module: Use set_memory_rox()" series applied,
>>> my kernel crashes in some very weird way. Reverting your branch
>>> makes the crash go away.
>>>
>>> I thought I'd report it right away. Maybe you folks would know what's
>>> happening here? This is on arm64.
>> That's strange, it seems to crash in module_bug_finalize(), which is
>> _before_ the calls to module_enable_ro() and such.
>>
>> Can you try to revert the 6 patches one by one to see which one
>> introduces the problem?
>>
>> In reality, only patch 677bfb9db8a3 really changes things. The other
>> ones are more or less only cleanup.
>
> I've also run into this issue with today's (20240130) linux-next on my
> test farm. The issue is not fully reproducible, so it was a bit hard to
> bisect it automatically.
> I've spent some time on manual testing and it
> looks like reverting the following 2 commits on top of linux-next fixes
> the problem:
>
> 65929884f868 ("modules: Remove #ifdef CONFIG_STRICT_MODULE_RWX around
> rodata_enabled")
> 677bfb9db8a3 ("module: Don't ignore errors from set_memory_XX()")
>
> This in fact means that commit 677bfb9db8a3 is responsible for this
> regression, as 65929884f868 has to be reverted only because the latter
> depends on it. Let me know what I can do to help debugging this issue.
>

Thanks for the bisect. I suspect you hit one of the errors and something goes wrong in the error path.

To confirm this assumption, could you try with the following change on top of everything?

diff --git a/kernel/module/strict_rwx.c b/kernel/module/strict_rwx.c
index a14df9655dbe..fdf8484154dd 100644
--- a/kernel/module/strict_rwx.c
+++ b/kernel/module/strict_rwx.c
@@ -15,9 +15,12 @@ static int module_set_memory(const struct module *mod, enum mod_mem_type type,
 			     int (*set_memory)(unsigned long start, int num_pages))
 {
 	const struct module_memory *mod_mem = &mod->mem[type];
+	int err;
 
 	set_vm_flush_reset_perms(mod_mem->base);
-	return set_memory((unsigned long)mod_mem->base, mod_mem->size >> PAGE_SHIFT);
+	err = set_memory((unsigned long)mod_mem->base, mod_mem->size >> PAGE_SHIFT);
+	WARN(err, "module_set_memory(%d, %px, %x) returned %d\n", type, mod_mem->base, mod_mem->size, err);
+	return err;
 }

 /*

Thanks for your help

Christophe
Re: [PATCH 1/3] init: Declare rodata_enabled and mark_rodata_ro() at all time
On 30/01/2024 at 21:27, Luis Chamberlain wrote:
> On Tue, Jan 30, 2024 at 06:48:11PM +0100, Marek Szyprowski wrote:
>> Dear All,
>>
>> On 30.01.2024 12:03, Christophe Leroy wrote:
>>> On 30/01/2024 at 10:16, Chen-Yu Tsai wrote:
>>>> On Mon, Jan 29, 2024 at 12:09:50PM -0800, Luis Chamberlain wrote:
>>>>> On Thu, Dec 21, 2023 at 10:02:46AM +0100, Christophe Leroy wrote:
>>>>>> Declaring rodata_enabled and mark_rodata_ro() at all time
>>>>>> helps removing related #ifdefery in C files.
>>>>>>
>>>>>> Signed-off-by: Christophe Leroy
>>>>> Very nice cleanup, thanks!, applied and pushed
>>>>>
>>>>> Luis
>>>> On next-20240130, which has your modules-next branch, and thus this
>>>> series and the other "module: Use set_memory_rox()" series applied,
>>>> my kernel crashes in some very weird way. Reverting your branch
>>>> makes the crash go away.
>>>>
>>>> I thought I'd report it right away. Maybe you folks would know what's
>>>> happening here? This is on arm64.
>>> That's strange, it seems to crash in module_bug_finalize(), which is
>>> _before_ the calls to module_enable_ro() and such.
>>>
>>> Can you try to revert the 6 patches one by one to see which one
>>> introduces the problem?
>>>
>>> In reality, only patch 677bfb9db8a3 really changes things. The other
>>> ones are more or less only cleanup.
>>
>> I've also run into this issue with today's (20240130) linux-next on my
>> test farm. The issue is not fully reproducible, so it was a bit hard to
>> bisect it automatically.
>> I've spent some time on manual testing and it
>> looks like reverting the following 2 commits on top of linux-next fixes
>> the problem:
>>
>> 65929884f868 ("modules: Remove #ifdef CONFIG_STRICT_MODULE_RWX around
>> rodata_enabled")
>> 677bfb9db8a3 ("module: Don't ignore errors from set_memory_XX()")
>>
>> This in fact means that commit 677bfb9db8a3 is responsible for this
>> regression, as 65929884f868 has to be reverted only because the latter
>> depends on it. Let me know what I can do to help debugging this issue.
>
> Thanks for the bisect, I've reset my tree to commit
> 3559ad395bf02 ("module: Change module_enable_{nx/x/ro}() to more
> explicit names") for now then, so to remove those commits.
>

Since the problem has been identified as coming from commit 677bfb9db8a3 ("module: Don't ignore errors from set_memory_XX()"), you can keep/re-apply the series [PATCH 1/3] init: Declare rodata_enabled and mark_rodata_ro() at all time.

Christophe
Re: [PATCH 1/3] init: Declare rodata_enabled and mark_rodata_ro() at all time
Le 30/01/2024 à 10:16, Chen-Yu Tsai a écrit :
> Hi,
>
> On Mon, Jan 29, 2024 at 12:09:50PM -0800, Luis Chamberlain wrote:
>> On Thu, Dec 21, 2023 at 10:02:46AM +0100, Christophe Leroy wrote:
>>> Declaring rodata_enabled and mark_rodata_ro() at all time
>>> helps removing related #ifdefery in C files.
>>>
>>> Signed-off-by: Christophe Leroy
>>
>> Very nice cleanup, thanks!, applied and pushed
>>
>>   Luis
>
> On next-20240130, which has your modules-next branch, and thus this
> series and the other "module: Use set_memory_rox()" series applied,
> my kernel crashes in some very weird way. Reverting your branch
> makes the crash go away.
>
> I thought I'd report it right away. Maybe you folks would know what's
> happening here? This is on arm64.

That's strange, it seems to bug in module_bug_finalize(), which is
_before_ the calls to module_enable_ro() and such.

Can you try to revert the 6 patches one by one to see which one
introduces the problem?

In reality, only patch 677bfb9db8a3 really changes things. The other
ones are more or less only cleanup.
Thanks Christophe > > [ 10.481015] Unable to handle kernel paging request at virtual address > ffde85245d30 > [ 10.490369] KASAN: maybe wild-memory-access in range > [0x00f42922e980-0x00f42922e987] > [ 10.503744] Mem abort info: > [ 10.509383] ESR = 0x9647 > [ 10.514400] EC = 0x25: DABT (current EL), IL = 32 bits > [ 10.522366] SET = 0, FnV = 0 > [ 10.526343] EA = 0, S1PTW = 0 > [ 10.530695] FSC = 0x07: level 3 translation fault > [ 10.537081] Data abort info: > [ 10.540839] ISV = 0, ISS = 0x0047, ISS2 = 0x > [ 10.546456] CM = 0, WnR = 1, TnD = 0, TagAccess = 0 > [ 10.551726] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 > [ 10.557612] swapper pgtable: 4k pages, 39-bit VAs, pgdp=41f98000 > [ 10.565214] [ffde85245d30] pgd=10023003, p4d=10023003, > pud=10023003, pmd=1001121eb003, pte= > [ 10.578887] Internal error: Oops: 9647 [#1] PREEMPT SMP > [ 10.585815] Modules linked in: > [ 10.590235] CPU: 6 PID: 195 Comm: (udev-worker) Tainted: GB > 6.8.0-rc2-next-20240130-02908-ge8ad01d60927-dirty #163 > 3f2318148ecc5fa70d1092c2b874f9b59bdb7d60 > [ 10.607021] Hardware name: Google Tentacruel board (DT) > [ 10.613607] pstate: a049 (NzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > [ 10.621954] pc : module_bug_finalize+0x118/0x148 > [ 10.626823] lr : module_bug_finalize+0x118/0x148 > [ 10.631463] sp : ffc0820478d0 > [ 10.631466] x29: ffc0820478d0 x28: ffc082047ca0 x27: > ffde8d7d31a0 > [ 10.631477] x26: ffde85223780 x25: x24: > ffde8c413cc0 > [ 10.631486] x23: ffde8dfcec80 x22: ffde8dfce000 x21: > ffde85223ba8 > [ 10.631495] x20: ffde85223780 x19: ffde85245d28 x18: > > [ 10.631504] x17: ffde8aa15938 x16: ffde8aabdd90 x15: > ffde8aab8124 > [ 10.631513] x14: ffde8acdd380 x13: 41b58ab3 x12: > ffbbd1bf9d91 > [ 10.631522] x11: 1ffbd1bf9d90 x10: ffbbd1bf9d90 x9 : > dfc0 > [ 10.631531] x8 : 00442e406270 x7 : ffde8dfcec87 x6 : > 0001 > [ 10.631539] x5 : ffde8dfcec80 x4 : x3 : > ffde8bbadf08 > [ 10.631548] x2 : 0001 x1 : ffde8eaff080 x0 : > > [ 10.631556] Call trace: > [ 10.631559] 
module_bug_finalize+0x118/0x148 > [ 10.631565] load_module+0x25ec/0x2a78 > [ 10.631572] __do_sys_init_module+0x234/0x418 > [ 10.631578] __arm64_sys_init_module+0x4c/0x68 > [ 10.631584] invoke_syscall+0x68/0x198 > [ 10.631589] el0_svc_common.constprop.0+0x11c/0x150 > [ 10.631594] do_el0_svc+0x38/0x50 > [ 10.631598] el0_svc+0x50/0xa0 > [ 10.631604] el0t_64_sync_handler+0x120/0x130 > [ 10.631609] el0t_64_sync+0x1a8/0x1b0 > [ 10.631619] Code: 97c5418e c89ffef5 91002260 97c53ca7 (f9000675) > [ 10.631624] ---[ end trace ]--- > [ 10.642965] Kernel panic - not syncing: Oops: Fatal exception > [ 10.642975] SMP: stopping secondary CPUs > [ 10.648339] Kernel Offset: 0x1e0a80 from 0xffc08000 > [ 10.648343] PHYS_OFFSET: 0x4000 > [ 10.648345] CPU features: 0x0,c061,7002814a,2100720b > [ 10.648350] Memory Limit: none >
Re: [PATCH] powerpc/papr_scm: Move duplicate definitions to common header files
Le 18/04/2022 à 06:38, Shivaprasad G Bhat a écrit : > papr_scm and ndtest share common PDSM payload structs like > nd_papr_pdsm_health. Presently these structs are duplicated across > papr_pdsm.h and ndtest.h header files. Since 'ndtest' is essentially > arch independent and can run on platforms other than PPC64, a way > needs to be deviced to avoid redundancy and duplication of PDSM > structs in future. > > So the patch proposes moving the PDSM header from arch/powerpc/include- > -/uapi/ to the generic include/uapi/linux directory. Also, there are > some #defines common between papr_scm and ndtest which are not exported > to the user space. So, move them to a header file which can be shared > across ndtest and papr_scm via newly introduced include/linux/papr_scm.h. > > Signed-off-by: Shivaprasad G Bhat > Signed-off-by: Vaibhav Jain > Suggested-by: "Aneesh Kumar K.V" This patch doesn't apply, if still relevant can you please rebase and re-submit ? Thanks Christophe > --- > Changelog: > Since v2: > Link: > https://patchwork.kernel.org/project/linux-nvdimm/patch/163454440296.431294.2368481747380790011.st...@lep8c.aus.stglabs.ibm.com/ > * Made it like v1, and rebased. 
> * Fixed repeating words in comments of the header file papr_scm.h > > Since v1: > Link: > https://patchwork.kernel.org/project/linux-nvdimm/patch/162505488483.72147.12741153746322191381.stgit@56e104a48989/ > * Removed dependency on this patch for the other patches > > MAINTAINERS |2 > arch/powerpc/include/uapi/asm/papr_pdsm.h | 165 > - > arch/powerpc/platforms/pseries/papr_scm.c | 43 > include/linux/papr_scm.h | 49 + > include/uapi/linux/papr_pdsm.h| 165 > + > tools/testing/nvdimm/test/ndtest.c|2 > tools/testing/nvdimm/test/ndtest.h| 31 - > 7 files changed, 220 insertions(+), 237 deletions(-) > delete mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h > create mode 100644 include/linux/papr_scm.h > create mode 100644 include/uapi/linux/papr_pdsm.h > > diff --git a/MAINTAINERS b/MAINTAINERS > index 1699bb7cc867..03685b074dda 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -11254,6 +11254,8 @@ F:drivers/rtc/rtc-opal.c > F: drivers/scsi/ibmvscsi/ > F: drivers/tty/hvc/hvc_opal.c > F: drivers/watchdog/wdrtas.c > +F: include/linux/papr_scm.h > +F: include/uapi/linux/papr_pdsm.h > F: tools/testing/selftests/powerpc > N: /pmac > N: powermac > diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h > b/arch/powerpc/include/uapi/asm/papr_pdsm.h > deleted file mode 100644 > index 17439925045c.. 
> --- a/arch/powerpc/include/uapi/asm/papr_pdsm.h > +++ /dev/null > @@ -1,165 +0,0 @@ > -/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ > -/* > - * PAPR nvDimm Specific Methods (PDSM) and structs for libndctl > - * > - * (C) Copyright IBM 2020 > - * > - * Author: Vaibhav Jain > - */ > - > -#ifndef _UAPI_ASM_POWERPC_PAPR_PDSM_H_ > -#define _UAPI_ASM_POWERPC_PAPR_PDSM_H_ > - > -#include > -#include > - > -/* > - * PDSM Envelope: > - * > - * The ioctl ND_CMD_CALL exchange data between user-space and kernel via > - * envelope which consists of 2 headers sections and payload sections as > - * illustrated below: > - * +-+---+---+ > - * | 64-Bytes | 8-Bytes | Max 184-Bytes | > - * +-+---+---+ > - * | ND-HEADER | PDSM-HEADER | PDSM-PAYLOAD | > - * +-+---+---+ > - * | nd_family | | | > - * | nd_size_out | cmd_status| | > - * | nd_size_in | reserved | nd_pdsm_payload | > - * | nd_command | payload --> | | > - * | nd_fw_size | | | > - * | nd_payload ---> | | | > - * +---+-+---+ > - * > - * ND Header: > - * This is the generic libnvdimm header described as 'struct nd_cmd_pkg' > - * which is interpreted by libnvdimm before passed on to papr_scm. Important > - * member fields used are: > - * 'nd_family' : (In) NVDIMM_FAMILY_PAPR_SCM > - * 'nd_size_in' : (In) PDSM-HEADER + PDSM-IN-PAYLOAD (usually 0) > - * 'nd_size_out': (In) PDSM-HEADER + PDSM-RETURN-PAYLOAD > - * 'nd_command' : (In) One of PAPR_PDSM_XXX > - * 'nd_fw_size' : (Out) PDSM-HEADER + size of actual payload returned > - * > - * PDSM Header: > - * This is papr-scm specific header that precedes the payload. This is > defined > - * as nd_cmd_pdsm_pkg. Following fields aare available in this header: > - * > - * 'cmd_status'
Re: [PATCH 1/3] init: Declare rodata_enabled and mark_rodata_ro() at all time
Le 22/12/2023 à 06:35, Kees Cook a écrit : > [Vous ne recevez pas souvent de courriers de k...@kernel.org. Découvrez > pourquoi ceci est important à https://aka.ms/LearnAboutSenderIdentification ] > > On December 21, 2023 4:16:56 AM PST, Michael Ellerman > wrote: >> Cc +Kees >> >> Christophe Leroy writes: >>> Declaring rodata_enabled and mark_rodata_ro() at all time >>> helps removing related #ifdefery in C files. >>> >>> Signed-off-by: Christophe Leroy >>> --- >>> include/linux/init.h | 4 >>> init/main.c | 21 +++-- >>> 2 files changed, 7 insertions(+), 18 deletions(-) >>> >>> diff --git a/include/linux/init.h b/include/linux/init.h >>> index 01b52c9c7526..d2b47be38a07 100644 >>> --- a/include/linux/init.h >>> +++ b/include/linux/init.h >>> @@ -168,12 +168,8 @@ extern initcall_entry_t __initcall_end[]; >>> >>> extern struct file_system_type rootfs_fs_type; >>> >>> -#if defined(CONFIG_STRICT_KERNEL_RWX) || defined(CONFIG_STRICT_MODULE_RWX) >>> extern bool rodata_enabled; >>> -#endif >>> -#ifdef CONFIG_STRICT_KERNEL_RWX >>> void mark_rodata_ro(void); >>> -#endif >>> >>> extern void (*late_time_init)(void); >>> >>> diff --git a/init/main.c b/init/main.c >>> index e24b0780fdff..807df08c501f 100644 >>> --- a/init/main.c >>> +++ b/init/main.c >>> @@ -1396,10 +1396,9 @@ static int __init set_debug_rodata(char *str) >>> early_param("rodata", set_debug_rodata); >>> #endif >>> >>> -#ifdef CONFIG_STRICT_KERNEL_RWX >>> static void mark_readonly(void) >>> { >>> -if (rodata_enabled) { >>> +if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX) && rodata_enabled) { > > I think this will break without rodata_enabled actual existing on other > architectures. (Only declaration was made visible, not the definition, which > is above here and still behind ifdefs?) The compiler constant-folds IS_ENABLED(CONFIG_STRICT_KERNEL_RWX). When it is false, the second part is dropped. 
Example:

	bool test(void)
	{
		if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX) && rodata_enabled)
			return true;
		else
			return false;
	}

With CONFIG_STRICT_KERNEL_RWX set, it directly returns the content of
rodata_enabled:

	00000160 <test>:
	 160:	3d 20 00 00 	lis	r9,0
			162: R_PPC_ADDR16_HA	rodata_enabled
	 164:	88 69 00 00 	lbz	r3,0(r9)
			166: R_PPC_ADDR16_LO	rodata_enabled
	 168:	4e 80 00 20 	blr

With CONFIG_STRICT_KERNEL_RWX unset, it returns 0 and doesn't reference
rodata_enabled at all:

	000000bc <test>:
	  bc:	38 60 00 00 	li	r3,0
	  c0:	4e 80 00 20 	blr

Many places in the kernel use this approach to minimise the amount of
#ifdefs.

Christophe
[PATCH 3/3] powerpc: Simplify strict_kernel_rwx_enabled()
Now that rodata_enabled is always declared, remove #ifdef and define a
single version of strict_kernel_rwx_enabled().

Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/mmu.h | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index d8b7e246a32f..24241995f740 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -330,17 +330,10 @@ static __always_inline bool early_radix_enabled(void)
 	return early_mmu_has_feature(MMU_FTR_TYPE_RADIX);
 }
 
-#ifdef CONFIG_STRICT_KERNEL_RWX
 static inline bool strict_kernel_rwx_enabled(void)
 {
-	return rodata_enabled;
+	return IS_ENABLED(CONFIG_STRICT_KERNEL_RWX) && rodata_enabled;
 }
-#else
-static inline bool strict_kernel_rwx_enabled(void)
-{
-	return false;
-}
-#endif
 
 static inline bool strict_module_rwx_enabled(void)
 {
-- 
2.41.0
[PATCH 2/3] modules: Remove #ifdef CONFIG_STRICT_MODULE_RWX around rodata_enabled
Now that rodata_enabled is declared at all time, the #ifdef
CONFIG_STRICT_MODULE_RWX can be removed.

Signed-off-by: Christophe Leroy
---
 kernel/module/strict_rwx.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/kernel/module/strict_rwx.c b/kernel/module/strict_rwx.c
index a2b656b4e3d2..eadff63b6e80 100644
--- a/kernel/module/strict_rwx.c
+++ b/kernel/module/strict_rwx.c
@@ -34,12 +34,8 @@ void module_enable_x(const struct module *mod)
 
 void module_enable_ro(const struct module *mod, bool after_init)
 {
-	if (!IS_ENABLED(CONFIG_STRICT_MODULE_RWX))
-		return;
-#ifdef CONFIG_STRICT_MODULE_RWX
-	if (!rodata_enabled)
+	if (!IS_ENABLED(CONFIG_STRICT_MODULE_RWX) || !rodata_enabled)
 		return;
-#endif
 
 	module_set_memory(mod, MOD_TEXT, set_memory_ro);
 	module_set_memory(mod, MOD_INIT_TEXT, set_memory_ro);
-- 
2.41.0
[PATCH 1/3] init: Declare rodata_enabled and mark_rodata_ro() at all time
Declaring rodata_enabled and mark_rodata_ro() at all time helps removing
related #ifdefery in C files.

Signed-off-by: Christophe Leroy
---
 include/linux/init.h |  4
 init/main.c          | 21 +++--
 2 files changed, 7 insertions(+), 18 deletions(-)

diff --git a/include/linux/init.h b/include/linux/init.h
index 01b52c9c7526..d2b47be38a07 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -168,12 +168,8 @@ extern initcall_entry_t __initcall_end[];
 
 extern struct file_system_type rootfs_fs_type;
 
-#if defined(CONFIG_STRICT_KERNEL_RWX) || defined(CONFIG_STRICT_MODULE_RWX)
 extern bool rodata_enabled;
-#endif
-#ifdef CONFIG_STRICT_KERNEL_RWX
 void mark_rodata_ro(void);
-#endif
 
 extern void (*late_time_init)(void);
 
diff --git a/init/main.c b/init/main.c
index e24b0780fdff..807df08c501f 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1396,10 +1396,9 @@ static int __init set_debug_rodata(char *str)
 early_param("rodata", set_debug_rodata);
 #endif
 
-#ifdef CONFIG_STRICT_KERNEL_RWX
 static void mark_readonly(void)
 {
-	if (rodata_enabled) {
+	if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX) && rodata_enabled) {
 		/*
 		 * load_module() results in W+X mappings, which are cleaned
 		 * up with call_rcu().  Let's make sure that queued work is
@@ -1409,20 +1408,14 @@ static void mark_readonly(void)
 		rcu_barrier();
 		mark_rodata_ro();
 		rodata_test();
-	} else
+	} else if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX)) {
 		pr_info("Kernel memory protection disabled.\n");
+	} else if (IS_ENABLED(CONFIG_ARCH_HAS_STRICT_KERNEL_RWX)) {
+		pr_warn("Kernel memory protection not selected by kernel config.\n");
+	} else {
+		pr_warn("This architecture does not have kernel memory protection.\n");
+	}
 }
-#elif defined(CONFIG_ARCH_HAS_STRICT_KERNEL_RWX)
-static inline void mark_readonly(void)
-{
-	pr_warn("Kernel memory protection not selected by kernel config.\n");
-}
-#else
-static inline void mark_readonly(void)
-{
-	pr_warn("This architecture does not have kernel memory protection.\n");
-}
-#endif
 
 void __weak free_initmem(void)
 {
-- 
2.41.0
[PATCH 3/3] module: Don't ignore errors from set_memory_XX()
set_memory_ro(), set_memory_nx(), set_memory_x() and other helpers can fail
and return an error. In that case the memory might not be protected as
expected and the module loading has to be aborted to avoid security issues.

Check the return value of all calls to set_memory_XX() and handle errors,
if any.

Signed-off-by: Christophe Leroy
---
 kernel/module/internal.h   |  6 ++---
 kernel/module/main.c       | 18 ++
 kernel/module/strict_rwx.c | 48 ++
 3 files changed, 50 insertions(+), 22 deletions(-)

diff --git a/kernel/module/internal.h b/kernel/module/internal.h
index 4f1b98f011da..2ebece8a789f 100644
--- a/kernel/module/internal.h
+++ b/kernel/module/internal.h
@@ -322,9 +322,9 @@ static inline struct module *mod_find(unsigned long addr, struct mod_tree_root *
 }
 #endif /* CONFIG_MODULES_TREE_LOOKUP */
 
-void module_enable_rodata_ro(const struct module *mod, bool after_init);
-void module_enable_data_nx(const struct module *mod);
-void module_enable_text_rox(const struct module *mod);
+int module_enable_rodata_ro(const struct module *mod, bool after_init);
+int module_enable_data_nx(const struct module *mod);
+int module_enable_text_rox(const struct module *mod);
 int module_enforce_rwx_sections(Elf_Ehdr *hdr, Elf_Shdr *sechdrs,
 				char *secstrings, struct module *mod);
 
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 64662e55e275..cfe197455d64 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2568,7 +2568,9 @@ static noinline int do_init_module(struct module *mod)
 	/* Switch to core kallsyms now init is done: kallsyms may be walking! */
 	rcu_assign_pointer(mod->kallsyms, &mod->core_kallsyms);
 #endif
-	module_enable_rodata_ro(mod, true);
+	ret = module_enable_rodata_ro(mod, true);
+	if (ret)
+		goto fail_mutex_unlock;
 	mod_tree_remove_init(mod);
 	module_arch_freeing_init(mod);
 	for_class_mod_mem_type(type, init) {
@@ -2606,6 +2608,8 @@ static noinline int do_init_module(struct module *mod)
 
 	return 0;
 
+fail_mutex_unlock:
+	mutex_unlock(&module_mutex);
 fail_free_freeinit:
 	kfree(freeinit);
 fail:
@@ -2733,9 +2737,15 @@ static int complete_formation(struct module *mod, struct load_info *info)
 	module_bug_finalize(info->hdr, info->sechdrs, mod);
 	module_cfi_finalize(info->hdr, info->sechdrs, mod);
 
-	module_enable_rodata_ro(mod, false);
-	module_enable_data_nx(mod);
-	module_enable_text_rox(mod);
+	err = module_enable_rodata_ro(mod, false);
+	if (err)
+		goto out;
+	err = module_enable_data_nx(mod);
+	if (err)
+		goto out;
+	err = module_enable_text_rox(mod);
+	if (err)
+		goto out;
 
 	/*
 	 * Mark state as coming so strong_try_module_get() ignores us,
diff --git a/kernel/module/strict_rwx.c b/kernel/module/strict_rwx.c
index 9b2d58a8d59d..a14df9655dbe 100644
--- a/kernel/module/strict_rwx.c
+++ b/kernel/module/strict_rwx.c
@@ -11,13 +11,13 @@
 #include
 #include "internal.h"
 
-static void module_set_memory(const struct module *mod, enum mod_mem_type type,
+static int module_set_memory(const struct module *mod, enum mod_mem_type type,
 			      int (*set_memory)(unsigned long start, int num_pages))
 {
 	const struct module_memory *mod_mem = &mod->mem[type];
 
 	set_vm_flush_reset_perms(mod_mem->base);
-	set_memory((unsigned long)mod_mem->base, mod_mem->size >> PAGE_SHIFT);
+	return set_memory((unsigned long)mod_mem->base, mod_mem->size >> PAGE_SHIFT);
 }
 
 /*
@@ -26,39 +26,57 @@ static void module_set_memory(const struct module *mod, enum mod_mem_type type,
  * CONFIG_STRICT_MODULE_RWX because they are needed regardless of whether we
  * are strict.
  */
-void module_enable_text_rox(const struct module *mod)
+int module_enable_text_rox(const struct module *mod)
 {
 	for_class_mod_mem_type(type, text) {
+		int ret;
+
 		if (IS_ENABLED(CONFIG_STRICT_MODULE_RWX))
-			module_set_memory(mod, type, set_memory_rox);
+			ret = module_set_memory(mod, type, set_memory_rox);
 		else
-			module_set_memory(mod, type, set_memory_x);
+			ret = module_set_memory(mod, type, set_memory_x);
+		if (ret)
+			return ret;
 	}
+	return 0;
 }
 
-void module_enable_rodata_ro(const struct module *mod, bool after_init)
+int module_enable_rodata_ro(const struct module *mod, bool after_init)
 {
+	int ret;
+
 	if (!IS_ENABLED(CONFIG_STRICT_MODULE_RWX))
-		return;
+		return 0;
 #ifdef CONFIG_STRICT_MODULE_RWX
 	if (!rodata_enabled)
-		return;
+		return 0;
 #endif
-	module_set_memory(mod,
[PATCH 2/3] module: Change module_enable_{nx/x/ro}() to more explicit names
It's a bit puzzling to see a call to module_enable_nx() followed by a call
to module_enable_x(). This is because one applies on text while the other
applies on data.

Change name to make that more clear.

Signed-off-by: Christophe Leroy
---
 kernel/module/internal.h   | 6 +++---
 kernel/module/main.c       | 8
 kernel/module/strict_rwx.c | 6 +++---
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/module/internal.h b/kernel/module/internal.h
index a647ab17193d..4f1b98f011da 100644
--- a/kernel/module/internal.h
+++ b/kernel/module/internal.h
@@ -322,9 +322,9 @@ static inline struct module *mod_find(unsigned long addr, struct mod_tree_root *
 }
 #endif /* CONFIG_MODULES_TREE_LOOKUP */
 
-void module_enable_ro(const struct module *mod, bool after_init);
-void module_enable_nx(const struct module *mod);
-void module_enable_rox(const struct module *mod);
+void module_enable_rodata_ro(const struct module *mod, bool after_init);
+void module_enable_data_nx(const struct module *mod);
+void module_enable_text_rox(const struct module *mod);
 int module_enforce_rwx_sections(Elf_Ehdr *hdr, Elf_Shdr *sechdrs,
 				char *secstrings, struct module *mod);
 
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 1c8f328ca015..64662e55e275 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2568,7 +2568,7 @@ static noinline int do_init_module(struct module *mod)
 	/* Switch to core kallsyms now init is done: kallsyms may be walking! */
 	rcu_assign_pointer(mod->kallsyms, &mod->core_kallsyms);
 #endif
-	module_enable_ro(mod, true);
+	module_enable_rodata_ro(mod, true);
 	mod_tree_remove_init(mod);
 	module_arch_freeing_init(mod);
 	for_class_mod_mem_type(type, init) {
@@ -2733,9 +2733,9 @@ static int complete_formation(struct module *mod, struct load_info *info)
 	module_bug_finalize(info->hdr, info->sechdrs, mod);
 	module_cfi_finalize(info->hdr, info->sechdrs, mod);
 
-	module_enable_ro(mod, false);
-	module_enable_nx(mod);
-	module_enable_rox(mod);
+	module_enable_rodata_ro(mod, false);
+	module_enable_data_nx(mod);
+	module_enable_text_rox(mod);
 
 	/*
 	 * Mark state as coming so strong_try_module_get() ignores us,
diff --git a/kernel/module/strict_rwx.c b/kernel/module/strict_rwx.c
index 9345b09f28a5..9b2d58a8d59d 100644
--- a/kernel/module/strict_rwx.c
+++ b/kernel/module/strict_rwx.c
@@ -26,7 +26,7 @@ static void module_set_memory(const struct module *mod, enum mod_mem_type type,
  * CONFIG_STRICT_MODULE_RWX because they are needed regardless of whether we
  * are strict.
 */
-void module_enable_rox(const struct module *mod)
+void module_enable_text_rox(const struct module *mod)
 {
 	for_class_mod_mem_type(type, text) {
 		if (IS_ENABLED(CONFIG_STRICT_MODULE_RWX))
@@ -36,7 +36,7 @@ void module_enable_rox(const struct module *mod)
 	}
 }
 
-void module_enable_ro(const struct module *mod, bool after_init)
+void module_enable_rodata_ro(const struct module *mod, bool after_init)
 {
 	if (!IS_ENABLED(CONFIG_STRICT_MODULE_RWX))
 		return;
@@ -52,7 +52,7 @@ void module_enable_ro(const struct module *mod, bool after_init)
 	module_set_memory(mod, MOD_RO_AFTER_INIT, set_memory_ro);
 }
 
-void module_enable_nx(const struct module *mod)
+void module_enable_data_nx(const struct module *mod)
 {
 	if (!IS_ENABLED(CONFIG_STRICT_MODULE_RWX))
 		return;
-- 
2.41.0
[PATCH 1/3] module: Use set_memory_rox()
A couple of architectures seem concerned about calling set_memory_ro() and
set_memory_x() too frequently and have implemented a version of
set_memory_rox(), see commit 60463628c9e0 ("x86/mm: Implement native
set_memory_rox()") and commit 22e99fa56443 ("s390/mm: implement
set_memory_rox()")

Use set_memory_rox() in modules when STRICT_MODULE_RWX is set.

Signed-off-by: Christophe Leroy
---
 kernel/module/internal.h   |  2 +-
 kernel/module/main.c       |  2 +-
 kernel/module/strict_rwx.c | 12 +++-
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/kernel/module/internal.h b/kernel/module/internal.h
index c8b7b4dcf782..a647ab17193d 100644
--- a/kernel/module/internal.h
+++ b/kernel/module/internal.h
@@ -324,7 +324,7 @@ static inline struct module *mod_find(unsigned long addr, struct mod_tree_root *
 
 void module_enable_ro(const struct module *mod, bool after_init);
 void module_enable_nx(const struct module *mod);
-void module_enable_x(const struct module *mod);
+void module_enable_rox(const struct module *mod);
 int module_enforce_rwx_sections(Elf_Ehdr *hdr, Elf_Shdr *sechdrs,
 				char *secstrings, struct module *mod);
 
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 98fedfdb8db5..1c8f328ca015 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2735,7 +2735,7 @@ static int complete_formation(struct module *mod, struct load_info *info)
 
 	module_enable_ro(mod, false);
 	module_enable_nx(mod);
-	module_enable_x(mod);
+	module_enable_rox(mod);
 
 	/*
 	 * Mark state as coming so strong_try_module_get() ignores us,
diff --git a/kernel/module/strict_rwx.c b/kernel/module/strict_rwx.c
index a2b656b4e3d2..9345b09f28a5 100644
--- a/kernel/module/strict_rwx.c
+++ b/kernel/module/strict_rwx.c
@@ -26,10 +26,14 @@ static void module_set_memory(const struct module *mod, enum mod_mem_type type,
  * CONFIG_STRICT_MODULE_RWX because they are needed regardless of whether we
  * are strict.
 */
-void module_enable_x(const struct module *mod)
+void module_enable_rox(const struct module *mod)
 {
-	for_class_mod_mem_type(type, text)
-		module_set_memory(mod, type, set_memory_x);
+	for_class_mod_mem_type(type, text) {
+		if (IS_ENABLED(CONFIG_STRICT_MODULE_RWX))
+			module_set_memory(mod, type, set_memory_rox);
+		else
+			module_set_memory(mod, type, set_memory_x);
+	}
 }
 
 void module_enable_ro(const struct module *mod, bool after_init)
@@ -41,8 +45,6 @@ void module_enable_ro(const struct module *mod, bool after_init)
 		return;
 #endif
 
-	module_set_memory(mod, MOD_TEXT, set_memory_ro);
-	module_set_memory(mod, MOD_INIT_TEXT, set_memory_ro);
 	module_set_memory(mod, MOD_RODATA, set_memory_ro);
 	module_set_memory(mod, MOD_INIT_RODATA, set_memory_ro);
 
-- 
2.41.0
Re: [PATCH 12/27] tty: hvc: convert to u8 and size_t
Le 06/12/2023 à 08:36, Jiri Slaby (SUSE) a écrit : > Switch character types to u8 and sizes to size_t. To conform to > characters/sizes in the rest of the tty layer. > > Signed-off-by: Jiri Slaby (SUSE) > Cc: Michael Ellerman > Cc: Nicholas Piggin > Cc: Christophe Leroy > Cc: Amit Shah > Cc: Arnd Bergmann > Cc: Paul Walmsley > Cc: Palmer Dabbelt > Cc: Albert Ou > Cc: linuxppc-...@lists.ozlabs.org > Cc: virtualizat...@lists.linux.dev > Cc: linux-ri...@lists.infradead.org > --- > arch/powerpc/include/asm/hvconsole.h | 4 ++-- > arch/powerpc/include/asm/hvsi.h| 18 > arch/powerpc/include/asm/opal.h| 8 +--- > arch/powerpc/platforms/powernv/opal.c | 14 +++-- > arch/powerpc/platforms/pseries/hvconsole.c | 4 ++-- > drivers/char/virtio_console.c | 10 - > drivers/tty/hvc/hvc_console.h | 4 ++-- > drivers/tty/hvc/hvc_dcc.c | 24 +++--- > drivers/tty/hvc/hvc_iucv.c | 18 > drivers/tty/hvc/hvc_opal.c | 5 +++-- > drivers/tty/hvc/hvc_riscv_sbi.c| 9 > drivers/tty/hvc/hvc_rtas.c | 11 +- > drivers/tty/hvc/hvc_udbg.c | 9 > drivers/tty/hvc/hvc_vio.c | 18 > drivers/tty/hvc/hvc_xen.c | 23 +++-- > drivers/tty/hvc/hvsi_lib.c | 20 ++ > 16 files changed, 107 insertions(+), 92 deletions(-) > > diff --git a/arch/powerpc/include/asm/hvconsole.h > b/arch/powerpc/include/asm/hvconsole.h > index ccb2034506f0..d841a97010a0 100644 > --- a/arch/powerpc/include/asm/hvconsole.h > +++ b/arch/powerpc/include/asm/hvconsole.h > @@ -21,8 +21,8 @@ >* Vio firmware always attempts to fetch MAX_VIO_GET_CHARS chars. 
The > 'count' >* parm is included to conform to put_chars() function pointer template >*/ > -extern int hvc_get_chars(uint32_t vtermno, char *buf, int count); > -extern int hvc_put_chars(uint32_t vtermno, const char *buf, int count); > +extern ssize_t hvc_get_chars(uint32_t vtermno, u8 *buf, size_t count); > +extern ssize_t hvc_put_chars(uint32_t vtermno, const u8 *buf, size_t count); Would be a good opportunity to drop this pointless deprecated 'extern' keyword on all function prototypes you are changing. Christophe
Re: [PATCH v3 09/13] powerpc: extend execmem_params for kprobes allocations
Hi Mike, Le 18/09/2023 à 09:29, Mike Rapoport a écrit : > From: "Mike Rapoport (IBM)" > > powerpc overrides kprobes::alloc_insn_page() to remove writable > permissions when STRICT_MODULE_RWX is on. > > Add definition of EXECMEM_KRPOBES to execmem_params to allow using the > generic kprobes::alloc_insn_page() with the desired permissions. > > As powerpc uses breakpoint instructions to inject kprobes, it does not > need to constrain kprobe allocations to the modules area and can use the > entire vmalloc address space. I don't understand what you mean here. Does it mean kprobe allocation doesn't need to be executable ? I don't think so based on the pgprot you set. On powerpc book3s/32, vmalloc space is not executable. Only modules space is executable. X/NX cannot be set on a per page basis, it can only be set on a 256 Mbytes segment basis. See commit c49643319715 ("powerpc/32s: Only leave NX unset on segments used for modules") and 6ca055322da8 ("powerpc/32s: Use dedicated segment for modules with STRICT_KERNEL_RWX") and 7bee31ad8e2f ("powerpc/32s: Fix is_module_segment() when MODULES_VADDR is defined"). So if your intention is still to have an executable kprobes, then you can't use vmalloc address space. 
Christophe > > Signed-off-by: Mike Rapoport (IBM) > --- > arch/powerpc/kernel/kprobes.c | 14 -- > arch/powerpc/kernel/module.c | 11 +++ > 2 files changed, 11 insertions(+), 14 deletions(-) > > diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c > index 62228c7072a2..14c5ddec3056 100644 > --- a/arch/powerpc/kernel/kprobes.c > +++ b/arch/powerpc/kernel/kprobes.c > @@ -126,20 +126,6 @@ kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long > addr, unsigned long offse > return (kprobe_opcode_t *)(addr + offset); > } > > -void *alloc_insn_page(void) > -{ > - void *page; > - > - page = execmem_text_alloc(EXECMEM_KPROBES, PAGE_SIZE); > - if (!page) > - return NULL; > - > - if (strict_module_rwx_enabled()) > - set_memory_rox((unsigned long)page, 1); > - > - return page; > -} > - > int arch_prepare_kprobe(struct kprobe *p) > { > int ret = 0; > diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c > index 824d9541a310..bf2c62aef628 100644 > --- a/arch/powerpc/kernel/module.c > +++ b/arch/powerpc/kernel/module.c > @@ -95,6 +95,9 @@ static struct execmem_params execmem_params __ro_after_init > = { > [EXECMEM_DEFAULT] = { > .alignment = 1, > }, > + [EXECMEM_KPROBES] = { > + .alignment = 1, > + }, > [EXECMEM_MODULE_DATA] = { > .alignment = 1, > }, > @@ -135,5 +138,13 @@ struct execmem_params __init *execmem_arch_params(void) > > range->pgprot = prot; > > + execmem_params.ranges[EXECMEM_KPROBES].start = VMALLOC_START; > + execmem_params.ranges[EXECMEM_KPROBES].start = VMALLOC_END; > + > + if (strict_module_rwx_enabled()) > + execmem_params.ranges[EXECMEM_KPROBES].pgprot = PAGE_KERNEL_ROX; > + else > + execmem_params.ranges[EXECMEM_KPROBES].pgprot = > PAGE_KERNEL_EXEC; > + > return _params; > }
Re: [PATCH v3 06/13] mm/execmem: introduce execmem_data_alloc()
Le 22/09/2023 à 10:55, Song Liu a écrit : > On Fri, Sep 22, 2023 at 12:17 AM Christophe Leroy > wrote: >> >> >> >> Le 22/09/2023 à 00:52, Song Liu a écrit : >>> On Mon, Sep 18, 2023 at 12:31 AM Mike Rapoport wrote: >>>> >>> [...] >>>> diff --git a/include/linux/execmem.h b/include/linux/execmem.h >>>> index 519bdfdca595..09d45ac786e9 100644 >>>> --- a/include/linux/execmem.h >>>> +++ b/include/linux/execmem.h >>>> @@ -29,6 +29,7 @@ >>>> * @EXECMEM_KPROBES: parameters for kprobes >>>> * @EXECMEM_FTRACE: parameters for ftrace >>>> * @EXECMEM_BPF: parameters for BPF >>>> + * @EXECMEM_MODULE_DATA: parameters for module data sections >>>> * @EXECMEM_TYPE_MAX: >>>> */ >>>>enum execmem_type { >>>> @@ -37,6 +38,7 @@ enum execmem_type { >>>> EXECMEM_KPROBES, >>>> EXECMEM_FTRACE, >>> >>> In longer term, I think we can improve the JITed code and merge >>> kprobe/ftrace/bpf. to use the same ranges. Also, do we need special >>> setting for FTRACE? If not, let's just remove it. >> >> How can we do that ? Some platforms like powerpc require executable >> memory for BPF and non-exec mem for KPROBE so it can't be in the same >> area/ranges. > > Hmm... non-exec mem for kprobes? > > if (strict_module_rwx_enabled()) > execmem_params.ranges[EXECMEM_KPROBES].pgprot = > PAGE_KERNEL_ROX; > else > execmem_params.ranges[EXECMEM_KPROBES].pgprot = > PAGE_KERNEL_EXEC; > > Do you mean the latter case? > In fact I may have misunderstood patch 9. I'll provide a response there. Christophe
Re: [PATCH v3 06/13] mm/execmem: introduce execmem_data_alloc()
Le 22/09/2023 à 00:52, Song Liu a écrit : > On Mon, Sep 18, 2023 at 12:31 AM Mike Rapoport wrote: >> > [...] >> diff --git a/include/linux/execmem.h b/include/linux/execmem.h >> index 519bdfdca595..09d45ac786e9 100644 >> --- a/include/linux/execmem.h >> +++ b/include/linux/execmem.h >> @@ -29,6 +29,7 @@ >>* @EXECMEM_KPROBES: parameters for kprobes >>* @EXECMEM_FTRACE: parameters for ftrace >>* @EXECMEM_BPF: parameters for BPF >> + * @EXECMEM_MODULE_DATA: parameters for module data sections >>* @EXECMEM_TYPE_MAX: >>*/ >> enum execmem_type { >> @@ -37,6 +38,7 @@ enum execmem_type { >> EXECMEM_KPROBES, >> EXECMEM_FTRACE, > > In longer term, I think we can improve the JITed code and merge > kprobe/ftrace/bpf. to use the same ranges. Also, do we need special > setting for FTRACE? If not, let's just remove it. How can we do that ? Some platforms like powerpc require executable memory for BPF and non-exec mem for KPROBE so it can't be in the same area/ranges. > >> EXECMEM_BPF, >> + EXECMEM_MODULE_DATA, >> EXECMEM_TYPE_MAX, >> }; > > Overall, it is great that kprobe/ftrace/bpf no longer depend on modules. > > OTOH, I think we should merge execmem_type and existing mod_mem_type. > Otherwise, we still need to handle page permissions in multiple places. > What is our plan for that? > Christophe
Re: [PATCH v4 19/20] mips: Convert to GENERIC_CMDLINE
On 09/04/2021 at 03:23, Daniel Walker wrote: On Thu, Apr 08, 2021 at 02:04:08PM -0500, Rob Herring wrote: On Tue, Apr 06, 2021 at 10:38:36AM -0700, Daniel Walker wrote: On Fri, Apr 02, 2021 at 03:18:21PM +, Christophe Leroy wrote: -config CMDLINE_BOOL - bool "Built-in kernel command line" - help - For most systems, it is firmware or second stage bootloader that - by default specifies the kernel command line options. However, - it might be necessary or advantageous to either override the - default kernel command line or add a few extra options to it. - For such cases, this option allows you to hardcode your own - command line options directly into the kernel. For that, you - should choose 'Y' here, and fill in the extra boot arguments - in CONFIG_CMDLINE. - - The built-in options will be concatenated to the default command - line if CMDLINE_OVERRIDE is set to 'N'. Otherwise, the default - command line will be ignored and replaced by the built-in string. - - Most MIPS systems will normally expect 'N' here and rely upon - the command line from the firmware or the second-stage bootloader. - See how you complained that I have CMDLINE_BOOL in my changes, and you think it shouldn't exist. Yet here mips has it, and you just deleted it with no feature parity in your changes for this. AFAICT, CMDLINE_BOOL equates to a non-empty or empty CONFIG_CMDLINE. You seem to need it just because you have CMDLINE_PREPEND and CMDLINE_APPEND. If that's not it, what feature is missing? CMDLINE_BOOL is not a feature, but an implementation detail. Not true. It makes it easier to turn it all off inside the Kconfig, so it's for usability, and multiple architectures have it even with just CMDLINE as I was commenting here. 
Among the 13 architectures having CONFIG_CMDLINE, today only 6 have a CONFIG_CMDLINE_BOOL in addition: arch/arm/Kconfig:config CMDLINE arch/arm64/Kconfig:config CMDLINE arch/hexagon/Kconfig:config CMDLINE arch/microblaze/Kconfig:config CMDLINE arch/mips/Kconfig.debug:config CMDLINE arch/nios2/Kconfig:config CMDLINE arch/openrisc/Kconfig:config CMDLINE arch/powerpc/Kconfig:config CMDLINE arch/riscv/Kconfig:config CMDLINE arch/sh/Kconfig:config CMDLINE arch/sparc/Kconfig:config CMDLINE arch/x86/Kconfig:config CMDLINE arch/xtensa/Kconfig:config CMDLINE arch/microblaze/Kconfig:config CMDLINE_BOOL arch/mips/Kconfig.debug:config CMDLINE_BOOL arch/nios2/Kconfig:config CMDLINE_BOOL arch/sparc/Kconfig:config CMDLINE_BOOL arch/x86/Kconfig:config CMDLINE_BOOL arch/xtensa/Kconfig:config CMDLINE_BOOL In the beginning I hesitated about CMDLINE_BOOL; in the end I decided to go the same way as what is done today in the kernel for initramfs with CONFIG_INITRAMFS_SOURCE. The problem I see with adding CONFIG_CMDLINE_BOOL for every architecture which doesn't have it today is that when doing a "make oldconfig" on their custom configs, thousands of users will lose their CMDLINE without notice. Going the other way round, removing CONFIG_CMDLINE_BOOL on the 6 architectures that have it today will have no impact on existing configs. Also, in order to avoid tons of #ifdefs in the code as mandated by Kernel Coding Style §21, we have to have CONFIG_CMDLINE defined at all times, so in the end CONFIG_CMDLINE_BOOL is really redundant with an empty CONFIG_CMDLINE. Unlike you, the approach I took for my series is to minimise the impact on existing implementations and existing configurations as much as possible. I know you have a different approach where you break every existing config anyway. https://www.kernel.org/doc/html/latest/process/coding-style.html#conditional-compilation Christophe
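The redundancy argument above can be sketched in plain C. This is a hedged illustration, not the series' actual code: the `build_cmdline()` helper is a made-up name, and the point is only that with CONFIG_CMDLINE always defined (possibly as the empty string), the append logic needs no #ifdef, so CMDLINE_BOOL=n is indistinguishable from an empty string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Sketch of why an always-defined CONFIG_CMDLINE makes CMDLINE_BOOL
 * redundant: an empty built-in string simply appends nothing, so the
 * caller needs no #ifdef CONFIG_CMDLINE_BOOL. Illustrative only.
 */
#ifndef CONFIG_CMDLINE
#define CONFIG_CMDLINE ""	/* CMDLINE_BOOL=n is just an empty string */
#endif

static void build_cmdline(char *dst, size_t len, const char *from_bootloader)
{
	/* Append the built-in string, with a separator only if non-empty. */
	snprintf(dst, len, "%s%s%s", from_bootloader,
		 CONFIG_CMDLINE[0] ? " " : "", CONFIG_CMDLINE);
}
```

With CONFIG_CMDLINE defined as `""`, the bootloader-provided command line passes through untouched — exactly the behaviour a separate boolean knob would otherwise have to gate.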
[PATCH v2 1/2] powerpc/inst: ppc_inst_as_u64() becomes ppc_inst_as_ulong()
In order to simplify use on PPC32, change ppc_inst_as_u64() into ppc_inst_as_ulong() that returns the 32 bits instruction on PPC32. Will be used when porting OPTPROBES to PPC32. Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/inst.h | 13 +++-- arch/powerpc/kernel/optprobes.c | 2 +- arch/powerpc/lib/code-patching.c | 2 +- arch/powerpc/xmon/xmon.c | 2 +- 4 files changed, 10 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h index 19e18af2fac9..9646c63f7420 100644 --- a/arch/powerpc/include/asm/inst.h +++ b/arch/powerpc/include/asm/inst.h @@ -147,13 +147,14 @@ static inline struct ppc_inst *ppc_inst_next(void *location, struct ppc_inst *va return location + ppc_inst_len(tmp); } -static inline u64 ppc_inst_as_u64(struct ppc_inst x) +static inline unsigned long ppc_inst_as_ulong(struct ppc_inst x) { -#ifdef CONFIG_CPU_LITTLE_ENDIAN - return (u64)ppc_inst_suffix(x) << 32 | ppc_inst_val(x); -#else - return (u64)ppc_inst_val(x) << 32 | ppc_inst_suffix(x); -#endif + if (IS_ENABLED(CONFIG_PPC32)) + return ppc_inst_val(x); + else if (IS_ENABLED(CONFIG_CPU_LITTLE_ENDIAN)) + return (u64)ppc_inst_suffix(x) << 32 | ppc_inst_val(x); + else + return (u64)ppc_inst_val(x) << 32 | ppc_inst_suffix(x); } #define PPC_INST_STR_LEN sizeof(" ") diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c index 7f7cdbeacd1a..58fdb9f66e0f 100644 --- a/arch/powerpc/kernel/optprobes.c +++ b/arch/powerpc/kernel/optprobes.c @@ -264,7 +264,7 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p) * 3. load instruction to be emulated into relevant register, and */ temp = ppc_inst_read((struct ppc_inst *)p->ainsn.insn); - patch_imm64_load_insns(ppc_inst_as_u64(temp), 4, buff + TMPL_INSN_IDX); + patch_imm64_load_insns(ppc_inst_as_ulong(temp), 4, buff + TMPL_INSN_IDX); /* * 4. 
branch back from trampoline diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index 65aec4d6d9ba..870b30d9be2f 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -26,7 +26,7 @@ static int __patch_instruction(struct ppc_inst *exec_addr, struct ppc_inst instr __put_kernel_nofault(patch_addr, , u32, failed); } else { - u64 val = ppc_inst_as_u64(instr); + u64 val = ppc_inst_as_ulong(instr); __put_kernel_nofault(patch_addr, , u64, failed); } diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index a619b9ed8458..ff2b92bfeedc 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -2953,7 +2953,7 @@ generic_inst_dump(unsigned long adr, long count, int praddr, if (!ppc_inst_prefixed(inst)) dump_func(ppc_inst_val(inst), adr); else - dump_func(ppc_inst_as_u64(inst), adr); + dump_func(ppc_inst_as_ulong(inst), adr); printf("\n"); } return adr - first_adr; -- 2.25.0
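As a user-space illustration of what the removed #ifdef in ppc_inst_as_u64() was doing, the two 32-bit halves of a prefixed instruction combine differently depending on endianness. This is a hedged sketch with illustrative names (`inst_as_u64` is mine), not the kernel helper itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the endian-dependent combination that the patch's
 * IS_ENABLED(CONFIG_CPU_LITTLE_ENDIAN) branch expresses: on
 * little-endian the suffix lands in the high 32 bits, on big-endian
 * the prefix does. Illustrative helper, not kernel code.
 */
static uint64_t inst_as_u64(uint32_t prefix, uint32_t suffix, int little_endian)
{
	if (little_endian)
		return (uint64_t)suffix << 32 | prefix;
	return (uint64_t)prefix << 32 | suffix;
}
```

On PPC32 the patch short-circuits all of this and simply returns the single 32-bit instruction word.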
[PATCH v2 2/2] powerpc: Enable OPTPROBES on PPC32
For that, create a 32 bits version of patch_imm64_load_insns() and create a patch_imm_load_insns() which calls patch_imm32_load_insns() on PPC32 and patch_imm64_load_insns() on PPC64. Adapt optprobes_head.S for PPC32. Use PPC_LL/PPC_STL macros instead of raw ld/std, opt out things linked to paca and use stmw/lmw to save/restore registers. Signed-off-by: Christophe Leroy --- v2: Comments from Naveen. --- arch/powerpc/Kconfig | 2 +- arch/powerpc/kernel/optprobes.c | 24 -- arch/powerpc/kernel/optprobes_head.S | 65 +++- 3 files changed, 56 insertions(+), 35 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 475d77a6ebbe..d2e31a578e26 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -229,7 +229,7 @@ config PPC select HAVE_MOD_ARCH_SPECIFIC select HAVE_NMI if PERF_EVENTS || (PPC64 && PPC_BOOK3S) select HAVE_HARDLOCKUP_DETECTOR_ARCHif PPC64 && PPC_BOOK3S && SMP - select HAVE_OPTPROBES if PPC64 + select HAVE_OPTPROBES select HAVE_PERF_EVENTS select HAVE_PERF_EVENTS_NMI if PPC64 select HAVE_HARDLOCKUP_DETECTOR_PERFif PERF_EVENTS && HAVE_PERF_EVENTS_NMI && !HAVE_HARDLOCKUP_DETECTOR_ARCH diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c index 58fdb9f66e0f..cdf87086fa33 100644 --- a/arch/powerpc/kernel/optprobes.c +++ b/arch/powerpc/kernel/optprobes.c @@ -141,11 +141,21 @@ void arch_remove_optimized_kprobe(struct optimized_kprobe *op) } } +static void patch_imm32_load_insns(unsigned long val, int reg, kprobe_opcode_t *addr) +{ + patch_instruction((struct ppc_inst *)addr, + ppc_inst(PPC_RAW_LIS(reg, IMM_H(val; + addr++; + + patch_instruction((struct ppc_inst *)addr, + ppc_inst(PPC_RAW_ORI(reg, reg, IMM_L(val; +} + /* * Generate instructions to load provided immediate 64-bit value * to register 'reg' and patch these instructions at 'addr'. 
*/ -static void patch_imm64_load_insns(unsigned long val, int reg, kprobe_opcode_t *addr) +static void patch_imm64_load_insns(unsigned long long val, int reg, kprobe_opcode_t *addr) { /* lis reg,(op)@highest */ patch_instruction((struct ppc_inst *)addr, @@ -177,6 +187,14 @@ static void patch_imm64_load_insns(unsigned long val, int reg, kprobe_opcode_t * ___PPC_RS(reg) | (val & 0x))); } +static void patch_imm_load_insns(unsigned long val, int reg, kprobe_opcode_t *addr) +{ + if (IS_ENABLED(CONFIG_PPC64)) + patch_imm64_load_insns(val, reg, addr); + else + patch_imm32_load_insns(val, reg, addr); +} + int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p) { struct ppc_inst branch_op_callback, branch_emulate_step, temp; @@ -230,7 +248,7 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p) * Fixup the template with instructions to: * 1. load the address of the actual probepoint */ - patch_imm64_load_insns((unsigned long)op, 3, buff + TMPL_OP_IDX); + patch_imm_load_insns((unsigned long)op, 3, buff + TMPL_OP_IDX); /* * 2. branch to optimized_callback() and emulate_step() @@ -264,7 +282,7 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p) * 3. load instruction to be emulated into relevant register, and */ temp = ppc_inst_read((struct ppc_inst *)p->ainsn.insn); - patch_imm64_load_insns(ppc_inst_as_ulong(temp), 4, buff + TMPL_INSN_IDX); + patch_imm_load_insns(ppc_inst_as_ulong(temp), 4, buff + TMPL_INSN_IDX); /* * 4. 
branch back from trampoline diff --git a/arch/powerpc/kernel/optprobes_head.S b/arch/powerpc/kernel/optprobes_head.S index ff8ba4d3824d..19ea3312403c 100644 --- a/arch/powerpc/kernel/optprobes_head.S +++ b/arch/powerpc/kernel/optprobes_head.S @@ -9,6 +9,16 @@ #include #include +#ifdef CONFIG_PPC64 +#define SAVE_30GPRS(base) SAVE_10GPRS(2,base); SAVE_10GPRS(12,base); SAVE_10GPRS(22,base) +#define REST_30GPRS(base) REST_10GPRS(2,base); REST_10GPRS(12,base); REST_10GPRS(22,base) +#define TEMPLATE_FOR_IMM_LOAD_INSNSnop; nop; nop; nop; nop +#else +#define SAVE_30GPRS(base) stmw r2, GPR2(base) +#define REST_30GPRS(base) lmw r2, GPR2(base) +#define TEMPLATE_FOR_IMM_LOAD_INSNSnop; nop; nop +#endif + #defineOPT_SLOT_SIZE 65536 .balign 4 @@ -30,39 +40,41 @@ optinsn_slot: .global optprobe_template_entry optprobe_template_entry: /* Create an in-memory pt_regs */ - stdur1,-INT_FRAME_SIZE(r1) + PPC_STLUr1,-INT_FRAME_SIZE(r1) SAVE_GPR(0,r1) /* Save the previous SP into stack */ addi
Re: [PATCH v2 2/2] powerpc/legacy_serial: Use early_ioremap()
On 20/04/2021 at 15:32, Christophe Leroy wrote: From: Christophe Leroy Oops, I forgot to reset the Author. Michael, if you apply this patch, please update the author and remove the old Signed-off-by. Thanks [0.00] ioremap() called early from find_legacy_serial_ports+0x3cc/0x474. Use early_ioremap() instead find_legacy_serial_ports() is called early from setup_arch(), before paging_init(). vmalloc is not available yet, ioremap shouldn't be used that early. Use early_ioremap() and switch to a regular ioremap() later. Signed-off-by: Christophe Leroy Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/legacy_serial.c | 33 + 1 file changed, 29 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kernel/legacy_serial.c b/arch/powerpc/kernel/legacy_serial.c index f061e06e9f51..8b2c1a8553a0 100644 --- a/arch/powerpc/kernel/legacy_serial.c +++ b/arch/powerpc/kernel/legacy_serial.c @@ -15,6 +15,7 @@ #include #include #include +#include #undef DEBUG @@ -34,6 +35,7 @@ static struct legacy_serial_info { unsigned intclock; int irq_check_parent; phys_addr_t taddr; + void __iomem*early_addr; } legacy_serial_infos[MAX_LEGACY_SERIAL_PORTS]; static const struct of_device_id legacy_serial_parents[] __initconst = { @@ -325,17 +327,16 @@ static void __init setup_legacy_serial_console(int console) { struct legacy_serial_info *info = _serial_infos[console]; struct plat_serial8250_port *port = _serial_ports[console]; - void __iomem *addr; unsigned int stride; stride = 1 << port->regshift; /* Check if a translated MMIO address has been found */ if (info->taddr) { - addr = ioremap(info->taddr, 0x1000); - if (addr == NULL) + info->early_addr = early_ioremap(info->taddr, 0x1000); + if (info->early_addr == NULL) return; - udbg_uart_init_mmio(addr, stride); + udbg_uart_init_mmio(info->early_addr, stride); } else { /* Check if it's PIO and we support untranslated PIO */ if (port->iotype == UPIO_PORT && isa_io_special) @@ -353,6 +354,30 @@ static void __init setup_legacy_serial_console(int 
console) udbg_uart_setup(info->speed, info->clock); } +static int __init ioremap_legacy_serial_console(void) +{ + struct legacy_serial_info *info = _serial_infos[legacy_serial_console]; + struct plat_serial8250_port *port = _serial_ports[legacy_serial_console]; + void __iomem *vaddr; + + if (legacy_serial_console < 0) + return 0; + + if (!info->early_addr) + return 0; + + vaddr = ioremap(info->taddr, 0x1000); + if (WARN_ON(!vaddr)) + return -ENOMEM; + + udbg_uart_init_mmio(vaddr, 1 << port->regshift); + early_iounmap(info->early_addr, 0x1000); + info->early_addr = NULL; + + return 0; +} +early_initcall(ioremap_legacy_serial_console); + /* * This is called very early, as part of setup_system() or eventually * setup_arch(), basically before anything else in this file. This function
Re: [PATCH v1 2/2] powerpc: Enable OPTPROBES on PPC32
On 20/04/2021 at 08:51, Naveen N. Rao wrote: Christophe Leroy wrote: For that, create a 32 bits version of patch_imm64_load_insns() and create a patch_imm_load_insns() which calls patch_imm32_load_insns() on PPC32 and patch_imm64_load_insns() on PPC64. Adapt optprobes_head.S for PPC32. Use PPC_LL/PPC_STL macros instead of raw ld/std, opt out things linked to paca and use stmw/lmw to save/restore registers. Signed-off-by: Christophe Leroy --- arch/powerpc/Kconfig | 2 +- arch/powerpc/kernel/optprobes.c | 24 +-- arch/powerpc/kernel/optprobes_head.S | 46 +++- 3 files changed, 53 insertions(+), 19 deletions(-) Thanks for adding support for ppc32. It is good to see that it works without too many changes. diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index c1344c05226c..49b538e54efb 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -227,7 +227,7 @@ config PPC select HAVE_MOD_ARCH_SPECIFIC select HAVE_NMI if PERF_EVENTS || (PPC64 && PPC_BOOK3S) select HAVE_HARDLOCKUP_DETECTOR_ARCH if PPC64 && PPC_BOOK3S && SMP - select HAVE_OPTPROBES if PPC64 + select HAVE_OPTPROBES select HAVE_PERF_EVENTS select HAVE_PERF_EVENTS_NMI if PPC64 select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && HAVE_PERF_EVENTS_NMI && !HAVE_HARDLOCKUP_DETECTOR_ARCH diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c index 58fdb9f66e0f..cdf87086fa33 100644 --- a/arch/powerpc/kernel/optprobes.c +++ b/arch/powerpc/kernel/optprobes.c @@ -141,11 +141,21 @@ void arch_remove_optimized_kprobe(struct optimized_kprobe *op) } } +static void patch_imm32_load_insns(unsigned long val, int reg, kprobe_opcode_t *addr) +{ + patch_instruction((struct ppc_inst *)addr, + ppc_inst(PPC_RAW_LIS(reg, IMM_H(val; + addr++; + + patch_instruction((struct ppc_inst *)addr, + ppc_inst(PPC_RAW_ORI(reg, reg, IMM_L(val; +} + /* * Generate instructions to load provided immediate 64-bit value * to register 'reg' and patch these instructions at 'addr'. 
*/ -static void patch_imm64_load_insns(unsigned long val, int reg, kprobe_opcode_t *addr) +static void patch_imm64_load_insns(unsigned long long val, int reg, kprobe_opcode_t *addr) Do you really need this? Without it I get: from arch/powerpc/kernel/optprobes.c:8: arch/powerpc/kernel/optprobes.c: In function 'patch_imm64_load_insns': arch/powerpc/kernel/optprobes.c:163:14: error: right shift count >= width of type [-Werror=shift-count-overflow] 163 |((val >> 48) & 0x))); | ^~ ./arch/powerpc/include/asm/inst.h:69:48: note: in definition of macro 'ppc_inst' 69 | #define ppc_inst(x) ((struct ppc_inst){ .val = x }) |^ arch/powerpc/kernel/optprobes.c:169:31: error: right shift count >= width of type [-Werror=shift-count-overflow] 169 |___PPC_RS(reg) | ((val >> 32) & 0x))); | ^~ ./arch/powerpc/include/asm/inst.h:69:48: note: in definition of macro 'ppc_inst' 69 | #define ppc_inst(x) ((struct ppc_inst){ .val = x }) |^ { /* lis reg,(op)@highest */ patch_instruction((struct ppc_inst *)addr, @@ -177,6 +187,14 @@ static void patch_imm64_load_insns(unsigned long val, int reg, kprobe_opcode_t * ___PPC_RS(reg) | (val & 0x))); } +static void patch_imm_load_insns(unsigned long val, int reg, kprobe_opcode_t *addr) +{ + if (IS_ENABLED(CONFIG_PPC64)) + patch_imm64_load_insns(val, reg, addr); + else + patch_imm32_load_insns(val, reg, addr); +} + int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p) { struct ppc_inst branch_op_callback, branch_emulate_step, temp; @@ -230,7 +248,7 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p) * Fixup the template with instructions to: * 1. load the address of the actual probepoint */ - patch_imm64_load_insns((unsigned long)op, 3, buff + TMPL_OP_IDX); + patch_imm_load_insns((unsigned long)op, 3, buff + TMPL_OP_IDX); /* * 2. branch to optimized_callback() and emulate_step() @@ -264,7 +282,7 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p) * 3. 
load instruction to be emulated into relevant register, and */ temp = ppc_inst_read((struct ppc_inst *)p->ainsn.insn); - patch_imm64_load_insns(ppc_inst_as_ulong(temp), 4, buff + TMPL_INSN_IDX); + patch_imm_load_insns(ppc_inst_as_ulong(temp), 4, buff + TMPL_INSN_IDX); /* * 4. branch back from trampoline diff --git a/arch/powerpc/kernel/optprobes_head.S b/arch/powerpc/kernel/optprobes
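Naveen's question about `unsigned long long` in the discussion above comes down to shift widths: on a 32-bit build, `val >> 48` on an `unsigned long` has a shift count equal to or larger than the type width, which is exactly the -Wshift-count-overflow error quoted. Below is a hedged sketch of the 16-bit slicing the lis/ori/oris sequence performs; `imm_chunk()` and `imm_join()` are illustrative names of mine, not kernel helpers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the 16-bit slicing done when loading a 64-bit immediate:
 * the value must be a 64-bit type (unsigned long long / uint64_t) so
 * that shifts by 32 and 48 are well-defined even where unsigned long
 * is only 32 bits wide. Illustrative helpers, not kernel code.
 */
static uint32_t imm_chunk(uint64_t val, int shift)
{
	return (uint32_t)(val >> shift) & 0xffff;
}

/* Reassemble the four chunks to check that the slicing is lossless. */
static uint64_t imm_join(uint64_t val)
{
	return (uint64_t)imm_chunk(val, 48) << 48 |
	       (uint64_t)imm_chunk(val, 32) << 32 |
	       (uint64_t)imm_chunk(val, 16) << 16 |
	       imm_chunk(val, 0);
}
```

On PPC32 only the low two chunks are needed (IMM_H/IMM_L), which is why the 32-bit variant gets away with two instructions and a plain `unsigned long`.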
[PATCH v2 2/2] powerpc/legacy_serial: Use early_ioremap()
From: Christophe Leroy [0.00] ioremap() called early from find_legacy_serial_ports+0x3cc/0x474. Use early_ioremap() instead find_legacy_serial_ports() is called early from setup_arch(), before paging_init(). vmalloc is not available yet, ioremap shouldn't be used that early. Use early_ioremap() and switch to a regular ioremap() later. Signed-off-by: Christophe Leroy Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/legacy_serial.c | 33 + 1 file changed, 29 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kernel/legacy_serial.c b/arch/powerpc/kernel/legacy_serial.c index f061e06e9f51..8b2c1a8553a0 100644 --- a/arch/powerpc/kernel/legacy_serial.c +++ b/arch/powerpc/kernel/legacy_serial.c @@ -15,6 +15,7 @@ #include #include #include +#include #undef DEBUG @@ -34,6 +35,7 @@ static struct legacy_serial_info { unsigned intclock; int irq_check_parent; phys_addr_t taddr; + void __iomem*early_addr; } legacy_serial_infos[MAX_LEGACY_SERIAL_PORTS]; static const struct of_device_id legacy_serial_parents[] __initconst = { @@ -325,17 +327,16 @@ static void __init setup_legacy_serial_console(int console) { struct legacy_serial_info *info = _serial_infos[console]; struct plat_serial8250_port *port = _serial_ports[console]; - void __iomem *addr; unsigned int stride; stride = 1 << port->regshift; /* Check if a translated MMIO address has been found */ if (info->taddr) { - addr = ioremap(info->taddr, 0x1000); - if (addr == NULL) + info->early_addr = early_ioremap(info->taddr, 0x1000); + if (info->early_addr == NULL) return; - udbg_uart_init_mmio(addr, stride); + udbg_uart_init_mmio(info->early_addr, stride); } else { /* Check if it's PIO and we support untranslated PIO */ if (port->iotype == UPIO_PORT && isa_io_special) @@ -353,6 +354,30 @@ static void __init setup_legacy_serial_console(int console) udbg_uart_setup(info->speed, info->clock); } +static int __init ioremap_legacy_serial_console(void) +{ + struct legacy_serial_info *info = 
_serial_infos[legacy_serial_console]; + struct plat_serial8250_port *port = _serial_ports[legacy_serial_console]; + void __iomem *vaddr; + + if (legacy_serial_console < 0) + return 0; + + if (!info->early_addr) + return 0; + + vaddr = ioremap(info->taddr, 0x1000); + if (WARN_ON(!vaddr)) + return -ENOMEM; + + udbg_uart_init_mmio(vaddr, 1 << port->regshift); + early_iounmap(info->early_addr, 0x1000); + info->early_addr = NULL; + + return 0; +} +early_initcall(ioremap_legacy_serial_console); + /* * This is called very early, as part of setup_system() or eventually * setup_arch(), basically before anything else in this file. This function -- 2.25.0
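The two-phase pattern this patch implements — early_ioremap() at boot, then an early_initcall() that does a regular ioremap() and tears the early mapping down — can be sketched with a user-space analog. All names here are illustrative, and `malloc`/`free` merely stand in for the mapping primitives; this is not the patch's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Sketch of the patch's mapping lifecycle: setup_console() plays the
 * role of setup_legacy_serial_console() using early_ioremap(), and
 * promote_console() plays the role of the early_initcall() that does
 * the real ioremap(), releases the early mapping, and resets
 * early_addr to NULL (as the patch does). Illustrative names only.
 */
struct serial_info {
	void *early_addr;	/* stands in for the early_ioremap() result */
	void *addr;		/* stands in for the permanent ioremap() result */
};

static int setup_console(struct serial_info *info)
{
	info->early_addr = malloc(0x1000);	/* "early_ioremap()" */
	return info->early_addr ? 0 : -1;
}

static int promote_console(struct serial_info *info)
{
	if (!info->early_addr)
		return 0;			/* nothing to promote */
	info->addr = malloc(0x1000);		/* regular "ioremap()" */
	if (!info->addr)
		return -1;
	free(info->early_addr);			/* "early_iounmap()" */
	info->early_addr = NULL;
	return 0;
}
```

Clearing `early_addr` after the hand-over is what makes the initcall idempotent: a second run sees no early mapping and returns immediately.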
[PATCH v2 1/2] powerpc/64: Fix the definition of the fixmap area
At the time being, the fixmap area is defined at the top of the address space or just below KASAN. This definition is not valid for PPC64. For PPC64, use the top of the I/O space. Because of circular dependencies, it is not possible to include asm/fixmap.h in asm/book3s/64/pgtable.h , so define a fixed size AREA at the top of the I/O space for fixmap and ensure during build that the size is big enough. Fixes: 265c3491c4bc ("powerpc: Add support for GENERIC_EARLY_IOREMAP") Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/book3s/64/pgtable.h | 4 +++- arch/powerpc/include/asm/fixmap.h| 9 + arch/powerpc/include/asm/nohash/64/pgtable.h | 5 - 3 files changed, 16 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h index 0c89977ec10b..a666d561b44d 100644 --- a/arch/powerpc/include/asm/book3s/64/pgtable.h +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h @@ -7,6 +7,7 @@ #ifndef __ASSEMBLY__ #include #include +#include #endif /* @@ -324,7 +325,8 @@ extern unsigned long pci_io_base; #define PHB_IO_END(KERN_IO_START + FULL_IO_SIZE) #define IOREMAP_BASE (PHB_IO_END) #define IOREMAP_START (ioremap_bot) -#define IOREMAP_END(KERN_IO_END) +#define IOREMAP_END(KERN_IO_END - FIXADDR_SIZE) +#define FIXADDR_SIZE SZ_32M /* Advertise special mapping type for AGP */ #define HAVE_PAGE_AGP diff --git a/arch/powerpc/include/asm/fixmap.h b/arch/powerpc/include/asm/fixmap.h index 8d03c16a3663..947b5b9c4424 100644 --- a/arch/powerpc/include/asm/fixmap.h +++ b/arch/powerpc/include/asm/fixmap.h @@ -23,12 +23,17 @@ #include #endif +#ifdef CONFIG_PPC64 +#define FIXADDR_TOP(IOREMAP_END + FIXADDR_SIZE) +#else +#define FIXADDR_SIZE 0 #ifdef CONFIG_KASAN #include #define FIXADDR_TOP(KASAN_SHADOW_START - PAGE_SIZE) #else #define FIXADDR_TOP((unsigned long)(-PAGE_SIZE)) #endif +#endif /* * Here we define all the compile-time 'special' virtual @@ -50,6 +55,7 @@ */ enum fixed_addresses { FIX_HOLE, +#ifdef 
CONFIG_PPC32 /* reserve the top 128K for early debugging purposes */ FIX_EARLY_DEBUG_TOP = FIX_HOLE, FIX_EARLY_DEBUG_BASE = FIX_EARLY_DEBUG_TOP+(ALIGN(SZ_128K, PAGE_SIZE)/PAGE_SIZE)-1, @@ -72,6 +78,7 @@ enum fixed_addresses { FIX_IMMR_SIZE, #endif /* FIX_PCIE_MCFG, */ +#endif /* CONFIG_PPC32 */ __end_of_permanent_fixed_addresses, #define NR_FIX_BTMAPS (SZ_256K / PAGE_SIZE) @@ -98,6 +105,8 @@ enum fixed_addresses { static inline void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t flags) { + BUILD_BUG_ON(IS_ENABLED(CONFIG_PPC64) && __FIXADDR_SIZE > FIXADDR_SIZE); + if (__builtin_constant_p(idx)) BUILD_BUG_ON(idx >= __end_of_fixed_addresses); else if (WARN_ON(idx >= __end_of_fixed_addresses)) diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h b/arch/powerpc/include/asm/nohash/64/pgtable.h index 6cb8aa357191..57cd3892bfe0 100644 --- a/arch/powerpc/include/asm/nohash/64/pgtable.h +++ b/arch/powerpc/include/asm/nohash/64/pgtable.h @@ -6,6 +6,8 @@ * the ppc64 non-hashed page table. */ +#include + #include #include #include @@ -54,7 +56,8 @@ #define PHB_IO_END(KERN_IO_START + FULL_IO_SIZE) #define IOREMAP_BASE (PHB_IO_END) #define IOREMAP_START (ioremap_bot) -#define IOREMAP_END(KERN_VIRT_START + KERN_VIRT_SIZE) +#define IOREMAP_END(KERN_VIRT_START + KERN_VIRT_SIZE - FIXADDR_SIZE) +#define FIXADDR_SIZE SZ_32M /* -- 2.25.0
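The address arithmetic in this patch can be checked with a small sketch. The constants below are arbitrary illustrative values, not the real PPC64 layout: the point is only that carving FIXADDR_SIZE out of the I/O region (IOREMAP_END = KERN_IO_END - FIXADDR_SIZE) and then defining FIXADDR_TOP as IOREMAP_END + FIXADDR_SIZE places the fixmap exactly in the reserved hole ending at KERN_IO_END.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the patch's layout: the fixmap area is a fixed-size hole
 * carved off the top of the I/O space. The kern_io_end argument is an
 * arbitrary illustrative value, not the real PPC64 constant.
 */
#define SZ_32M		(32u << 20)
#define FIXADDR_SIZE	SZ_32M

static uint64_t ioremap_end(uint64_t kern_io_end)
{
	return kern_io_end - FIXADDR_SIZE;		/* IOREMAP_END */
}

static uint64_t fixaddr_top(uint64_t kern_io_end)
{
	return ioremap_end(kern_io_end) + FIXADDR_SIZE;	/* FIXADDR_TOP */
}
```

The BUILD_BUG_ON added in `__set_fixmap()` then enforces at compile time that the fixed addresses actually fit in those 32M.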
Re: [PATCH] powerpc/legacy_serial: Use early_ioremap()
Hi Chris, On 10/08/2020 at 04:01, Chris Packham wrote: On 24/03/20 10:54 am, Chris Packham wrote: Hi Christophe, On Wed, 2020-02-05 at 12:03 +, Christophe Leroy wrote: [0.00] ioremap() called early from find_legacy_serial_ports+0x3cc/0x474. Use early_ioremap() instead I was just about to dig into this error message and found your patch. I applied it to a v5.5 base. find_legacy_serial_ports() is called early from setup_arch(), before paging_init(). vmalloc is not available yet, ioremap shouldn't be used that early. Use early_ioremap() and switch to a regular ioremap() later. Signed-off-by: Christophe Leroy On my system (Freescale T2080 SOC) this seems to cause a crash/hang in early boot. Unfortunately because this is affecting the boot console I don't get any earlyprintk output. I've been doing a bit more digging into why Christophe's patch didn't work for me. I noticed the powerpc specific early_ioremap_range() returns addresses around ioremap_bot. Yet the generic early_ioremap() uses addresses around FIXADDR_TOP. If I try the following hack I can make Christophe's patch work diff --git a/arch/powerpc/include/asm/fixmap.h b/arch/powerpc/include/asm/fixmap.h index 2ef155a3c821..7bc2f3f73c8b 100644 --- a/arch/powerpc/include/asm/fixmap.h +++ b/arch/powerpc/include/asm/fixmap.h @@ -27,7 +27,7 @@ #include #define FIXADDR_TOP (KASAN_SHADOW_START - PAGE_SIZE) #else -#define FIXADDR_TOP ((unsigned long)(-PAGE_SIZE)) +#define FIXADDR_TOP (IOREMAP_END - PAGE_SIZE) #endif /* I'll admit to being out of my depth. It seems that the generic early_ioremap() is not quite correctly plumbed in for powerpc. Yes that's probably true for PPC64. I see that on PPC32 I had to implement the following changes in order to enable earlier use of early_ioremap() https://github.com/torvalds/linux/commit/925ac141d106b55acbe112a9272f970631a3c082 I can reproduce the problem with QEMU and the ppce500 machine, which will allow me to investigate it a bit further.
Re: PPC_FPU, ALTIVEC: enable_kernel_fp, put_vr, get_vr
On 19/04/2021 at 23:39, Randy Dunlap wrote: On 4/19/21 6:16 AM, Michael Ellerman wrote: Randy Dunlap writes: Sure. I'll post them later today. They keep FPU and ALTIVEC as independent (build) features. Those patches look OK. But I don't think it makes sense to support that configuration, FPU=n ALTIVEC=y. No one is ever going to make a CPU like that. We have enough testing surface due to configuration options, without adding artificial combinations that no one is ever going to use. IMHO :) So I'd rather we just make ALTIVEC depend on FPU. That's rather simple. See below. I'm doing a bunch of randconfig builds with it now. --- From: Randy Dunlap Subject: [PATCH] powerpc: make ALTIVEC depend on PPC_FPU On a kernel config with ALTIVEC=y and PPC_FPU not set/enabled, there are build errors: drivers/cpufreq/pmac32-cpufreq.c:262:2: error: implicit declaration of function 'enable_kernel_fp' [-Werror,-Wimplicit-function-declaration] enable_kernel_fp(); ../arch/powerpc/lib/sstep.c: In function 'do_vec_load': ../arch/powerpc/lib/sstep.c:637:3: error: implicit declaration of function 'put_vr' [-Werror=implicit-function-declaration] 637 | put_vr(rn, ); | ^~ ../arch/powerpc/lib/sstep.c: In function 'do_vec_store': ../arch/powerpc/lib/sstep.c:660:3: error: implicit declaration of function 'get_vr'; did you mean 'get_oc'? [-Werror=implicit-function-declaration] 660 | get_vr(rn, ); | ^~ In theory ALTIVEC is independent of PPC_FPU but in practice nobody is going to build such a machine, so make ALTIVEC require PPC_FPU by depending on PPC_FPU. 
Signed-off-by: Randy Dunlap Reported-by: kernel test robot Cc: Michael Ellerman Cc: linuxppc-...@lists.ozlabs.org Cc: Christophe Leroy Cc: Segher Boessenkool Cc: l...@intel.com --- arch/powerpc/platforms/86xx/Kconfig|1 + arch/powerpc/platforms/Kconfig.cputype |2 ++ 2 files changed, 3 insertions(+) --- linux-next-20210416.orig/arch/powerpc/platforms/86xx/Kconfig +++ linux-next-20210416/arch/powerpc/platforms/86xx/Kconfig @@ -4,6 +4,7 @@ menuconfig PPC_86xx bool "86xx-based boards" depends on PPC_BOOK3S_32 select FSL_SOC + select PPC_FPU select ALTIVEC help The Freescale E600 SoCs have 74xx cores. --- linux-next-20210416.orig/arch/powerpc/platforms/Kconfig.cputype +++ linux-next-20210416/arch/powerpc/platforms/Kconfig.cputype @@ -186,6 +186,7 @@ config E300C3_CPU config G4_CPU bool "G4 (74xx)" depends on PPC_BOOK3S_32 + select PPC_FPU select ALTIVEC endchoice @@ -309,6 +310,7 @@ config PHYS_64BIT config ALTIVEC bool "AltiVec Support" + depends on PPC_FPU Shouldn't we do it the other way round? That is, make ALTIVEC select PPC_FPU and avoid the two selects above? depends on PPC_BOOK3S_32 || PPC_BOOK3S_64 || (PPC_E500MC && PPC64) help This option enables kernel support for the Altivec extensions to the
Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()
On 19/04/2021 at 16:00, Steven Price wrote: On 19/04/2021 14:14, Christophe Leroy wrote: On 16/04/2021 at 12:51, Steven Price wrote: On 16/04/2021 11:38, Christophe Leroy wrote: On 16/04/2021 at 11:28, Steven Price wrote: On 15/04/2021 18:18, Christophe Leroy wrote: To be honest I don't fully understand why powerpc requires the page_size - it appears to be using it purely to find "holes" in the calls to note_page(), but I haven't worked out why such holes would occur. It was indeed introduced for KASAN. We have a first commit https://github.com/torvalds/linux/commit/cabe8138 which uses page size to detect whether it is KASAN-like stuff. Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a fix. I can't remember what the problem was exactly, something around the use of hugepages for kernel memory; it came as part of the series https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/ Ah, that's useful context. So it looks like powerpc took a different route than x86 to reducing the KASAN output. Given the generic ptdump code has handling for KASAN already, it should be possible to drop that from the powerpc arch code, which I think means we don't actually need to provide page size to notepage(). Hopefully that means more code to delete ;) Looking at how the generic ptdump code handles KASAN, I'm a bit sceptical. IIUC, it is checking that kasan_early_shadow_pte is in the same page as the pgtable referred to by the PMD entry. But what happens if that PMD entry refers to another pgtable which is inside the same page as kasan_early_shadow_pte? Shouldn't the test be if (pmd_page_vaddr(val) == lm_alias(kasan_early_shadow_pte)) return note_kasan_page_table(walk, addr); Now I come to look at this code again, I think you're right. On arm64 this doesn't cause a problem - page tables are page sized and page aligned, so there couldn't be any non-KASAN pgtables sharing the page. 
Obviously that's not necessarily true of other architectures. Feel free to add a patch to your series ;) OK, I'll leave that outside of the series; it is not a show-stopper, because early shadow page directories are all tagged __bss_page_aligned, so we can't have two of them in the same page, and it is really unlikely that we'll have any other statically defined page directory in the same pages either. And for the special case of powerpc 8xx, which is the only one for which we have both KASAN and HUGEPD for the time being, there are only two levels of page directories, so no issue. Christophe
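Christophe's concern in the exchange above is the difference between "shares a page with the KASAN early shadow PTE table" and "is the KASAN early shadow PTE table". This hedged sketch makes the distinction concrete; the 12-bit page size and the helper names are illustrative, not the ptdump code itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the aliasing issue discussed above: a same-page test
 * matches any pgtable that happens to share a page with the KASAN
 * early shadow table (possible where page tables are smaller than a
 * page, as on some powerpc variants), while an exact-address test
 * only matches the shadow table itself. Illustrative helpers only.
 */
#define PAGE_SHIFT	12
#define PAGE_MASK	(~((uint64_t)(1 << PAGE_SHIFT) - 1))

static int same_page(uint64_t pgtable, uint64_t shadow)
{
	/* the generic ptdump check being questioned */
	return (pgtable & PAGE_MASK) == (shadow & PAGE_MASK);
}

static int is_shadow_table(uint64_t pgtable, uint64_t shadow)
{
	/* the exact comparison Christophe proposes */
	return pgtable == shadow;
}
```

A sub-page pgtable at, say, offset 0x800 in the shadow table's page is a false positive for `same_page()` but correctly rejected by `is_shadow_table()`.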
[PATCH 3/3] powerpc/irq: Enhance readability of trap types
This patch makes use of trap types in irq.c Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/interrupt.h | 1 + arch/powerpc/kernel/irq.c| 13 + 2 files changed, 6 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h index 8970990e3b08..44cde2e129b8 100644 --- a/arch/powerpc/include/asm/interrupt.h +++ b/arch/powerpc/include/asm/interrupt.h @@ -23,6 +23,7 @@ #define INTERRUPT_INST_SEGMENT0x480 #define INTERRUPT_TRACE 0xd00 #define INTERRUPT_H_DATA_STORAGE 0xe00 +#define INTERRUPT_HMI 0xe60 #define INTERRUPT_H_FAC_UNAVAIL 0xf80 #ifdef CONFIG_PPC_BOOK3S #define INTERRUPT_DOORBELL0xa00 diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c index 893d3f8d6f47..72cb45393ef2 100644 --- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.c @@ -142,7 +142,7 @@ void replay_soft_interrupts(void) */ if (IS_ENABLED(CONFIG_PPC_BOOK3S) && (local_paca->irq_happened & PACA_IRQ_HMI)) { local_paca->irq_happened &= ~PACA_IRQ_HMI; - regs.trap = 0xe60; + regs.trap = INTERRUPT_HMI; handle_hmi_exception(); if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS)) hard_irq_disable(); @@ -150,7 +150,7 @@ void replay_soft_interrupts(void) if (local_paca->irq_happened & PACA_IRQ_DEC) { local_paca->irq_happened &= ~PACA_IRQ_DEC; - regs.trap = 0x900; + regs.trap = INTERRUPT_DECREMENTER; timer_interrupt(); if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS)) hard_irq_disable(); @@ -158,7 +158,7 @@ void replay_soft_interrupts(void) if (local_paca->irq_happened & PACA_IRQ_EE) { local_paca->irq_happened &= ~PACA_IRQ_EE; - regs.trap = 0x500; + regs.trap = INTERRUPT_EXTERNAL; do_IRQ(); if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS)) hard_irq_disable(); @@ -166,10 +166,7 @@ void replay_soft_interrupts(void) if (IS_ENABLED(CONFIG_PPC_DOORBELL) && (local_paca->irq_happened & PACA_IRQ_DBELL)) { local_paca->irq_happened &= ~PACA_IRQ_DBELL; - if (IS_ENABLED(CONFIG_PPC_BOOK3E)) - regs.trap = 0x280; - else - 
regs.trap = 0xa00; + regs.trap = INTERRUPT_DOORBELL; doorbell_exception(); if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS)) hard_irq_disable(); @@ -178,7 +175,7 @@ void replay_soft_interrupts(void) /* Book3E does not support soft-masking PMI interrupts */ if (IS_ENABLED(CONFIG_PPC_BOOK3S) && (local_paca->irq_happened & PACA_IRQ_PMI)) { local_paca->irq_happened &= ~PACA_IRQ_PMI; - regs.trap = 0xf00; + regs.trap = INTERRUPT_PERFMON; performance_monitor_exception(); if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS)) hard_irq_disable(); -- 2.25.0
[PATCH 1/3] powerpc/8xx: Enhance readability of trap types
This patch makes use of trap types in head_8xx.S Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/interrupt.h | 29 arch/powerpc/kernel/head_8xx.S | 49 ++-- 2 files changed, 47 insertions(+), 31 deletions(-) diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h index ed2c4042c3d1..cf2c5c3ae716 100644 --- a/arch/powerpc/include/asm/interrupt.h +++ b/arch/powerpc/include/asm/interrupt.h @@ -2,13 +2,6 @@ #ifndef _ASM_POWERPC_INTERRUPT_H #define _ASM_POWERPC_INTERRUPT_H -#include -#include -#include -#include -#include -#include - /* BookE/4xx */ #define INTERRUPT_CRITICAL_INPUT 0x100 @@ -39,9 +32,11 @@ /* BookE/BookS/4xx/8xx */ #define INTERRUPT_DATA_STORAGE0x300 #define INTERRUPT_INST_STORAGE0x400 +#define INTERRUPT_EXTERNAL 0x500 #define INTERRUPT_ALIGNMENT 0x600 #define INTERRUPT_PROGRAM 0x700 #define INTERRUPT_SYSCALL 0xc00 +#define INTERRUPT_TRACE0xd00 /* BookE/BookS/44x */ #define INTERRUPT_FP_UNAVAIL 0x800 @@ -53,6 +48,24 @@ #define INTERRUPT_PERFMON 0x0 #endif +/* 8xx */ +#define INTERRUPT_SOFT_EMU_8xx 0x1000 +#define INTERRUPT_INST_TLB_MISS_8xx0x1100 +#define INTERRUPT_DATA_TLB_MISS_8xx0x1200 +#define INTERRUPT_INST_TLB_ERROR_8xx 0x1300 +#define INTERRUPT_DATA_TLB_ERROR_8xx 0x1400 +#define INTERRUPT_DATA_BREAKPOINT_8xx 0x1c00 +#define INTERRUPT_INST_BREAKPOINT_8xx 0x1d00 + +#ifndef __ASSEMBLY__ + +#include +#include +#include +#include +#include +#include + static inline void nap_adjust_return(struct pt_regs *regs) { #ifdef CONFIG_PPC_970_NAP @@ -514,4 +527,6 @@ static inline void interrupt_cond_local_irq_enable(struct pt_regs *regs) local_irq_enable(); } +#endif /* __ASSEMBLY__ */ + #endif /* _ASM_POWERPC_INTERRUPT_H */ diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index e3b066703eab..7d445e4342c0 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -29,6 +29,7 @@ #include #include #include +#include /* * Value for the bits that have fixed value in RPN 
entries. @@ -118,49 +119,49 @@ instruction_counter: #endif /* System reset */ - EXCEPTION(0x100, Reset, system_reset_exception) + EXCEPTION(INTERRUPT_SYSTEM_RESET, Reset, system_reset_exception) /* Machine check */ - START_EXCEPTION(0x200, MachineCheck) - EXCEPTION_PROLOG 0x200 MachineCheck handle_dar_dsisr=1 + START_EXCEPTION(INTERRUPT_MACHINE_CHECK, MachineCheck) + EXCEPTION_PROLOG INTERRUPT_MACHINE_CHECK MachineCheck handle_dar_dsisr=1 prepare_transfer_to_handler bl machine_check_exception b interrupt_return /* External interrupt */ - EXCEPTION(0x500, HardwareInterrupt, do_IRQ) + EXCEPTION(INTERRUPT_EXTERNAL, HardwareInterrupt, do_IRQ) /* Alignment exception */ - START_EXCEPTION(0x600, Alignment) - EXCEPTION_PROLOG 0x600 Alignment handle_dar_dsisr=1 + START_EXCEPTION(INTERRUPT_ALIGNMENT, Alignment) + EXCEPTION_PROLOG INTERRUPT_ALIGNMENT Alignment handle_dar_dsisr=1 prepare_transfer_to_handler bl alignment_exception REST_NVGPRS(r1) b interrupt_return /* Program check exception */ - START_EXCEPTION(0x700, ProgramCheck) - EXCEPTION_PROLOG 0x700 ProgramCheck + START_EXCEPTION(INTERRUPT_PROGRAM, ProgramCheck) + EXCEPTION_PROLOG INTERRUPT_PROGRAM ProgramCheck prepare_transfer_to_handler bl program_check_exception REST_NVGPRS(r1) b interrupt_return /* Decrementer */ - EXCEPTION(0x900, Decrementer, timer_interrupt) + EXCEPTION(INTERRUPT_DECREMENTER, Decrementer, timer_interrupt) /* System call */ - START_EXCEPTION(0xc00, SystemCall) - SYSCALL_ENTRY 0xc00 + START_EXCEPTION(INTERRUPT_SYSCALL, SystemCall) + SYSCALL_ENTRY INTERRUPT_SYSCALL /* Single step - not used on 601 */ - EXCEPTION(0xd00, SingleStep, single_step_exception) + EXCEPTION(INTERRUPT_TRACE, SingleStep, single_step_exception) /* On the MPC8xx, this is a software emulation interrupt. It occurs * for all unimplemented and illegal instructions. 
*/ - START_EXCEPTION(0x1000, SoftEmu) - EXCEPTION_PROLOG 0x1000 SoftEmu + START_EXCEPTION(INTERRUPT_SOFT_EMU_8xx, SoftEmu) + EXCEPTION_PROLOG INTERRUPT_SOFT_EMU_8xx SoftEmu prepare_transfer_to_handler bl emulation_assist_interrupt REST_NVGPRS(r1) @@ -187,7 +188,7 @@ instruction_counter: #define INVALIDATE_ADJACENT_PAGES_CPU15(addr, tmp) #endif - START_EXCEPTION(0x1100, InstructionTLBMiss) + START_EXCEPTION(INTERRUPT_INST_TLB_MISS_8xx, InstructionTLBMiss) mtspr SPRN_SPRG_SCRATCH2, r10 mtspr SPRN_M_TW, r11 @@ -243,7 +244,7 @@ instruction_counter
[PATCH 2/3] powerpc/32s: Enhance readability of trap types
This patch makes use of trap types in head_book3s_32.S Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/interrupt.h | 6 arch/powerpc/kernel/head_book3s_32.S | 43 ++-- 2 files changed, 28 insertions(+), 21 deletions(-) diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h index cf2c5c3ae716..8970990e3b08 100644 --- a/arch/powerpc/include/asm/interrupt.h +++ b/arch/powerpc/include/asm/interrupt.h @@ -27,6 +27,7 @@ #ifdef CONFIG_PPC_BOOK3S #define INTERRUPT_DOORBELL0xa00 #define INTERRUPT_PERFMON 0xf00 +#define INTERRUPT_ALTIVEC_UNAVAIL 0xf20 #endif /* BookE/BookS/4xx/8xx */ @@ -57,6 +58,11 @@ #define INTERRUPT_DATA_BREAKPOINT_8xx 0x1c00 #define INTERRUPT_INST_BREAKPOINT_8xx 0x1d00 +/* 603 */ +#define INTERRUPT_INST_TLB_MISS_6030x1000 +#define INTERRUPT_DATA_LOAD_TLB_MISS_603 0x1100 +#define INTERRUPT_DATA_STORE_TLB_MISS_603 0x1200 + #ifndef __ASSEMBLY__ #include diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S index 18f4ae163f34..065178f19a3d 100644 --- a/arch/powerpc/kernel/head_book3s_32.S +++ b/arch/powerpc/kernel/head_book3s_32.S @@ -31,6 +31,7 @@ #include #include #include +#include #include "head_32.h" @@ -239,7 +240,7 @@ __secondary_hold_acknowledge: /* System reset */ /* core99 pmac starts the seconary here by changing the vector, and putting it back to what it was (unknown_async_exception) when done. */ - EXCEPTION(0x100, Reset, unknown_async_exception) + EXCEPTION(INTERRUPT_SYSTEM_RESET, Reset, unknown_async_exception) /* Machine check */ /* @@ -255,7 +256,7 @@ __secondary_hold_acknowledge: * pointer when we take an exception from supervisor mode.) * -- paulus. */ - START_EXCEPTION(0x200, MachineCheck) + START_EXCEPTION(INTERRUPT_MACHINE_CHECK, MachineCheck) EXCEPTION_PROLOG_0 #ifdef CONFIG_PPC_CHRP mtspr SPRN_SPRG_SCRATCH2,r1 @@ -276,7 +277,7 @@ __secondary_hold_acknowledge: b interrupt_return /* Data access exception. 
*/ - START_EXCEPTION(0x300, DataAccess) + START_EXCEPTION(INTERRUPT_DATA_STORAGE, DataAccess) #ifdef CONFIG_PPC_BOOK3S_604 BEGIN_MMU_FTR_SECTION mtspr SPRN_SPRG_SCRATCH2,r10 @@ -297,7 +298,7 @@ ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_HPTE_TABLE) #endif 1: EXCEPTION_PROLOG_0 handle_dar_dsisr=1 EXCEPTION_PROLOG_1 - EXCEPTION_PROLOG_2 0x300 DataAccess handle_dar_dsisr=1 + EXCEPTION_PROLOG_2 INTERRUPT_DATA_STORAGE DataAccess handle_dar_dsisr=1 prepare_transfer_to_handler lwz r5, _DSISR(r11) andis. r0, r5, DSISR_DABRMATCH@h @@ -310,7 +311,7 @@ ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_HPTE_TABLE) /* Instruction access exception. */ - START_EXCEPTION(0x400, InstructionAccess) + START_EXCEPTION(INTERRUPT_INST_STORAGE, InstructionAccess) mtspr SPRN_SPRG_SCRATCH0,r10 mtspr SPRN_SPRG_SCRATCH1,r11 mfspr r10, SPRN_SPRG_THREAD @@ -330,7 +331,7 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE) andi. r11, r11, MSR_PR EXCEPTION_PROLOG_1 - EXCEPTION_PROLOG_2 0x400 InstructionAccess + EXCEPTION_PROLOG_2 INTERRUPT_INST_STORAGE InstructionAccess andis. 
r5,r9,DSISR_SRR1_MATCH_32S@h /* Filter relevant SRR1 bits */ stw r5, _DSISR(r11) stw r12, _DAR(r11) @@ -339,19 +340,19 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE) b interrupt_return /* External interrupt */ - EXCEPTION(0x500, HardwareInterrupt, do_IRQ) + EXCEPTION(INTERRUPT_EXTERNAL, HardwareInterrupt, do_IRQ) /* Alignment exception */ - START_EXCEPTION(0x600, Alignment) - EXCEPTION_PROLOG 0x600 Alignment handle_dar_dsisr=1 + START_EXCEPTION(INTERRUPT_ALIGNMENT, Alignment) + EXCEPTION_PROLOG INTERRUPT_ALIGNMENT Alignment handle_dar_dsisr=1 prepare_transfer_to_handler bl alignment_exception REST_NVGPRS(r1) b interrupt_return /* Program check exception */ - START_EXCEPTION(0x700, ProgramCheck) - EXCEPTION_PROLOG 0x700 ProgramCheck + START_EXCEPTION(INTERRUPT_PROGRAM, ProgramCheck) + EXCEPTION_PROLOG INTERRUPT_PROGRAM ProgramCheck prepare_transfer_to_handler bl program_check_exception REST_NVGPRS(r1) @@ -367,7 +368,7 @@ BEGIN_FTR_SECTION */ b ProgramCheck END_FTR_SECTION_IFSET(CPU_FTR_FPU_UNAVAILABLE) - EXCEPTION_PROLOG 0x800 FPUnavailable + EXCEPTION_PROLOG INTERRUPT_FP_UNAVAIL FPUnavailable beq 1f bl load_up_fpu /* if from user, just load it up */ b fast_exception_return @@ -379,16 +380,16 @@ END_FTR_SECTION_IFSET(CPU_FTR_FPU_UNAVAILABLE) #endif /* Decrementer */ - EXCEPTION(0x900, Decrementer, timer_interrupt) + EXCEPTION(INTERRUPT_D
Re: [PATCH 2/2] powerpc: add ALTIVEC support to lib/ when PPC_FPU not set
Le 19/04/2021 à 15:32, Segher Boessenkool a écrit : Hi! On Sun, Apr 18, 2021 at 01:17:26PM -0700, Randy Dunlap wrote: Add ldstfp.o to the Makefile for CONFIG_ALTIVEC and add externs for get_vr() and put_vr() in lib/sstep.c to fix the build errors. obj-$(CONFIG_PPC_FPU) += ldstfp.o +obj-$(CONFIG_ALTIVEC) += ldstfp.o It is probably a good idea to split ldstfp.S into two, one for each of the two configuration options? Or we can build it all the time and #ifdef the FPU part, since the file contains FPU, ALTIVEC and VSX stuff. Christophe
Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()
Le 16/04/2021 à 12:51, Steven Price a écrit : On 16/04/2021 11:38, Christophe Leroy wrote: Le 16/04/2021 à 11:28, Steven Price a écrit : On 15/04/2021 18:18, Christophe Leroy wrote: To be honest I don't fully understand why powerpc requires the page_size - it appears to be using it purely to find "holes" in the calls to note_page(), but I haven't worked out why such holes would occur. It was indeed introduced for KASAN. We have a first commit https://github.com/torvalds/linux/commit/cabe8138 which uses page size to detect whether it is KASAN-like stuff. Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a fix. I can't remember what the problem was exactly, something around the use of hugepages for kernel memory; it came as part of the series https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/ Ah, that's useful context. So it looks like powerpc took a different route to reducing the KASAN output than x86. Given that the generic ptdump code has handling for KASAN already, it should be possible to drop that from the powerpc arch code, which I think means we don't actually need to provide page size to notepage(). Hopefully that means more code to delete ;) Looking at how the generic ptdump code handles KASAN, I'm a bit sceptical. IIUC, it is checking that kasan_early_shadow_pte is in the same page as the pgtable referred to by the PMD entry. But what happens if that PMD entry refers to another pgtable which is inside the same page as kasan_early_shadow_pte? Shouldn't the test be if (pmd_page_vaddr(val) == lm_alias(kasan_early_shadow_pte)) return note_kasan_page_table(walk, addr); Christophe
[PATCH v2 4/4] powerpc/mm: Convert powerpc to GENERIC_PTDUMP
This patch converts powerpc to the generic PTDUMP implementation. Signed-off-by: Christophe Leroy --- arch/powerpc/Kconfig | 2 + arch/powerpc/Kconfig.debug| 30 -- arch/powerpc/mm/Makefile | 2 +- arch/powerpc/mm/mmu_decl.h| 2 +- arch/powerpc/mm/ptdump/8xx.c | 6 +- arch/powerpc/mm/ptdump/Makefile | 9 +- arch/powerpc/mm/ptdump/book3s64.c | 6 +- arch/powerpc/mm/ptdump/ptdump.c | 165 -- arch/powerpc/mm/ptdump/shared.c | 6 +- 9 files changed, 68 insertions(+), 160 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 475d77a6ebbe..40259437a28f 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -120,6 +120,7 @@ config PPC select ARCH_32BIT_OFF_T if PPC32 select ARCH_HAS_DEBUG_VIRTUAL select ARCH_HAS_DEBUG_VM_PGTABLE + select ARCH_HAS_DEBUG_WXif STRICT_KERNEL_RWX select ARCH_HAS_DEVMEM_IS_ALLOWED select ARCH_HAS_ELF_RANDOMIZE select ARCH_HAS_FORTIFY_SOURCE @@ -177,6 +178,7 @@ config PPC select GENERIC_IRQ_SHOW select GENERIC_IRQ_SHOW_LEVEL select GENERIC_PCI_IOMAPif PCI + select GENERIC_PTDUMP select GENERIC_SMP_IDLE_THREAD select GENERIC_STRNCPY_FROM_USER select GENERIC_STRNLEN_USER diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug index 6342f9da4545..05b1180ea502 100644 --- a/arch/powerpc/Kconfig.debug +++ b/arch/powerpc/Kconfig.debug @@ -360,36 +360,6 @@ config FAIL_IOMMU If you are unsure, say N. -config PPC_PTDUMP - bool "Export kernel pagetable layout to userspace via debugfs" - depends on DEBUG_KERNEL && DEBUG_FS - help - This option exports the state of the kernel pagetables to a - debugfs file. This is only useful for kernel developers who are - working in architecture specific areas of the kernel - probably - not a good idea to enable this feature in a production kernel. - - If you are unsure, say N. - -config PPC_DEBUG_WX - bool "Warn on W+X mappings at boot" - depends on PPC_PTDUMP && STRICT_KERNEL_RWX - help - Generate a warning if any W+X mappings are found at boot. 
- - This is useful for discovering cases where the kernel is leaving - W+X mappings after applying NX, as such mappings are a security risk. - - Note that even if the check fails, your kernel is possibly - still fine, as W+X mappings are not a security hole in - themselves, what they do is that they make the exploitation - of other unfixed kernel bugs easier. - - There is no runtime or memory usage effect of this option - once the kernel has booted up - it's a one time check. - - If in doubt, say "Y". - config PPC_FAST_ENDIAN_SWITCH bool "Deprecated fast endian-switch syscall" depends on DEBUG_KERNEL && PPC_BOOK3S_64 diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile index c3df3a8501d4..c90d58aaebe2 100644 --- a/arch/powerpc/mm/Makefile +++ b/arch/powerpc/mm/Makefile @@ -18,5 +18,5 @@ obj-$(CONFIG_PPC_MM_SLICES) += slice.o obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-noncoherent.o obj-$(CONFIG_PPC_COPRO_BASE) += copro_fault.o -obj-$(CONFIG_PPC_PTDUMP) += ptdump/ +obj-$(CONFIG_PTDUMP_CORE) += ptdump/ obj-$(CONFIG_KASAN)+= kasan/ diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h index 7dac910c0b21..dd1cabc2ea0f 100644 --- a/arch/powerpc/mm/mmu_decl.h +++ b/arch/powerpc/mm/mmu_decl.h @@ -180,7 +180,7 @@ static inline void mmu_mark_rodata_ro(void) { } void __init mmu_mapin_immr(void); #endif -#ifdef CONFIG_PPC_DEBUG_WX +#ifdef CONFIG_DEBUG_WX void ptdump_check_wx(void); #else static inline void ptdump_check_wx(void) { } diff --git a/arch/powerpc/mm/ptdump/8xx.c b/arch/powerpc/mm/ptdump/8xx.c index 86da2a669680..fac932eb8f9a 100644 --- a/arch/powerpc/mm/ptdump/8xx.c +++ b/arch/powerpc/mm/ptdump/8xx.c @@ -75,8 +75,10 @@ static const struct flag_info flag_array[] = { }; struct pgtable_level pg_level[5] = { - { - }, { /* pgd */ + { /* pgd */ + .flag = flag_array, + .num= ARRAY_SIZE(flag_array), + }, { /* p4d */ .flag = flag_array, .num= ARRAY_SIZE(flag_array), }, { /* pud */ diff --git 
a/arch/powerpc/mm/ptdump/Makefile b/arch/powerpc/mm/ptdump/Makefile index 712762be3cb1..4050cbb55acf 100644 --- a/arch/powerpc/mm/ptdump/Makefile +++ b/arch/powerpc/mm/ptdump/Makefile @@ -5,5 +5,10 @@ obj-y += ptdump.o obj-$(CONFIG_4xx) += shared.o obj-$(CONFIG_PPC_8xx) += 8xx.o obj-$(CONFIG_PPC_BOOK3E_MMU) += shared.o -obj-$(CONFIG_PPC_BOOK3S_32)+= shared.o bat
[PATCH v2 3/4] powerpc/mm: Properly coalesce pages in ptdump
Commit aaa229529244 ("powerpc/mm: Add physical address to Linux page table dump") changed range coalescing to only combine ranges that are both virtually and physically contiguous, in order to avoid erroneous combination of unrelated mappings in IOREMAP space. But in the VMALLOC space, mappings almost never have contiguous physical pages, so the commit mentioned above leads to dumping one line per page for vmalloc mappings. Since vmalloc always leaves a gap between two areas, we never have two mappings dumped as a single combination even if they have the exact same flags. The only space that may have encountered such an issue was the early IOREMAP, which does not use the vmalloc engine. But previous commits added gaps between early IO mappings, so it is not an issue anymore. That commit created some difficulties with KASAN mappings, see commit cabe8138b23c ("powerpc: dump as a single line areas mapping a single physical page.") and with huge pages, see commit b00ff6d8c1c3 ("powerpc/ptdump: Properly handle non standard page size"). So, almost revert commit aaa229529244 to properly coalesce pages mapped with the same flags as before; only keep the display of the first physical address of the range, as it can be useful especially for IO mappings. It brings powerpc back to the same level as other architectures and simplifies the conversion to GENERIC PTDUMP.
With the patch: ---[ kasan shadow mem start ]--- 0xf800-0xf8ff 0x070016M hugerw present dirty accessed 0xf900-0xf91f 0x01434000 2M rpresent accessed 0xf920-0xf95a 0x02104000 3776K rw present dirty accessed 0xfef5c000-0xfeff 0x01434000 656K rpresent accessed ---[ kasan shadow mem end ]--- Before: ---[ kasan shadow mem start ]--- 0xf800-0xf8ff 0x070016M hugerw present dirty accessed 0xf900-0xf91f 0x0143400016K rpresent accessed 0xf920-0xf9203fff 0x0210400016K rw present dirty accessed 0xf9204000-0xf9207fff 0x0213c00016K rw present dirty accessed 0xf9208000-0xf920bfff 0x0217400016K rw present dirty accessed 0xf920c000-0xf920 0x0218800016K rw present dirty accessed 0xf921-0xf9213fff 0x021dc00016K rw present dirty accessed 0xf9214000-0xf9217fff 0x022216K rw present dirty accessed 0xf9218000-0xf921bfff 0x023c16K rw present dirty accessed 0xf921c000-0xf921 0x023d400016K rw present dirty accessed 0xf922-0xf9227fff 0x023ec00032K rw present dirty accessed ... 0xf93b8000-0xf93e3fff 0x02614000 176K rw present dirty accessed 0xf93e4000-0xf94c3fff 0x027c 896K rw present dirty accessed 0xf94c4000-0xf94c7fff 0x0236c00016K rw present dirty accessed 0xf94c8000-0xf94cbfff 0x041f16K rw present dirty accessed 0xf94cc000-0xf94c 0x029c16K rw present dirty accessed 0xf94d-0xf94d3fff 0x041ec00016K rw present dirty accessed 0xf94d4000-0xf94d7fff 0x0407c00016K rw present dirty accessed 0xf94d8000-0xf94f7fff 0x041c 128K rw present dirty accessed ... 
0xf95ac000-0xf95a 0x042b16K rw present dirty accessed 0xfef5c000-0xfeff 0x0143400016K rpresent accessed ---[ kasan shadow mem end ]--- Signed-off-by: Christophe Leroy Cc: Oliver O'Halloran --- arch/powerpc/mm/ptdump/ptdump.c | 22 +++--- 1 file changed, 3 insertions(+), 19 deletions(-) diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c index aca354fb670b..5062c58b1e5b 100644 --- a/arch/powerpc/mm/ptdump/ptdump.c +++ b/arch/powerpc/mm/ptdump/ptdump.c @@ -58,8 +58,6 @@ struct pg_state { const struct addr_marker *marker; unsigned long start_address; unsigned long start_pa; - unsigned long last_pa; - unsigned long page_size; unsigned int level; u64 current_flags; bool check_wx; @@ -163,8 +161,6 @@ static void dump_flag_info(struct pg_state *st, const struct flag_info static void dump_addr(struct pg_state *st, unsigned long addr) { -
[PATCH v2 1/4] mm: pagewalk: Fix walk for hugepage tables
Pagewalk ignores hugepd entries and walks down the tables as if they were traditional entries, leading to crazy results. Add walk_hugepd_range() and use it to walk hugepage tables. Signed-off-by: Christophe Leroy --- v2: - Add a guard for NULL ops->pte_entry - Take mm->page_table_lock when walking hugepage table, as suggested by follow_huge_pd() --- mm/pagewalk.c | 58 ++- 1 file changed, 53 insertions(+), 5 deletions(-) diff --git a/mm/pagewalk.c b/mm/pagewalk.c index e81640d9f177..9b3db11a4d1d 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -58,6 +58,45 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, return err; } +#ifdef CONFIG_ARCH_HAS_HUGEPD +static int walk_hugepd_range(hugepd_t *phpd, unsigned long addr, +unsigned long end, struct mm_walk *walk, int pdshift) +{ + int err = 0; + const struct mm_walk_ops *ops = walk->ops; + int shift = hugepd_shift(*phpd); + int page_size = 1 << shift; + + if (!ops->pte_entry) + return 0; + + if (addr & (page_size - 1)) + return 0; + + for (;;) { + pte_t *pte; + + spin_lock(&walk->mm->page_table_lock); + pte = hugepte_offset(*phpd, addr, pdshift); + err = ops->pte_entry(pte, addr, addr + page_size, walk); + spin_unlock(&walk->mm->page_table_lock); + + if (err) + break; + if (addr >= end - page_size) + break; + addr += page_size; + } + return err; +} +#else +static int walk_hugepd_range(hugepd_t *phpd, unsigned long addr, +unsigned long end, struct mm_walk *walk, int pdshift) +{ + return 0; +} +#endif + static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, struct mm_walk *walk) { @@ -108,7 +147,10 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, goto again; } - err = walk_pte_range(pmd, addr, next, walk); + if (is_hugepd(__hugepd(pmd_val(*pmd)))) + err = walk_hugepd_range((hugepd_t *)pmd, addr, next, walk, PMD_SHIFT); + else + err = walk_pte_range(pmd, addr, next, walk); if (err) break; } while (pmd++, addr = next, addr != end); @@ -157,7 +199,10 @@ static int 
walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end, if (pud_none(*pud)) goto again; - err = walk_pmd_range(pud, addr, next, walk); + if (is_hugepd(__hugepd(pud_val(*pud)))) + err = walk_hugepd_range((hugepd_t *)pud, addr, next, walk, PUD_SHIFT); + else + err = walk_pmd_range(pud, addr, next, walk); if (err) break; } while (pud++, addr = next, addr != end); @@ -189,7 +234,9 @@ static int walk_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end, if (err) break; } - if (ops->pud_entry || ops->pmd_entry || ops->pte_entry) + if (is_hugepd(__hugepd(p4d_val(*p4d)))) + err = walk_hugepd_range((hugepd_t *)p4d, addr, next, walk, P4D_SHIFT); + else if (ops->pud_entry || ops->pmd_entry || ops->pte_entry) err = walk_pud_range(p4d, addr, next, walk); if (err) break; @@ -224,8 +271,9 @@ static int walk_pgd_range(unsigned long addr, unsigned long end, if (err) break; } - if (ops->p4d_entry || ops->pud_entry || ops->pmd_entry || - ops->pte_entry) + if (is_hugepd(__hugepd(pgd_val(*pgd)))) + err = walk_hugepd_range((hugepd_t *)pgd, addr, next, walk, PGDIR_SHIFT); + else if (ops->p4d_entry || ops->pud_entry || ops->pmd_entry || ops->pte_entry) err = walk_p4d_range(pgd, addr, next, walk); if (err) break; -- 2.25.0
[PATCH v2 0/4] Convert powerpc to GENERIC_PTDUMP
This series converts powerpc to generic PTDUMP. For that, we first need to add missing hugepd support to pagewalk and ptdump. v2: - Reworked the pagewalk modification to add locking and check ops->pte_entry - Modified powerpc early IO mapping to have gaps between mappings - Removed the logic that checked for contiguous physical memory - Removed the articial level calculation in ptdump_pte_entry(), level 4 is ok for all. - Removed page_size argument to note_page() Christophe Leroy (4): mm: pagewalk: Fix walk for hugepage tables powerpc/mm: Leave a gap between early allocated IO areas powerpc/mm: Properly coalesce pages in ptdump powerpc/mm: Convert powerpc to GENERIC_PTDUMP arch/powerpc/Kconfig | 2 + arch/powerpc/Kconfig.debug| 30 - arch/powerpc/mm/Makefile | 2 +- arch/powerpc/mm/ioremap_32.c | 4 +- arch/powerpc/mm/ioremap_64.c | 2 +- arch/powerpc/mm/mmu_decl.h| 2 +- arch/powerpc/mm/ptdump/8xx.c | 6 +- arch/powerpc/mm/ptdump/Makefile | 9 +- arch/powerpc/mm/ptdump/book3s64.c | 6 +- arch/powerpc/mm/ptdump/ptdump.c | 187 -- arch/powerpc/mm/ptdump/shared.c | 6 +- mm/pagewalk.c | 58 - 12 files changed, 127 insertions(+), 187 deletions(-) -- 2.25.0
[PATCH v2 2/4] powerpc/mm: Leave a gap between early allocated IO areas
Vmalloc system leaves a gap between allocated areas. It helps catching overflows. Do the same for IO areas which are allocated with early_ioremap_range() until slab_is_available(). Signed-off-by: Christophe Leroy --- arch/powerpc/mm/ioremap_32.c | 4 ++-- arch/powerpc/mm/ioremap_64.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/mm/ioremap_32.c b/arch/powerpc/mm/ioremap_32.c index 743e11384dea..9d13143b8be4 100644 --- a/arch/powerpc/mm/ioremap_32.c +++ b/arch/powerpc/mm/ioremap_32.c @@ -70,10 +70,10 @@ __ioremap_caller(phys_addr_t addr, unsigned long size, pgprot_t prot, void *call */ pr_warn("ioremap() called early from %pS. Use early_ioremap() instead\n", caller); - err = early_ioremap_range(ioremap_bot - size, p, size, prot); + err = early_ioremap_range(ioremap_bot - size - PAGE_SIZE, p, size, prot); if (err) return NULL; - ioremap_bot -= size; + ioremap_bot -= size + PAGE_SIZE; return (void __iomem *)ioremap_bot + offset; } diff --git a/arch/powerpc/mm/ioremap_64.c b/arch/powerpc/mm/ioremap_64.c index ba5cbb0d66bd..3acece00b33e 100644 --- a/arch/powerpc/mm/ioremap_64.c +++ b/arch/powerpc/mm/ioremap_64.c @@ -38,7 +38,7 @@ void __iomem *__ioremap_caller(phys_addr_t addr, unsigned long size, return NULL; ret = (void __iomem *)ioremap_bot + offset; - ioremap_bot += size; + ioremap_bot += size + PAGE_SIZE; return ret; } -- 2.25.0
Re: mmu.c:undefined reference to `patch__hash_page_A0'
Le 18/04/2021 à 19:15, Randy Dunlap a écrit : On 4/18/21 3:43 AM, Christophe Leroy wrote: Le 18/04/2021 à 02:02, Randy Dunlap a écrit : HI-- I no longer see this build error. Fixed by https://github.com/torvalds/linux/commit/acdad8fb4a1574323db88f98a38b630691574e16 However: On 2/27/21 2:24 AM, kernel test robot wrote: tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master head: 3fb6d0e00efc958d01c2f109c8453033a2d96796 commit: 259149cf7c3c6195e6199e045ca988c31d081cab powerpc/32s: Only build hash code when CONFIG_PPC_BOOK3S_604 is selected date: 4 weeks ago config: powerpc64-randconfig-r013-20210227 (attached as .config) ktr/lkp, this is a PPC32 .config file that is attached, not PPC64. Also: compiler: powerpc-linux-gcc (GCC) 9.3.0 ... I do see this build error: powerpc-linux-ld: arch/powerpc/boot/wrapper.a(decompress.o): in function `partial_decompress': decompress.c:(.text+0x1f0): undefined reference to `__decompress' when either CONFIG_KERNEL_LZO=y or CONFIG_KERNEL_LZMA=y but the build succeeds when either CONFIG_KERNEL_GZIP=y or CONFIG_KERNEL_XZ=y I guess that is due to arch/powerpc/boot/decompress.c doing this: #ifdef CONFIG_KERNEL_GZIP # include "decompress_inflate.c" #endif #ifdef CONFIG_KERNEL_XZ # include "xz_config.h" # include "../../../lib/decompress_unxz.c" #endif It would be nice to require one of KERNEL_GZIP or KERNEL_XZ to be set/enabled (maybe unless a uImage is being built?). Can you test by https://patchwork.ozlabs.org/project/linuxppc-dev/patch/a74fce4dfc9fa32da6ce3470bbedcecf795de1ec.1591189069.git.christophe.le...@csgroup.eu/ ? Hi Christophe, I get build errors for both LZO and LZMA: Ok, the patch is almost 1 year old, I guess there has been changes that break it. Will see if I can find some time to look at it. Christophe
Re: PPC_FPU, ALTIVEC: enable_kernel_fp, put_vr, get_vr
Le 17/04/2021 à 22:17, Randy Dunlap a écrit : Hi, kernel test robot reports: drivers/cpufreq/pmac32-cpufreq.c:262:2: error: implicit declaration of function 'enable_kernel_fp' [-Werror,-Wimplicit-function-declaration] enable_kernel_fp(); ^ when # CONFIG_PPC_FPU is not set CONFIG_ALTIVEC=y I see at least one other place that does not handle that combination well, here: ../arch/powerpc/lib/sstep.c: In function 'do_vec_load': ../arch/powerpc/lib/sstep.c:637:3: error: implicit declaration of function 'put_vr' [-Werror=implicit-function-declaration] 637 | put_vr(rn, ); | ^~ ../arch/powerpc/lib/sstep.c: In function 'do_vec_store': ../arch/powerpc/lib/sstep.c:660:3: error: implicit declaration of function 'get_vr'; did you mean 'get_oc'? [-Werror=implicit-function-declaration] 660 | get_vr(rn, ); | ^~ Should the code + Kconfigs/Makefiles handle that kind of kernel config or should ALTIVEC always mean PPC_FPU as well? As far as I understand, Altivec is completely independent of the FPU in theory, so it should be possible to use Altivec without using the FPU. However, until recently, it was not possible to deactivate FPU support on book3s/32. I made it possible in order to reduce unnecessary processing on processors like the 832x that have no FPU. As far as I can see in cputable.h/.c, the 832x is the only book3s/32 without an FPU, and it doesn't have ALTIVEC either. So we can in the future ensure that Altivec can be used without FPU support, but for the time being I think it is OK to force selection of FPU when selecting ALTIVEC in order to avoid build failures. I have patches to fix the build errors with the config as reported but I don't know if that's the right thing to do... Let's see them. Christophe
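The interim fix Christophe suggests - forcing FPU support on whenever Altivec is selected - would amount to a one-line Kconfig change along these lines. The exact option location, prompt, and dependencies are hypothetical; only the `select` is the point.

```
config ALTIVEC
	bool "AltiVec Support"
	select PPC_FPU	# interim: avoid build breaks until Altivec-without-FPU is supported
```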
Re: mmu.c:undefined reference to `patch__hash_page_A0'
Le 18/04/2021 à 02:02, Randy Dunlap a écrit : HI-- I no longer see this build error. Fixed by https://github.com/torvalds/linux/commit/acdad8fb4a1574323db88f98a38b630691574e16 However: On 2/27/21 2:24 AM, kernel test robot wrote: tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master head: 3fb6d0e00efc958d01c2f109c8453033a2d96796 commit: 259149cf7c3c6195e6199e045ca988c31d081cab powerpc/32s: Only build hash code when CONFIG_PPC_BOOK3S_604 is selected date: 4 weeks ago config: powerpc64-randconfig-r013-20210227 (attached as .config) ktr/lkp, this is a PPC32 .config file that is attached, not PPC64. Also: compiler: powerpc-linux-gcc (GCC) 9.3.0 ... I do see this build error: powerpc-linux-ld: arch/powerpc/boot/wrapper.a(decompress.o): in function `partial_decompress': decompress.c:(.text+0x1f0): undefined reference to `__decompress' when either CONFIG_KERNEL_LZO=y or CONFIG_KERNEL_LZMA=y but the build succeeds when either CONFIG_KERNEL_GZIP=y or CONFIG_KERNEL_XZ=y I guess that is due to arch/powerpc/boot/decompress.c doing this: #ifdef CONFIG_KERNEL_GZIP # include "decompress_inflate.c" #endif #ifdef CONFIG_KERNEL_XZ # include "xz_config.h" # include "../../../lib/decompress_unxz.c" #endif It would be nice to require one of KERNEL_GZIP or KERNEL_XZ to be set/enabled (maybe unless a uImage is being built?). Can you test by https://patchwork.ozlabs.org/project/linuxppc-dev/patch/a74fce4dfc9fa32da6ce3470bbedcecf795de1ec.1591189069.git.christophe.le...@csgroup.eu/ ? Thanks Christophe
Re: [PATCH bpf-next 1/2] bpf: Remove bpf_jit_enable=2 debugging mode
Le 16/04/2021 à 01:49, Alexei Starovoitov a écrit : On Thu, Apr 15, 2021 at 8:41 AM Quentin Monnet wrote: 2021-04-15 16:37 UTC+0200 ~ Daniel Borkmann On 4/15/21 11:32 AM, Jianlin Lv wrote: For debugging JITs, dumping the JITed image to the kernel log is discouraged; "bpftool prog dump jited" is a much better way to examine JITed dumps. This patch gets rid of the code related to the bpf_jit_enable=2 mode, updates the proc handler of bpf_jit_enable, and adds auxiliary information to explain how to use the bpf_jit_disasm tool after this change. Signed-off-by: Jianlin Lv Hello, For what it's worth, I have already seen people dump the JIT image in kernel logs in Qemu VMs running with just a busybox, not for kernel development, but in a context where building/using bpftool was not possible. If building/using bpftool is not possible then the majority of selftests won't be exercised. I don't think such an environment is suitable for any kind of bpf development, much less for JIT debugging, while bpf_jit_enable=2 is nothing but a debugging tool for JIT developers. I'd rather nuke that code instead of carrying it from kernel to kernel. When I implemented the JIT for PPC32, it was extremely helpful. As far as I understand, for the time being bpftool is not usable in my environment because it doesn't support cross compilation when the target's endianness differs from the building host's endianness, see the discussion at https://lore.kernel.org/bpf/21e66a09-514f-f426-b9e2-13baab0b9...@csgroup.eu/ That's right that selftests can't be exercised because they don't build. The question might be candid as I didn't investigate much about the replacement of the "bpf_jit_enable=2 debugging mode" by bpftool, but how do we use bpftool exactly for that? Especially when using the BPF test module?
Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()
Le 16/04/2021 à 17:04, Christophe Leroy a écrit : Le 16/04/2021 à 16:40, Christophe Leroy a écrit : Le 16/04/2021 à 15:00, Steven Price a écrit : On 16/04/2021 12:08, Christophe Leroy wrote: Le 16/04/2021 à 12:51, Steven Price a écrit : On 16/04/2021 11:38, Christophe Leroy wrote: Le 16/04/2021 à 11:28, Steven Price a écrit : To be honest I don't fully understand why powerpc requires the page_size - it appears to be using it purely to find "holes" in the calls to note_page(), but I haven't worked out why such holes would occur. I was indeed introduced for KASAN. We have a first commit https://github.com/torvalds/linux/commit/cabe8138 which uses page size to detect whether it is a KASAN like stuff. Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a fix. I can't remember what the problem was exactly, something around the use of hugepages for kernel memory, came as part of the series https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/ Ah, that's useful context. So it looks like powerpc took a different route to reducing the KASAN output to x86. Given the generic ptdump code has handling for KASAN already it should be possible to drop that from the powerpc arch code, which I think means we don't actually need to provide page size to notepage(). Hopefully that means more code to delete ;) Yes ... and no. It looks like the generic ptdump handles the case when several pgdir entries points to the same kasan_early_shadow_pte. But it doesn't take into account the powerpc case where we have regular page tables where several (if not all) PTEs are pointing to the kasan_early_shadow_page . I'm not sure I follow quite how powerpc is different here. But could you have a similar check for PTEs against kasan_early_shadow_pte as the other levels already have? I'm just worried that page_size isn't well defined in this interface and it's going to cause problems in the future. I'm trying. 
I reverted the two commits b00ff6d8c and cabe8138. At the moment, I don't get exactly what I expect: for linear memory I get one line for each 8M page, whereas before reverting the patches I got one 16M line and one 112M line. And for the KASAN shadow area I get two lines for the 2x 8M pages shadowing linear mem, then I get one 4M line for each PGDIR entry pointing to kasan_early_shadow_pte.
0xf800-0xf87f 0x0700 8M huge rw present
0xf880-0xf8ff 0x0780 8M huge rw present
0xf900-0xf93f 0x0143 4M r present
...
0xfec0-0xfeff 0x0143 4M r present
Any idea?
I think the difference with other architectures is here:
} else if (flag != st->current_flags || level != st->level ||
           addr >= st->marker[1].start_address ||
           pa != st->last_pa + PAGE_SIZE) {
In addition to the checks everyone does, powerpc also checks "pa != st->last_pa + PAGE_SIZE", and it is definitely for that test that the page_size argument had been added. By replacing that test with (pa - st->start_pa != addr - st->start_address) it works again, so we definitely don't need the real page size.
I see that other architectures except RISCV don't dump the physical address, but even RISCV doesn't include that check. That physical address dump was added by commit aaa229529244 ("powerpc/mm: Add physical address to Linux page table dump") [https://github.com/torvalds/linux/commit/aaa2295]. How do other architectures deal with the problem described by the commit log of that patch?
Christophe
Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()
Le 16/04/2021 à 16:40, Christophe Leroy a écrit : Le 16/04/2021 à 15:00, Steven Price a écrit : On 16/04/2021 12:08, Christophe Leroy wrote: Le 16/04/2021 à 12:51, Steven Price a écrit : On 16/04/2021 11:38, Christophe Leroy wrote: Le 16/04/2021 à 11:28, Steven Price a écrit : To be honest I don't fully understand why powerpc requires the page_size - it appears to be using it purely to find "holes" in the calls to note_page(), but I haven't worked out why such holes would occur. I was indeed introduced for KASAN. We have a first commit https://github.com/torvalds/linux/commit/cabe8138 which uses page size to detect whether it is a KASAN like stuff. Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a fix. I can't remember what the problem was exactly, something around the use of hugepages for kernel memory, came as part of the series https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/ Ah, that's useful context. So it looks like powerpc took a different route to reducing the KASAN output to x86. Given the generic ptdump code has handling for KASAN already it should be possible to drop that from the powerpc arch code, which I think means we don't actually need to provide page size to notepage(). Hopefully that means more code to delete ;) Yes ... and no. It looks like the generic ptdump handles the case when several pgdir entries points to the same kasan_early_shadow_pte. But it doesn't take into account the powerpc case where we have regular page tables where several (if not all) PTEs are pointing to the kasan_early_shadow_page . I'm not sure I follow quite how powerpc is different here. But could you have a similar check for PTEs against kasan_early_shadow_pte as the other levels already have? I'm just worried that page_size isn't well defined in this interface and it's going to cause problems in the future. I'm trying. I reverted the two commits b00ff6d8c and cabe8138. 
At the moment, I don't get exactly what I expect: for linear memory I get one line for each 8M page, whereas before reverting the patches I got one 16M line and one 112M line. And for the KASAN shadow area I get two lines for the 2x 8M pages shadowing linear mem, then I get one 4M line for each PGDIR entry pointing to kasan_early_shadow_pte.
0xf800-0xf87f 0x0700 8M huge rw present
0xf880-0xf8ff 0x0780 8M huge rw present
0xf900-0xf93f 0x0143 4M r present
...
0xfec0-0xfeff 0x0143 4M r present
Any idea?
I think the difference with other architectures is here:
} else if (flag != st->current_flags || level != st->level ||
           addr >= st->marker[1].start_address ||
           pa != st->last_pa + PAGE_SIZE) {
In addition to the checks everyone does, powerpc also checks "pa != st->last_pa + PAGE_SIZE", and it is definitely for that test that the page_size argument had been added.
I see that other architectures except RISCV don't dump the physical address, but even RISCV doesn't include that check. That physical address dump was added by commit aaa229529244 ("powerpc/mm: Add physical address to Linux page table dump") [https://github.com/torvalds/linux/commit/aaa2295]. How do other architectures deal with the problem described by the commit log of that patch?
Christophe
Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()
Le 16/04/2021 à 15:00, Steven Price a écrit : On 16/04/2021 12:08, Christophe Leroy wrote: Le 16/04/2021 à 12:51, Steven Price a écrit : On 16/04/2021 11:38, Christophe Leroy wrote: Le 16/04/2021 à 11:28, Steven Price a écrit : On 15/04/2021 18:18, Christophe Leroy wrote: In order to support large pages on powerpc, notepage() needs to know the page size of the page. Add a page_size argument to notepage(). Signed-off-by: Christophe Leroy --- arch/arm64/mm/ptdump.c | 2 +- arch/riscv/mm/ptdump.c | 2 +- arch/s390/mm/dump_pagetables.c | 3 ++- arch/x86/mm/dump_pagetables.c | 2 +- include/linux/ptdump.h | 2 +- mm/ptdump.c | 16 6 files changed, 14 insertions(+), 13 deletions(-) [...] diff --git a/mm/ptdump.c b/mm/ptdump.c index da751448d0e4..61cd16afb1c8 100644 --- a/mm/ptdump.c +++ b/mm/ptdump.c @@ -17,7 +17,7 @@ static inline int note_kasan_page_table(struct mm_walk *walk, { struct ptdump_state *st = walk->private; - st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0])); + st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0]), PAGE_SIZE); I'm not completely sure what the page_size is going to be used for, but note that KASAN presents an interesting case here. We short-cut by detecting it's a KASAN region at a high level (PGD/P4D/PUD/PMD) and instead of walking the tree down just call note_page() *once* but with level==4 because we know KASAN sets up the page table like that. However the one call actually covers a much larger region - so while PAGE_SIZE matches the level it doesn't match the region covered. AFAICT this will lead to odd results if you enable KASAN on powerpc. Hum I successfully tested it with KASAN, I now realise that I tested it with CONFIG_KASAN_VMALLOC selected. In this situation, since https://github.com/torvalds/linux/commit/af3d0a686 we don't have any common shadow page table anymore. I'll test again without CONFIG_KASAN_VMALLOC. 
To be honest I don't fully understand why powerpc requires the page_size - it appears to be using it purely to find "holes" in the calls to note_page(), but I haven't worked out why such holes would occur. I was indeed introduced for KASAN. We have a first commit https://github.com/torvalds/linux/commit/cabe8138 which uses page size to detect whether it is a KASAN like stuff. Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a fix. I can't remember what the problem was exactly, something around the use of hugepages for kernel memory, came as part of the series https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/ Ah, that's useful context. So it looks like powerpc took a different route to reducing the KASAN output to x86. Given the generic ptdump code has handling for KASAN already it should be possible to drop that from the powerpc arch code, which I think means we don't actually need to provide page size to notepage(). Hopefully that means more code to delete ;) Yes ... and no. It looks like the generic ptdump handles the case when several pgdir entries points to the same kasan_early_shadow_pte. But it doesn't take into account the powerpc case where we have regular page tables where several (if not all) PTEs are pointing to the kasan_early_shadow_page . I'm not sure I follow quite how powerpc is different here. But could you have a similar check for PTEs against kasan_early_shadow_pte as the other levels already have? I'm just worried that page_size isn't well defined in this interface and it's going to cause problems in the future. I'm trying. I reverted the two commits b00ff6d8c and cabe8138. At the moment, I don't get exactly what I expect: For linear memory I get one line for each 8M page whereas before reverting the patches I got one 16M line and one 112M line. 
And for KASAN shadow area I get two lines for the 2x 8M pages shadowing linear mem then I get one 4M line for each PGDIR entry pointing to kasan_early_shadow_pte.
0xf800-0xf87f 0x0700 8M huge rw present
0xf880-0xf8ff 0x0780 8M huge rw present
0xf900-0xf93f 0x0143 4M r present
0xf940-0xf97f 0x0143 4M r present
0xf980-0xf9bf 0x0143 4M r present
0xf9c0-0xf9ff 0x0143 4M r present
0xfa00-0xfa3f 0x0143 4M r present
0xfa40-0xfa7f 0x0143 4M r present
0xfa80-0xfabf 0x0143 4M r present
0xfac0-0xfaff 0x0143 4M r present
0xfb00-0xfb3f 0x0143 4M r present
0xfb40-0xfb7f 0x0143
Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()
Le 16/04/2021 à 12:51, Steven Price a écrit : On 16/04/2021 11:38, Christophe Leroy wrote: Le 16/04/2021 à 11:28, Steven Price a écrit : On 15/04/2021 18:18, Christophe Leroy wrote: In order to support large pages on powerpc, notepage() needs to know the page size of the page. Add a page_size argument to notepage(). Signed-off-by: Christophe Leroy --- arch/arm64/mm/ptdump.c | 2 +- arch/riscv/mm/ptdump.c | 2 +- arch/s390/mm/dump_pagetables.c | 3 ++- arch/x86/mm/dump_pagetables.c | 2 +- include/linux/ptdump.h | 2 +- mm/ptdump.c | 16 6 files changed, 14 insertions(+), 13 deletions(-) [...] diff --git a/mm/ptdump.c b/mm/ptdump.c index da751448d0e4..61cd16afb1c8 100644 --- a/mm/ptdump.c +++ b/mm/ptdump.c @@ -17,7 +17,7 @@ static inline int note_kasan_page_table(struct mm_walk *walk, { struct ptdump_state *st = walk->private; - st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0])); + st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0]), PAGE_SIZE); I'm not completely sure what the page_size is going to be used for, but note that KASAN presents an interesting case here. We short-cut by detecting it's a KASAN region at a high level (PGD/P4D/PUD/PMD) and instead of walking the tree down just call note_page() *once* but with level==4 because we know KASAN sets up the page table like that. However the one call actually covers a much larger region - so while PAGE_SIZE matches the level it doesn't match the region covered. AFAICT this will lead to odd results if you enable KASAN on powerpc. Hum I successfully tested it with KASAN, I now realise that I tested it with CONFIG_KASAN_VMALLOC selected. In this situation, since https://github.com/torvalds/linux/commit/af3d0a686 we don't have any common shadow page table anymore. I'll test again without CONFIG_KASAN_VMALLOC. 
To be honest I don't fully understand why powerpc requires the page_size - it appears to be using it purely to find "holes" in the calls to note_page(), but I haven't worked out why such holes would occur. I was indeed introduced for KASAN. We have a first commit https://github.com/torvalds/linux/commit/cabe8138 which uses page size to detect whether it is a KASAN like stuff. Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a fix. I can't remember what the problem was exactly, something around the use of hugepages for kernel memory, came as part of the series https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/ Ah, that's useful context. So it looks like powerpc took a different route to reducing the KASAN output to x86. Given the generic ptdump code has handling for KASAN already it should be possible to drop that from the powerpc arch code, which I think means we don't actually need to provide page size to notepage(). Hopefully that means more code to delete ;) Yes ... and no. It looks like the generic ptdump handles the case when several pgdir entries points to the same kasan_early_shadow_pte. But it doesn't take into account the powerpc case where we have regular page tables where several (if not all) PTEs are pointing to the kasan_early_shadow_page . Christophe
Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()
Le 16/04/2021 à 11:28, Steven Price a écrit : On 15/04/2021 18:18, Christophe Leroy wrote: In order to support large pages on powerpc, notepage() needs to know the page size of the page. Add a page_size argument to notepage(). Signed-off-by: Christophe Leroy --- arch/arm64/mm/ptdump.c | 2 +- arch/riscv/mm/ptdump.c | 2 +- arch/s390/mm/dump_pagetables.c | 3 ++- arch/x86/mm/dump_pagetables.c | 2 +- include/linux/ptdump.h | 2 +- mm/ptdump.c | 16 6 files changed, 14 insertions(+), 13 deletions(-) [...] diff --git a/mm/ptdump.c b/mm/ptdump.c index da751448d0e4..61cd16afb1c8 100644 --- a/mm/ptdump.c +++ b/mm/ptdump.c @@ -17,7 +17,7 @@ static inline int note_kasan_page_table(struct mm_walk *walk, { struct ptdump_state *st = walk->private; - st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0])); + st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0]), PAGE_SIZE); I'm not completely sure what the page_size is going to be used for, but note that KASAN presents an interesting case here. We short-cut by detecting it's a KASAN region at a high level (PGD/P4D/PUD/PMD) and instead of walking the tree down just call note_page() *once* but with level==4 because we know KASAN sets up the page table like that. However the one call actually covers a much larger region - so while PAGE_SIZE matches the level it doesn't match the region covered. AFAICT this will lead to odd results if you enable KASAN on powerpc. Hum I successfully tested it with KASAN, I now realise that I tested it with CONFIG_KASAN_VMALLOC selected. In this situation, since https://github.com/torvalds/linux/commit/af3d0a686 we don't have any common shadow page table anymore. I'll test again without CONFIG_KASAN_VMALLOC. To be honest I don't fully understand why powerpc requires the page_size - it appears to be using it purely to find "holes" in the calls to note_page(), but I haven't worked out why such holes would occur. I was indeed introduced for KASAN. 
We have a first commit https://github.com/torvalds/linux/commit/cabe8138 which uses page size to detect whether it is a KASAN like stuff. Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a fix. I can't remember what the problem was exactly, something around the use of hugepages for kernel memory, came as part of the series https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/ Christophe
Re: [PATCH] soc: fsl: qe: remove unused function
Le 16/04/2021 à 08:57, Daniel Axtens a écrit :
Hi Jiapeng,
Fix the following clang warning:
You are not fixing a warning, you are removing a function in order to fix a warning ...
drivers/soc/fsl/qe/qe_ic.c:234:29: warning: unused function 'qe_ic_from_irq' [-Wunused-function].
It would be wise to mention that the last users of the function were removed by commit d7c2878cfcfa ("soc: fsl: qe: remove unused qe_ic_set_* functions") https://github.com/torvalds/linux/commit/d7c2878cfcfa
Reported-by: Abaci Robot
Signed-off-by: Jiapeng Chong
---
drivers/soc/fsl/qe/qe_ic.c | 5 -
1 file changed, 5 deletions(-)
diff --git a/drivers/soc/fsl/qe/qe_ic.c b/drivers/soc/fsl/qe/qe_ic.c
index 0390af9..b573712 100644
--- a/drivers/soc/fsl/qe/qe_ic.c
+++ b/drivers/soc/fsl/qe/qe_ic.c
@@ -231,11 +231,6 @@ static inline void qe_ic_write(__be32 __iomem *base, unsigned int reg,
 	qe_iowrite32be(value, base + (reg >> 2));
 }
-static inline struct qe_ic *qe_ic_from_irq(unsigned int virq)
-{
-	return irq_get_chip_data(virq);
-}
This seems good to me.
* We know that this function can't be called directly from outside the file, because it is static.
* The function address isn't used as a function pointer anywhere, so that means it can't be called from outside the file that way (also it's inline, which would make using a function pointer unwise!)
* There's no obvious macros in that file that might construct the name of the function in a way that is hidden from grep.
All in all, I am fairly confident that the function is indeed not used.
Reviewed-by: Daniel Axtens
Kind regards, Daniel
-
 static inline struct qe_ic *qe_ic_from_irq_data(struct irq_data *d)
 {
 	return irq_data_get_irq_chip_data(d);
--
1.8.3.1
Re: [PATCH] symbol : Make the size of the compile-related array fixed
Also, the following statement which appears at the end of your mail is puzzling. What can we do with your patch if there are such limitations ? This e-mail and its attachments contain confidential information from OPPO, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it! Le 16/04/2021 à 08:08, Christophe Leroy a écrit : Hi, This mail is unreadable. Please send your patch as raw text mail, not as attached file. Thanks Christophe Le 16/04/2021 à 05:12, 韩大鹏(Han Dapeng) a écrit : *OPPO* * * 本电子邮件及其附件含有OPPO公司的保密信息,仅限于邮件指明的收件人使用(包含个人及群组)。禁止任何人 在 未经授权的情况下以任何形式使用。如果您错收了本邮件,请立即以电子邮件通知发件人并删除本邮件及其 附件。 This e-mail and its attachments contain confidential information from OPPO, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!
Re: [PATCH] symbol : Make the size of the compile-related array fixed
Hi, This mail is unreadable. Please send your patch as raw text mail, not as attached file. Thanks Christophe Le 16/04/2021 à 05:12, 韩大鹏(Han Dapeng) a écrit : *OPPO* * * 本电子邮件及其附件含有OPPO公司的保密信息,仅限于邮件指明的收件人使用(包含个人及群组)。禁止任何人在 未经授权的情况下以任何形式使用。如果您错收了本邮件,请立即以电子邮件通知发件人并删除本邮件及其附件。 This e-mail and its attachments contain confidential information from OPPO, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!
Re: [PATCH v1 1/5] mm: pagewalk: Fix walk for hugepage tables
Le 16/04/2021 à 00:43, Daniel Axtens a écrit :
Hi Christophe,
Pagewalk ignores hugepd entries and walks down the tables as if they were traditional entries, leading to crazy results.
Add walk_hugepd_range() and use it to walk hugepage tables.
Signed-off-by: Christophe Leroy
---
mm/pagewalk.c | 54 +--
1 file changed, 48 insertions(+), 6 deletions(-)
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index e81640d9f177..410a9d8f7572 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -58,6 +58,32 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	return err;
 }
+static int walk_hugepd_range(hugepd_t *phpd, unsigned long addr,
+			     unsigned long end, struct mm_walk *walk, int pdshift)
+{
+	int err = 0;
+#ifdef CONFIG_ARCH_HAS_HUGEPD
+	const struct mm_walk_ops *ops = walk->ops;
+	int shift = hugepd_shift(*phpd);
+	int page_size = 1 << shift;
+
+	if (addr & (page_size - 1))
+		return 0;
+
+	for (;;) {
+		pte_t *pte = hugepte_offset(*phpd, addr, pdshift);
+
+		err = ops->pte_entry(pte, addr, addr + page_size, walk);
+		if (err)
+			break;
+		if (addr >= end - page_size)
+			break;
+		addr += page_size;
+	}
Initially I thought this was a somewhat unintuitive way to structure this loop, but I see it parallels the structure of walk_pte_range_inner, so I think the consistency is worth it.
I notice the pte walking code potentially takes some locks: does this code need to do that? arch/powerpc/mm/hugetlbpage.c says that hugepds are protected by the mm->page_table_lock, but I don't think we're taking it in this code.
I'll add it, thanks.
+#endif
+	return err;
+}
+
 static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
 			  struct mm_walk *walk)
 {
@@ -108,7 +134,10 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
 			goto again;
 		}
-		err = walk_pte_range(pmd, addr, next, walk);
+		if (is_hugepd(__hugepd(pmd_val(*pmd))))
+			err = walk_hugepd_range((hugepd_t *)pmd, addr, next, walk, PMD_SHIFT);
+		else
+			err = walk_pte_range(pmd, addr, next, walk);
 		if (err)
 			break;
 	} while (pmd++, addr = next, addr != end);
@@ -157,7 +186,10 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
 		if (pud_none(*pud))
 			goto again;
-		err = walk_pmd_range(pud, addr, next, walk);
+		if (is_hugepd(__hugepd(pud_val(*pud))))
+			err = walk_hugepd_range((hugepd_t *)pud, addr, next, walk, PUD_SHIFT);
+		else
+			err = walk_pmd_range(pud, addr, next, walk);
I'm a bit worried you might end up calling into walk_hugepd_range with ops->pte_entry == NULL, and then jumping to 0.
You are right, I missed it. I'll bail out of walk_hugepd_range() when ops->pte_entry is NULL.
static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
			  struct mm_walk *walk)
{
	...
	pud = pud_offset(p4d, addr);
	do {
		...
		if ((!walk->vma && (pud_leaf(*pud) || !pud_present(*pud))) ||
		    walk->action == ACTION_CONTINUE ||
		    !(ops->pmd_entry || ops->pte_entry)) <<< THIS CHECK
			continue;
		...
		if (is_hugepd(__hugepd(pud_val(*pud))))
			err = walk_hugepd_range((hugepd_t *)pud, addr, next, walk, PUD_SHIFT);
		else
			err = walk_pmd_range(pud, addr, next, walk);
		if (err)
			break;
	} while (pud++, addr = next, addr != end);
walk_pud_range will proceed if there is _either_ an ops->pmd_entry _or_ an ops->pte_entry, but walk_hugepd_range will call ops->pte_entry unconditionally. The same issue applies to walk_{p4d,pgd}_range...
Kind regards,
Daniel
Thanks
Christophe
Re: [PATCH v1 4/5] mm: ptdump: Support hugepd table entries
Hi Daniel,
Le 16/04/2021 à 01:29, Daniel Axtens a écrit :
Hi Christophe,
With hugepd, page table entries can be at any level and can be of any size. Add support for them.
Signed-off-by: Christophe Leroy
---
mm/ptdump.c | 17 +++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/mm/ptdump.c b/mm/ptdump.c
index 61cd16afb1c8..6efdb8c15a7d 100644
--- a/mm/ptdump.c
+++ b/mm/ptdump.c
@@ -112,11 +112,24 @@ static int ptdump_pte_entry(pte_t *pte, unsigned long addr,
 {
 	struct ptdump_state *st = walk->private;
 	pte_t val = ptep_get(pte);
+	unsigned long page_size = next - addr;
+	int level;
+
+	if (page_size >= PGDIR_SIZE)
+		level = 0;
+	else if (page_size >= P4D_SIZE)
+		level = 1;
+	else if (page_size >= PUD_SIZE)
+		level = 2;
+	else if (page_size >= PMD_SIZE)
+		level = 3;
+	else
+		level = 4;
 	if (st->effective_prot)
-		st->effective_prot(st, 4, pte_val(val));
+		st->effective_prot(st, level, pte_val(val));
-	st->note_page(st, addr, 4, pte_val(val), PAGE_SIZE);
+	st->note_page(st, addr, level, pte_val(val), page_size);
It seems to me that passing both level and page_size is a bit redundant, but I guess it does reduce the impact on each arch's code?
Exactly: as shown above, the level can be re-calculated based on the page size, but doing so would be an unnecessary impact on all architectures and would duplicate the re-calculation of the level, whereas in most cases we get it for free from the caller.
Kind regards,
Daniel
 	return 0;
 }
--
2.25.0
Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()
Le 16/04/2021 à 01:12, Daniel Axtens a écrit :
Hi Christophe,
static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
- 		      u64 val)
+ 		      u64 val, unsigned long page_size)
Compilers can warn about unused parameters at -Wextra level. However, reading scripts/Makefile.extrawarn it looks like the warning is explicitly _disabled_ in the kernel at W=1 and not reenabled at W=2 or W=3. So I guess this is fine...
There are lots and lots of functions with unused parameters in the kernel, especially the ones that are re-implemented by each architecture.
@@ -126,7 +126,7 @@ static int ptdump_hole(unsigned long addr, unsigned long next,
 {
 	struct ptdump_state *st = walk->private;
-	st->note_page(st, addr, depth, 0);
+	st->note_page(st, addr, depth, 0, 0);
I know it doesn't matter at this point, but I'm not really thrilled by the idea of passing 0 as the size here. Doesn't the hole have a known page size?
The hole has a size for sure, but I don't think we can call it a page size. On powerpc 8xx, we have 4 page sizes: 8M, 512k, 16k and 4k. A page table will cover 4M areas and will contain pages of size 512k, 16k and 4k. A PGD table contains either entries which point to a page table (covering 4M), or two identical consecutive entries pointing to the same hugepd which contains a single PTE for an 8M page.
So, if a PGD entry is empty, the hole is 4M: it corresponds to none of the page sizes the architecture supports. But looking at what is done with that size, it can make sense to pass it to notepage() anyway. Let's do that.
@@ -153,5 +153,5 @@ void ptdump_walk_pgd(struct ptdump_state *st, struct mm_struct *mm, pgd_t *pgd)
 	mmap_read_unlock(mm);
 	/* Flush out the last page */
-	st->note_page(st, 0, -1, 0);
+	st->note_page(st, 0, -1, 0, 0);
I'm more OK with the idea of passing 0 as the size when the depth is -1 (don't know): if we don't know the depth we conceptually can't know the page size.
Regards,
Daniel
[PATCH v1 3/5] mm: ptdump: Provide page size to notepage()
In order to support large pages on powerpc, notepage() needs to know the page size of the page. Add a page_size argument to notepage(). Signed-off-by: Christophe Leroy --- arch/arm64/mm/ptdump.c | 2 +- arch/riscv/mm/ptdump.c | 2 +- arch/s390/mm/dump_pagetables.c | 3 ++- arch/x86/mm/dump_pagetables.c | 2 +- include/linux/ptdump.h | 2 +- mm/ptdump.c| 16 6 files changed, 14 insertions(+), 13 deletions(-) diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c index 0e050d76b83a..ea1a1c3a3ea0 100644 --- a/arch/arm64/mm/ptdump.c +++ b/arch/arm64/mm/ptdump.c @@ -257,7 +257,7 @@ static void note_prot_wx(struct pg_state *st, unsigned long addr) } static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level, - u64 val) + u64 val, unsigned long page_size) { struct pg_state *st = container_of(pt_st, struct pg_state, ptdump); static const char units[] = "KMGTPE"; diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c index ace74dec7492..0a7f276ba799 100644 --- a/arch/riscv/mm/ptdump.c +++ b/arch/riscv/mm/ptdump.c @@ -235,7 +235,7 @@ static void note_prot_wx(struct pg_state *st, unsigned long addr) } static void note_page(struct ptdump_state *pt_st, unsigned long addr, - int level, u64 val) + int level, u64 val, unsigned long page_size) { struct pg_state *st = container_of(pt_st, struct pg_state, ptdump); u64 pa = PFN_PHYS(pte_pfn(__pte(val))); diff --git a/arch/s390/mm/dump_pagetables.c b/arch/s390/mm/dump_pagetables.c index e40a30647d99..29673c38e773 100644 --- a/arch/s390/mm/dump_pagetables.c +++ b/arch/s390/mm/dump_pagetables.c @@ -116,7 +116,8 @@ static void note_prot_wx(struct pg_state *st, unsigned long addr) #endif /* CONFIG_DEBUG_WX */ } -static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level, u64 val) +static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level, + u64 val, unsigned long page_size) { int width = sizeof(unsigned long) * 2; static const char units[] = "KMGTPE"; diff --git 
a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c index e1b599ecbbc2..2ec76737c1f1 100644 --- a/arch/x86/mm/dump_pagetables.c +++ b/arch/x86/mm/dump_pagetables.c @@ -272,7 +272,7 @@ static void effective_prot(struct ptdump_state *pt_st, int level, u64 val) * print what we collected so far. */ static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level, - u64 val) + u64 val, unsigned long page_size) { struct pg_state *st = container_of(pt_st, struct pg_state, ptdump); pgprotval_t new_prot, new_eff; diff --git a/include/linux/ptdump.h b/include/linux/ptdump.h index 2a3a95586425..3a971fadc95e 100644 --- a/include/linux/ptdump.h +++ b/include/linux/ptdump.h @@ -13,7 +13,7 @@ struct ptdump_range { struct ptdump_state { /* level is 0:PGD to 4:PTE, or -1 if unknown */ void (*note_page)(struct ptdump_state *st, unsigned long addr, - int level, u64 val); + int level, u64 val, unsigned long page_size); void (*effective_prot)(struct ptdump_state *st, int level, u64 val); const struct ptdump_range *range; }; diff --git a/mm/ptdump.c b/mm/ptdump.c index da751448d0e4..61cd16afb1c8 100644 --- a/mm/ptdump.c +++ b/mm/ptdump.c @@ -17,7 +17,7 @@ static inline int note_kasan_page_table(struct mm_walk *walk, { struct ptdump_state *st = walk->private; - st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0])); + st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0]), PAGE_SIZE); walk->action = ACTION_CONTINUE; @@ -41,7 +41,7 @@ static int ptdump_pgd_entry(pgd_t *pgd, unsigned long addr, st->effective_prot(st, 0, pgd_val(val)); if (pgd_leaf(val)) - st->note_page(st, addr, 0, pgd_val(val)); + st->note_page(st, addr, 0, pgd_val(val), PGDIR_SIZE); return 0; } @@ -62,7 +62,7 @@ static int ptdump_p4d_entry(p4d_t *p4d, unsigned long addr, st->effective_prot(st, 1, p4d_val(val)); if (p4d_leaf(val)) - st->note_page(st, addr, 1, p4d_val(val)); + st->note_page(st, addr, 1, p4d_val(val), P4D_SIZE); return 0; } @@ -83,7 +83,7 @@ static int 
ptdump_pud_entry(pud_t *pud, unsigned long addr, st->effective_prot(st, 2, pud_val(val)); if (pud_leaf(val)) - st->note_page(st, addr, 2, pud_val(val)); + st->note_page(st, addr, 2, pud_val(val), PUD_SIZE); return 0; } @@ -102,7 +102,7 @@ static int ptdump_pmd_entry(pmd_t *pmd, unsigned long addr, if (st->effecti
[PATCH v1 2/5] mm: ptdump: Fix build failure
  CC      mm/ptdump.o
In file included from <command-line>:
mm/ptdump.c: In function 'ptdump_pte_entry':
././include/linux/compiler_types.h:320:38: error: call to '__compiletime_assert_207' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE().
  320 |  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
      |                                      ^
././include/linux/compiler_types.h:301:4: note: in definition of macro '__compiletime_assert'
  301 |    prefix ## suffix();				\
      |    ^~~~~~
././include/linux/compiler_types.h:320:2: note: in expansion of macro '_compiletime_assert'
  320 |  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
      |  ^~~~~~~~~~~~~~~~~~~
./include/asm-generic/rwonce.h:36:2: note: in expansion of macro 'compiletime_assert'
   36 |  compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
      |  ^~~~~~~~~~~~~~~~~~
./include/asm-generic/rwonce.h:49:2: note: in expansion of macro 'compiletime_assert_rwonce_type'
   49 |  compiletime_assert_rwonce_type(x);				\
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mm/ptdump.c:114:14: note: in expansion of macro 'READ_ONCE'
  114 |  pte_t val = READ_ONCE(*pte);
      |              ^~~~~~~~~
make[2]: *** [mm/ptdump.o] Error 1

READ_ONCE() cannot be used for reading PTEs. Use ptep_get() instead.
See commit 481e980a7c19 ("mm: Allow arches to provide ptep_get()") and
commit c0e1c8c22beb ("powerpc/8xx: Provide ptep_get() with 16k pages")
for details.

Fixes: 30d621f6723b ("mm: add generic ptdump")
Cc: Steven Price
Signed-off-by: Christophe Leroy
---
 mm/ptdump.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/ptdump.c b/mm/ptdump.c
index 4354c1422d57..da751448d0e4 100644
--- a/mm/ptdump.c
+++ b/mm/ptdump.c
@@ -111,7 +111,7 @@ static int ptdump_pte_entry(pte_t *pte, unsigned long addr,
 			    unsigned long next, struct mm_walk *walk)
 {
 	struct ptdump_state *st = walk->private;
-	pte_t val = READ_ONCE(*pte);
+	pte_t val = ptep_get(pte);
 
 	if (st->effective_prot)
 		st->effective_prot(st, 4, pte_val(val));
-- 
2.25.0
[PATCH v1 4/5] mm: ptdump: Support hugepd table entries
With hugepd, page table entries can be at any level and can be of any
size. Add support for them.

Signed-off-by: Christophe Leroy
---
 mm/ptdump.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/mm/ptdump.c b/mm/ptdump.c
index 61cd16afb1c8..6efdb8c15a7d 100644
--- a/mm/ptdump.c
+++ b/mm/ptdump.c
@@ -112,11 +112,24 @@ static int ptdump_pte_entry(pte_t *pte, unsigned long addr,
 {
 	struct ptdump_state *st = walk->private;
 	pte_t val = ptep_get(pte);
+	unsigned long page_size = next - addr;
+	int level;
+
+	if (page_size >= PGDIR_SIZE)
+		level = 0;
+	else if (page_size >= P4D_SIZE)
+		level = 1;
+	else if (page_size >= PUD_SIZE)
+		level = 2;
+	else if (page_size >= PMD_SIZE)
+		level = 3;
+	else
+		level = 4;
 
 	if (st->effective_prot)
-		st->effective_prot(st, 4, pte_val(val));
+		st->effective_prot(st, level, pte_val(val));
 
-	st->note_page(st, addr, 4, pte_val(val), PAGE_SIZE);
+	st->note_page(st, addr, level, pte_val(val), page_size);
 
 	return 0;
 }
-- 
2.25.0
[PATCH v1 5/5] powerpc/mm: Convert powerpc to GENERIC_PTDUMP
This patch converts powerpc to the generic PTDUMP implementation.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/Kconfig              |   2 +
 arch/powerpc/Kconfig.debug        |  30 ----
 arch/powerpc/mm/Makefile          |   2 +-
 arch/powerpc/mm/mmu_decl.h        |   2 +-
 arch/powerpc/mm/ptdump/8xx.c      |   6 +-
 arch/powerpc/mm/ptdump/Makefile   |   9 +-
 arch/powerpc/mm/ptdump/book3s64.c |   6 +-
 arch/powerpc/mm/ptdump/ptdump.c   | 161 +++---------------
 arch/powerpc/mm/ptdump/shared.c   |   6 +-
 9 files changed, 68 insertions(+), 156 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 475d77a6ebbe..40259437a28f 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -120,6 +120,7 @@ config PPC
 	select ARCH_32BIT_OFF_T if PPC32
 	select ARCH_HAS_DEBUG_VIRTUAL
 	select ARCH_HAS_DEBUG_VM_PGTABLE
+	select ARCH_HAS_DEBUG_WX		if STRICT_KERNEL_RWX
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FORTIFY_SOURCE
@@ -177,6 +178,7 @@ config PPC
 	select GENERIC_IRQ_SHOW
 	select GENERIC_IRQ_SHOW_LEVEL
 	select GENERIC_PCI_IOMAP		if PCI
+	select GENERIC_PTDUMP
 	select GENERIC_SMP_IDLE_THREAD
 	select GENERIC_STRNCPY_FROM_USER
 	select GENERIC_STRNLEN_USER
diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 6342f9da4545..05b1180ea502 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -360,36 +360,6 @@ config FAIL_IOMMU
 
 	  If you are unsure, say N.
 
-config PPC_PTDUMP
-	bool "Export kernel pagetable layout to userspace via debugfs"
-	depends on DEBUG_KERNEL && DEBUG_FS
-	help
-	  This option exports the state of the kernel pagetables to a
-	  debugfs file. This is only useful for kernel developers who are
-	  working in architecture specific areas of the kernel - probably
-	  not a good idea to enable this feature in a production kernel.
-
-	  If you are unsure, say N.
-
-config PPC_DEBUG_WX
-	bool "Warn on W+X mappings at boot"
-	depends on PPC_PTDUMP && STRICT_KERNEL_RWX
-	help
-	  Generate a warning if any W+X mappings are found at boot.
-
-	  This is useful for discovering cases where the kernel is leaving
-	  W+X mappings after applying NX, as such mappings are a security risk.
-
-	  Note that even if the check fails, your kernel is possibly
-	  still fine, as W+X mappings are not a security hole in
-	  themselves, what they do is that they make the exploitation
-	  of other unfixed kernel bugs easier.
-
-	  There is no runtime or memory usage effect of this option
-	  once the kernel has booted up - it's a one time check.
-
-	  If in doubt, say "Y".
-
 config PPC_FAST_ENDIAN_SWITCH
 	bool "Deprecated fast endian-switch syscall"
 	depends on DEBUG_KERNEL && PPC_BOOK3S_64
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index c3df3a8501d4..c90d58aaebe2 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -18,5 +18,5 @@ obj-$(CONFIG_PPC_MM_SLICES)	+= slice.o
 obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
 obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-noncoherent.o
 obj-$(CONFIG_PPC_COPRO_BASE)	+= copro_fault.o
-obj-$(CONFIG_PPC_PTDUMP)	+= ptdump/
+obj-$(CONFIG_PTDUMP_CORE)	+= ptdump/
 obj-$(CONFIG_KASAN)		+= kasan/
diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index 7dac910c0b21..dd1cabc2ea0f 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -180,7 +180,7 @@ static inline void mmu_mark_rodata_ro(void) { }
 void __init mmu_mapin_immr(void);
 #endif
 
-#ifdef CONFIG_PPC_DEBUG_WX
+#ifdef CONFIG_DEBUG_WX
 void ptdump_check_wx(void);
 #else
 static inline void ptdump_check_wx(void) { }
diff --git a/arch/powerpc/mm/ptdump/8xx.c b/arch/powerpc/mm/ptdump/8xx.c
index 86da2a669680..fac932eb8f9a 100644
--- a/arch/powerpc/mm/ptdump/8xx.c
+++ b/arch/powerpc/mm/ptdump/8xx.c
@@ -75,8 +75,10 @@ static const struct flag_info flag_array[] = {
 };
 
 struct pgtable_level pg_level[5] = {
-	{
-	}, { /* pgd */
+	{ /* pgd */
+		.flag	= flag_array,
+		.num	= ARRAY_SIZE(flag_array),
+	}, { /* p4d */
 		.flag	= flag_array,
 		.num	= ARRAY_SIZE(flag_array),
 	}, { /* pud */
diff --git a/arch/powerpc/mm/ptdump/Makefile b/arch/powerpc/mm/ptdump/Makefile
index 712762be3cb1..4050cbb55acf 100644
--- a/arch/powerpc/mm/ptdump/Makefile
+++ b/arch/powerpc/mm/ptdump/Makefile
@@ -5,5 +5,10 @@ obj-y	+= ptdump.o
 obj-$(CONFIG_4xx)		+= shared.o
 obj-$(CONFIG_PPC_8xx)		+= 8xx.o
 obj-$(CONFIG_PPC_BOOK3E_MMU)	+= shared.o
-obj-$(CONFIG_PPC_BOOK3S_32)	+= shared.o bat
[PATCH v1 1/5] mm: pagewalk: Fix walk for hugepage tables
Pagewalk ignores hugepd entries and walks down the tables as if they
were traditional entries, leading to wrong results.

Add walk_hugepd_range() and use it to walk hugepage tables.

Signed-off-by: Christophe Leroy
---
 mm/pagewalk.c | 54 ++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 48 insertions(+), 6 deletions(-)

diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index e81640d9f177..410a9d8f7572 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -58,6 +58,32 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	return err;
 }
 
+static int walk_hugepd_range(hugepd_t *phpd, unsigned long addr,
+			     unsigned long end, struct mm_walk *walk, int pdshift)
+{
+	int err = 0;
+#ifdef CONFIG_ARCH_HAS_HUGEPD
+	const struct mm_walk_ops *ops = walk->ops;
+	int shift = hugepd_shift(*phpd);
+	int page_size = 1 << shift;
+
+	if (addr & (page_size - 1))
+		return 0;
+
+	for (;;) {
+		pte_t *pte = hugepte_offset(*phpd, addr, pdshift);
+
+		err = ops->pte_entry(pte, addr, addr + page_size, walk);
+		if (err)
+			break;
+		if (addr >= end - page_size)
+			break;
+		addr += page_size;
+	}
+#endif
+	return err;
+}
+
 static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
 			  struct mm_walk *walk)
 {
@@ -108,7 +134,10 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
 			goto again;
 		}
 
-		err = walk_pte_range(pmd, addr, next, walk);
+		if (is_hugepd(__hugepd(pmd_val(*pmd))))
+			err = walk_hugepd_range((hugepd_t *)pmd, addr, next, walk, PMD_SHIFT);
+		else
+			err = walk_pte_range(pmd, addr, next, walk);
 		if (err)
 			break;
 	} while (pmd++, addr = next, addr != end);
@@ -157,7 +186,10 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
 		if (pud_none(*pud))
 			goto again;
 
-		err = walk_pmd_range(pud, addr, next, walk);
+		if (is_hugepd(__hugepd(pud_val(*pud))))
+			err = walk_hugepd_range((hugepd_t *)pud, addr, next, walk, PUD_SHIFT);
+		else
+			err = walk_pmd_range(pud, addr, next, walk);
 		if (err)
 			break;
 	} while (pud++, addr = next, addr != end);
@@ -189,8 +221,13 @@ static int walk_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
 			if (err)
 				break;
 		}
-		if (ops->pud_entry || ops->pmd_entry || ops->pte_entry)
-			err = walk_pud_range(p4d, addr, next, walk);
+		if (ops->pud_entry || ops->pmd_entry || ops->pte_entry) {
+			if (is_hugepd(__hugepd(p4d_val(*p4d))))
+				err = walk_hugepd_range((hugepd_t *)p4d, addr, next, walk,
+							P4D_SHIFT);
+			else
+				err = walk_pud_range(p4d, addr, next, walk);
+		}
 		if (err)
 			break;
 	} while (p4d++, addr = next, addr != end);
@@ -225,8 +262,13 @@ static int walk_pgd_range(unsigned long addr, unsigned long end,
 			break;
 		}
 		if (ops->p4d_entry || ops->pud_entry || ops->pmd_entry ||
-		    ops->pte_entry)
-			err = walk_p4d_range(pgd, addr, next, walk);
+		    ops->pte_entry) {
+			if (is_hugepd(__hugepd(pgd_val(*pgd))))
+				err = walk_hugepd_range((hugepd_t *)pgd, addr, next, walk,
+							PGDIR_SHIFT);
+			else
+				err = walk_p4d_range(pgd, addr, next, walk);
+		}
 		if (err)
 			break;
 	} while (pgd++, addr = next, addr != end);
-- 
2.25.0
[PATCH v1 0/5] Convert powerpc to GENERIC_PTDUMP
This series converts powerpc to generic PTDUMP.

For that, we first need to add missing hugepd support to pagewalk and
ptdump.

Christophe Leroy (5):
  mm: pagewalk: Fix walk for hugepage tables
  mm: ptdump: Fix build failure
  mm: ptdump: Provide page size to notepage()
  mm: ptdump: Support hugepd table entries
  powerpc/mm: Convert powerpc to GENERIC_PTDUMP

 arch/arm64/mm/ptdump.c            |   2 +-
 arch/powerpc/Kconfig              |   2 +
 arch/powerpc/Kconfig.debug        |  30 ----
 arch/powerpc/mm/Makefile          |   2 +-
 arch/powerpc/mm/mmu_decl.h        |   2 +-
 arch/powerpc/mm/ptdump/8xx.c      |   6 +-
 arch/powerpc/mm/ptdump/Makefile   |   9 +-
 arch/powerpc/mm/ptdump/book3s64.c |   6 +-
 arch/powerpc/mm/ptdump/ptdump.c   | 161 +++---------------
 arch/powerpc/mm/ptdump/shared.c   |   6 +-
 arch/riscv/mm/ptdump.c            |   2 +-
 arch/s390/mm/dump_pagetables.c    |   3 +-
 arch/x86/mm/dump_pagetables.c     |   2 +-
 include/linux/ptdump.h            |   2 +-
 mm/pagewalk.c                     |  54 ++++--
 mm/ptdump.c                       |  33 ++--
 16 files changed, 145 insertions(+), 177 deletions(-)

-- 
2.25.0
Re: [PATCH v13 14/14] powerpc/64s/radix: Enable huge vmalloc mappings
Hi Nick,

Le 17/03/2021 à 07:24, Nicholas Piggin a écrit :

This reduces TLB misses by nearly 30x on a `git diff` workload on a
2-node POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%, due to
vfs hashes being allocated with 2MB pages.

Cc: linuxppc-...@lists.ozlabs.org
Acked-by: Michael Ellerman
Signed-off-by: Nicholas Piggin
---
 .../admin-guide/kernel-parameters.txt |  2 ++
 arch/powerpc/Kconfig                  |  1 +
 arch/powerpc/kernel/module.c          | 22 ++++++++++++++++-----
 3 files changed, 21 insertions(+), 4 deletions(-)

--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -8,6 +8,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -87,13 +88,26 @@ int module_finalize(const Elf_Ehdr *hdr,
 	return 0;
 }
 
-#ifdef MODULES_VADDR
 void *module_alloc(unsigned long size)
 {
+	unsigned long start = VMALLOC_START;
+	unsigned long end = VMALLOC_END;
+
+#ifdef MODULES_VADDR
 	BUILD_BUG_ON(TASK_SIZE > MODULES_VADDR);
+	start = MODULES_VADDR;
+	end = MODULES_END;
+#endif
+
+	/*
+	 * Don't do huge page allocations for modules yet until more testing
+	 * is done. STRICT_MODULE_RWX may require extra work to support this
+	 * too.
+	 */
-	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END, GFP_KERNEL,
-			PAGE_KERNEL_EXEC, VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,

I think you should add the following in

#ifndef MODULES_VADDR
#define MODULES_VADDR VMALLOC_START
#define MODULES_END VMALLOC_END
#endif

And leave module_alloc() as is (just removing the enclosing #ifdef
MODULES_VADDR and adding the VM_NO_HUGE_VMAP flag). This would minimise
the conflicts with the changes I did in powerpc/next reported by
Stephen R.

+	return __vmalloc_node_range(size, 1, start, end, GFP_KERNEL,
+				    PAGE_KERNEL_EXEC,
+				    VM_NO_HUGE_VMAP | VM_FLUSH_RESET_PERMS,
+				    NUMA_NO_NODE,
 				    __builtin_return_address(0));
 }
-#endif
Re: linux-next: manual merge of the akpm-current tree with the powerpc tree
Le 15/04/2021 à 12:08, Christophe Leroy a écrit :
Le 15/04/2021 à 12:07, Christophe Leroy a écrit :
Le 15/04/2021 à 11:58, Stephen Rothwell a écrit :

Hi all,

On Thu, 15 Apr 2021 19:44:17 +1000 Stephen Rothwell wrote:

Today's linux-next merge of the akpm-current tree got a conflict in:

  arch/powerpc/kernel/module.c

between commit:

  2ec13df16704 ("powerpc/modules: Load modules closer to kernel text")

from the powerpc tree and commit:

  4930ba789f8d ("powerpc/64s/radix: enable huge vmalloc mappings")

from the akpm-current tree.

I fixed it up (I think - see below) and can carry the fix as necessary.
This is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging. You may also want to consider cooperating with
the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/powerpc/kernel/module.c
index fab84024650c,cdb2d88c54e7..
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@@ -88,29 -88,26 +89,42 @@@ int module_finalize(const Elf_Ehdr *hdr
  	return 0;
  }
  
- #ifdef MODULES_VADDR
 -void *module_alloc(unsigned long size)
 +static __always_inline void *
 +__module_alloc(unsigned long size, unsigned long start, unsigned long end)
  {
 -	unsigned long start = VMALLOC_START;
 -	unsigned long end = VMALLOC_END;
 -
 -#ifdef MODULES_VADDR
 -	BUILD_BUG_ON(TASK_SIZE > MODULES_VADDR);
 -	start = MODULES_VADDR;
 -	end = MODULES_END;
 -#endif
 -
 +	/*
 +	 * Don't do huge page allocations for modules yet until more testing
 +	 * is done. STRICT_MODULE_RWX may require extra work to support this
 +	 * too.
 +	 */
 +	return __vmalloc_node_range(size, 1, start, end, GFP_KERNEL,
-			PAGE_KERNEL_EXEC, VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
 +				    PAGE_KERNEL_EXEC,
 +				    VM_NO_HUGE_VMAP | VM_FLUSH_RESET_PERMS,
 +				    NUMA_NO_NODE,
  				    __builtin_return_address(0));
  }
+ 
++
 +void *module_alloc(unsigned long size)
 +{
++	unsigned long start = VMALLOC_START;
++	unsigned long end = VMALLOC_END;
 +	unsigned long limit = (unsigned long)_etext - SZ_32M;
 +	void *ptr = NULL;
 +
++#ifdef MODULES_VADDR
 +	BUILD_BUG_ON(TASK_SIZE > MODULES_VADDR);
++	start = MODULES_VADDR;
++	end = MODULES_END;

The #endif should be here.

 +
 +	/* First try within 32M limit from _etext to avoid branch trampolines */
 +	if (MODULES_VADDR < PAGE_OFFSET && MODULES_END > limit)

Should also use start and end here instead of MODULES_VADDR and
MODULES_END.

The cleanest however should be to define MODULES_VADDR and MODULES_END
all the time with a fallback to VMALLOC_START/VMALLOC_END, to avoid the
#ifdef. The #ifdef was OK when we wanted to define module_alloc() only
when the module area was different from the vmalloc area, but now that
we want module_alloc() at all times, MODULES_VADDR and MODULES_END
should be defined all the time.

-		ptr = __module_alloc(size, limit, MODULES_END);
++		ptr = __module_alloc(size, limit, end);
 +
 +	if (!ptr)
-		ptr = __module_alloc(size, MODULES_VADDR, MODULES_END);
++#endif
++		ptr = __module_alloc(size, start, end);
 +
 +	return ptr;
 +}
- #endif

Unfortunately, it also needs this:

Before the #endif is too far.

From: Stephen Rothwell
Date: Thu, 15 Apr 2021 19:53:58 +1000
Subject: [PATCH] merge fix up for powerpc merge fix

Signed-off-by: Stephen Rothwell
---
 arch/powerpc/kernel/module.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
index d8ab1ad2eb05..c060f99afd4d 100644
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -110,7 +110,9 @@ void *module_alloc(unsigned long size)
 {
 	unsigned long start = VMALLOC_START;
 	unsigned long end = VMALLOC_END;
+#ifdef MODULES_VADDR
 	unsigned long limit = (unsigned long)_etext - SZ_32M;
+#endif
 	void *ptr = NULL;
 
 #ifdef MODULES_VADDR
[PATCH v3 3/3] powerpc/atomics: Remove atomic_inc()/atomic_dec() and friends
Now that atomic_add() and atomic_sub() handle immediate operands,
atomic_inc() and atomic_dec() have no added value compared to the
generic fallback which calls atomic_add(1) and atomic_sub(1).

Also remove atomic_inc_not_zero() which falls back to
atomic_add_unless() which itself falls back to
atomic_fetch_add_unless() which now handles immediate operands.

Signed-off-by: Christophe Leroy
---
v2: New
---
 arch/powerpc/include/asm/atomic.h | 95 -------------------------------
 1 file changed, 95 deletions(-)

diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h
index eb1bdf14f67c..00ba5d9e837b 100644
--- a/arch/powerpc/include/asm/atomic.h
+++ b/arch/powerpc/include/asm/atomic.h
@@ -118,71 +118,6 @@ ATOMIC_OPS(xor, xor, "", K)
 #undef ATOMIC_OP_RETURN_RELAXED
 #undef ATOMIC_OP
 
-static __inline__ void atomic_inc(atomic_t *v)
-{
-	int t;
-
-	__asm__ __volatile__(
-"1:	lwarx	%0,0,%2		# atomic_inc\n\
-	addic	%0,%0,1\n"
-"	stwcx.	%0,0,%2 \n\
-	bne-	1b"
-	: "=&r" (t), "+m" (v->counter)
-	: "r" (&v->counter)
-	: "cc", "xer");
-}
-#define atomic_inc atomic_inc
-
-static __inline__ int atomic_inc_return_relaxed(atomic_t *v)
-{
-	int t;
-
-	__asm__ __volatile__(
-"1:	lwarx	%0,0,%2		# atomic_inc_return_relaxed\n"
-"	addic	%0,%0,1\n"
-"	stwcx.	%0,0,%2\n"
-"	bne-	1b"
-	: "=&r" (t), "+m" (v->counter)
-	: "r" (&v->counter)
-	: "cc", "xer");
-
-	return t;
-}
-
-static __inline__ void atomic_dec(atomic_t *v)
-{
-	int t;
-
-	__asm__ __volatile__(
-"1:	lwarx	%0,0,%2		# atomic_dec\n\
-	addic	%0,%0,-1\n"
-"	stwcx.	%0,0,%2\n\
-	bne-	1b"
-	: "=&r" (t), "+m" (v->counter)
-	: "r" (&v->counter)
-	: "cc", "xer");
-}
-#define atomic_dec atomic_dec
-
-static __inline__ int atomic_dec_return_relaxed(atomic_t *v)
-{
-	int t;
-
-	__asm__ __volatile__(
-"1:	lwarx	%0,0,%2		# atomic_dec_return_relaxed\n"
-"	addic	%0,%0,-1\n"
-"	stwcx.	%0,0,%2\n"
-"	bne-	1b"
-	: "=&r" (t), "+m" (v->counter)
-	: "r" (&v->counter)
-	: "cc", "xer");
-
-	return t;
-}
-
-#define atomic_inc_return_relaxed atomic_inc_return_relaxed
-#define atomic_dec_return_relaxed atomic_dec_return_relaxed
-
 #define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
 #define atomic_cmpxchg_relaxed(v, o, n) \
 	cmpxchg_relaxed(&((v)->counter), (o), (n))
@@ -252,36 +187,6 @@ static __inline__ int atomic_fetch_add_unless(atomic_t *v, int a, int u)
 }
 #define atomic_fetch_add_unless atomic_fetch_add_unless
 
-/**
- * atomic_inc_not_zero - increment unless the number is zero
- * @v: pointer of type atomic_t
- *
- * Atomically increments @v by 1, so long as @v is non-zero.
- * Returns non-zero if @v was non-zero, and zero otherwise.
- */
-static __inline__ int atomic_inc_not_zero(atomic_t *v)
-{
-	int t1, t2;
-
-	__asm__ __volatile__ (
-	PPC_ATOMIC_ENTRY_BARRIER
-"1:	lwarx	%0,0,%2		# atomic_inc_not_zero\n\
-	cmpwi	0,%0,0\n\
-	beq-	2f\n\
-	addic	%1,%0,1\n"
-"	stwcx.	%1,0,%2\n\
-	bne-	1b\n"
-	PPC_ATOMIC_EXIT_BARRIER
-	"\n\
-2:"
-	: "=&r" (t1), "=&r" (t2)
-	: "r" (&v->counter)
-	: "cc", "xer", "memory");
-
-	return t1;
-}
-#define atomic_inc_not_zero(v) atomic_inc_not_zero((v))
-
 /*
  * Atomically test *v and decrement if it is greater than 0.
  * The function returns the old value of *v minus 1, even if
-- 
2.25.0
[PATCH v3 2/3] powerpc/atomics: Use immediate operand when possible
Today we get the following code generation for atomic operations:

	c001bb2c:	39 20 00 01	li	r9,1
	c001bb30:	7d 40 18 28	lwarx	r10,0,r3
	c001bb34:	7d 09 50 50	subf	r8,r9,r10
	c001bb38:	7d 00 19 2d	stwcx.	r8,0,r3

	c001c7a8:	39 40 00 01	li	r10,1
	c001c7ac:	7d 00 18 28	lwarx	r8,0,r3
	c001c7b0:	7c ea 42 14	add	r7,r10,r8
	c001c7b4:	7c e0 19 2d	stwcx.	r7,0,r3

By allowing GCC to choose between immediate or regular operation, we get:

	c001bb2c:	7d 20 18 28	lwarx	r9,0,r3
	c001bb30:	39 49 ff ff	addi	r10,r9,-1
	c001bb34:	7d 40 19 2d	stwcx.	r10,0,r3
	--
	c001c7a4:	7d 40 18 28	lwarx	r10,0,r3
	c001c7a8:	39 0a 00 01	addi	r8,r10,1
	c001c7ac:	7d 00 19 2d	stwcx.	r8,0,r3

For "and", the dot form has to be used because "andi" doesn't exist.

For logical operations we use unsigned 16 bits immediate.
For arithmetic operations we use signed 16 bits immediate.

On pmac32_defconfig, it reduces the text by approx another 8 kbytes.

Signed-off-by: Christophe Leroy
Acked-by: Segher Boessenkool
---
v2: Use "addc/addic"
---
 arch/powerpc/include/asm/atomic.h | 56 +++++++++++++++----------------
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h
index 61c6e8b200e8..eb1bdf14f67c 100644
--- a/arch/powerpc/include/asm/atomic.h
+++ b/arch/powerpc/include/asm/atomic.h
@@ -37,62 +37,62 @@ static __inline__ void atomic_set(atomic_t *v, int i)
 	__asm__ __volatile__("stw%U0%X0 %1,%0" : "=m"UPD_CONSTR(v->counter) : "r"(i));
 }
 
-#define ATOMIC_OP(op, asm_op)						\
+#define ATOMIC_OP(op, asm_op, suffix, sign, ...)			\
 static __inline__ void atomic_##op(int a, atomic_t *v)			\
 {									\
 	int t;								\
 									\
 	__asm__ __volatile__(						\
 "1:	lwarx	%0,0,%3		# atomic_" #op "\n"			\
-	#asm_op " %0,%2,%0\n"						\
+	#asm_op "%I2" suffix " %0,%0,%2\n"				\
 "	stwcx.	%0,0,%3 \n"						\
 "	bne-	1b\n"							\
 	: "=&r" (t), "+m" (v->counter)					\
-	: "r" (a), "r" (&v->counter)					\
-	: "cc");							\
+	: "r"#sign (a), "r" (&v->counter)				\
+	: "cc", ##__VA_ARGS__);						\
 }									\
 
-#define ATOMIC_OP_RETURN_RELAXED(op, asm_op)				\
+#define ATOMIC_OP_RETURN_RELAXED(op, asm_op, suffix, sign, ...)		\
 static inline int atomic_##op##_return_relaxed(int a, atomic_t *v)	\
 {									\
 	int t;								\
 									\
 	__asm__ __volatile__(						\
 "1:	lwarx	%0,0,%3		# atomic_" #op "_return_relaxed\n"	\
-	#asm_op " %0,%2,%0\n"						\
+	#asm_op "%I2" suffix " %0,%0,%2\n"				\
 "	stwcx.	%0,0,%3\n"						\
 "	bne-	1b\n"							\
 	: "=&r" (t), "+m" (v->counter)					\
-	: "r" (a), "r" (&v->counter)					\
-	: "cc");							\
+	: "r"#sign (a), "r" (&v->counter)				\
+	: "cc", ##__VA_ARGS__);						\
 									\
 	return t;							\
 }
 
-#define ATOMIC_FETCH_OP_RELAXED(op, asm_op)				\
+#de
[PATCH v3 1/3] powerpc/bitops: Use immediate operand when possible
Today we get the following code generation for bitops like set or clear
bit:

	c0009fe0:	39 40 08 00	li	r10,2048
	c0009fe4:	7c e0 40 28	lwarx	r7,0,r8
	c0009fe8:	7c e7 53 78	or	r7,r7,r10
	c0009fec:	7c e0 41 2d	stwcx.	r7,0,r8

	c000d568:	39 00 18 00	li	r8,6144
	c000d56c:	7c c0 38 28	lwarx	r6,0,r7
	c000d570:	7c c6 40 78	andc	r6,r6,r8
	c000d574:	7c c0 39 2d	stwcx.	r6,0,r7

Most set bits are constant on the lower 16 bits, so they can easily be
replaced by the "immediate" version of the operation. Allow GCC to
choose between the normal or immediate form.

For clear bits, on 32 bits 'rlwinm' can be used instead of 'andc' when
all bits to be cleared are consecutive. On 64 bits we don't have any
equivalent single operation for clearing single bits or a few bits;
we'd need two 'rldicl', so it is not worth it, the li/andc sequence is
doing the same.

With this patch we get:

	c0009fe0:	7d 00 50 28	lwarx	r8,0,r10
	c0009fe4:	61 08 08 00	ori	r8,r8,2048
	c0009fe8:	7d 00 51 2d	stwcx.	r8,0,r10

	c000d558:	7c e0 40 28	lwarx	r7,0,r8
	c000d55c:	54 e7 05 64	rlwinm	r7,r7,0,21,18
	c000d560:	7c e0 41 2d	stwcx.	r7,0,r8

On pmac32_defconfig, it reduces the text by approx 10 kbytes.

Signed-off-by: Christophe Leroy
---
v3:
- Using the mask validation proposed by Segher

v2:
- Use "n" instead of "i" as constraint for the rlwinm mask
- Improve mask verification to handle more than single bit masks

Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/bitops.h | 89 +++++++++++++++++++++++++++----
 1 file changed, 81 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/bitops.h b/arch/powerpc/include/asm/bitops.h
index 299ab33505a6..09500c789972 100644
--- a/arch/powerpc/include/asm/bitops.h
+++ b/arch/powerpc/include/asm/bitops.h
@@ -71,19 +71,61 @@ static inline void fn(unsigned long mask,	\
 	__asm__ __volatile__ (				\
 	prefix						\
 "1:"	PPC_LLARX(%0,0,%3,0) "\n"			\
-	stringify_in_c(op) "%0,%0,%2\n"			\
+	#op "%I2 %0,%0,%2\n"				\
 	PPC_STLCX "%0,0,%3\n"				\
 	"bne- 1b\n"					\
 	: "=&r" (old), "+m" (*p)			\
-	: "r" (mask), "r" (p)				\
+	: "rK" (mask), "r" (p)				\
 	: "cc", "memory");				\
 }
 
 DEFINE_BITOP(set_bits, or, "")
-DEFINE_BITOP(clear_bits, andc, "")
-DEFINE_BITOP(clear_bits_unlock, andc, PPC_RELEASE_BARRIER)
 DEFINE_BITOP(change_bits, xor, "")
 
+static __always_inline bool is_rlwinm_mask_valid(unsigned long x)
+{
+	if (!x)
+		return false;
+	if (x & 1)
+		x = ~x;	// make the mask non-wrapping
+	x += x & -x;	// adding the low set bit results in at most one bit set
+
+	return !(x & (x - 1));
+}
+
+#define DEFINE_CLROP(fn, prefix)					\
+static inline void fn(unsigned long mask, volatile unsigned long *_p)	\
+{									\
+	unsigned long old;						\
+	unsigned long *p = (unsigned long *)_p;				\
+									\
+	if (IS_ENABLED(CONFIG_PPC32) &&					\
+	    __builtin_constant_p(mask) && is_rlwinm_mask_valid(~mask)) {\
+		asm volatile (						\
+			prefix						\
+		"1:"	"lwarx	%0,0,%3\n"				\
+			"rlwinm	%0,%0,0,%2\n"				\
+			"stwcx.	%0,0,%3\n"				\
+			"bne- 1b\n"					\
+			: "=&r" (old), "+m" (*p)			\
+			: "n" (~mask), "r" (p)				\
+			: "cc", "memory");				\
+	} else {							\
+		asm volatile (						\
+			prefix						\
+		"1:"	PPC_LLARX(%0,0,%3,0) "
[PATCH v3 3/4] powerpc: Rename probe_kernel_read_inst()
When probe_kernel_read_inst() was created, it was meant to mimic the probe_kernel_read() function. Since then, probe_kernel_read() has been renamed copy_from_kernel_nofault().

Rename probe_kernel_read_inst() into copy_inst_from_kernel_nofault().

Signed-off-by: Christophe Leroy
---
v3: copy_inst_from_kernel_nofault() instead of copy_from_kernel_nofault_inst()
---
 arch/powerpc/include/asm/inst.h    |  3 +--
 arch/powerpc/kernel/align.c        |  2 +-
 arch/powerpc/kernel/trace/ftrace.c | 22 +++++++++++-----------
 arch/powerpc/lib/inst.c            |  3 +--
 4 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h
index a40c3913a4a3..eaf5a6299034 100644
--- a/arch/powerpc/include/asm/inst.h
+++ b/arch/powerpc/include/asm/inst.h
@@ -177,7 +177,6 @@ static inline char *__ppc_inst_as_str(char str[PPC_INST_STR_LEN], struct ppc_ins
 	__str;				\
 })
 
-int probe_kernel_read_inst(struct ppc_inst *inst,
-			   struct ppc_inst *src);
+int copy_inst_from_kernel_nofault(struct ppc_inst *inst, struct ppc_inst *src);
 
 #endif /* _ASM_POWERPC_INST_H */
diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c
index a97d5f1a3905..8f350d0478e6 100644
--- a/arch/powerpc/kernel/align.c
+++ b/arch/powerpc/kernel/align.c
@@ -311,7 +311,7 @@ int fix_alignment(struct pt_regs *regs)
 	CHECK_FULL_REGS(regs);
 
 	if (is_kernel_addr(regs->nip))
-		r = probe_kernel_read_inst(&instr, (void *)regs->nip);
+		r = copy_inst_from_kernel_nofault(&instr, (void *)regs->nip);
 	else
 		r = __get_user_instr(instr, (void __user *)regs->nip);
 
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 42761ebec9f7..ffe9537195aa 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -68,7 +68,7 @@ ftrace_modify_code(unsigned long ip, struct ppc_inst old, struct ppc_inst new)
 	 */
 
 	/* read the text we want to modify */
-	if (probe_kernel_read_inst(&replaced, (void *)ip))
+	if (copy_inst_from_kernel_nofault(&replaced, (void *)ip))
 		return -EFAULT;
 
 	/* Make sure it is what we expect it to be */
@@ -130,7 +130,7 @@ __ftrace_make_nop(struct module *mod,
 	struct ppc_inst op, pop;
 
 	/* read where this goes */
-	if (probe_kernel_read_inst(&op, (void *)ip)) {
+	if (copy_inst_from_kernel_nofault(&op, (void *)ip)) {
 		pr_err("Fetching opcode failed.\n");
 		return -EFAULT;
 	}
@@ -164,7 +164,7 @@ __ftrace_make_nop(struct module *mod,
 	/* When using -mkernel_profile there is no load to jump over */
 	pop = ppc_inst(PPC_INST_NOP);
 
-	if (probe_kernel_read_inst(&op, (void *)(ip - 4))) {
+	if (copy_inst_from_kernel_nofault(&op, (void *)(ip - 4))) {
 		pr_err("Fetching instruction at %lx failed.\n", ip - 4);
 		return -EFAULT;
 	}
@@ -197,7 +197,7 @@ __ftrace_make_nop(struct module *mod,
 	 * Check what is in the next instruction. We can see ld r2,40(r1), but
 	 * on first pass after boot we will see mflr r0.
 	 */
-	if (probe_kernel_read_inst(&op, (void *)(ip + 4))) {
+	if (copy_inst_from_kernel_nofault(&op, (void *)(ip + 4))) {
 		pr_err("Fetching op failed.\n");
 		return -EFAULT;
 	}
@@ -349,7 +349,7 @@ static int setup_mcount_compiler_tramp(unsigned long tramp)
 			return -1;
 
 	/* New trampoline -- read where this goes */
-	if (probe_kernel_read_inst(&op, (void *)tramp)) {
+	if (copy_inst_from_kernel_nofault(&op, (void *)tramp)) {
 		pr_debug("Fetching opcode failed.\n");
 		return -1;
 	}
@@ -399,7 +399,7 @@ static int __ftrace_make_nop_kernel(struct dyn_ftrace *rec, unsigned long addr)
 	struct ppc_inst op;
 
 	/* Read where this goes */
-	if (probe_kernel_read_inst(&op, (void *)ip)) {
+	if (copy_inst_from_kernel_nofault(&op, (void *)ip)) {
 		pr_err("Fetching opcode failed.\n");
 		return -EFAULT;
 	}
@@ -526,10 +526,10 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 	struct module *mod = rec->arch.mod;
 
 	/* read where this goes */
-	if (probe_kernel_read_inst(op, ip))
+	if (copy_inst_from_kernel_nofault(op, ip))
 		return -EFAULT;
 
-	if (probe_kernel_read_inst(op + 1, ip + 4))
+	if (copy_inst_from_kernel_nofault(op + 1, ip + 4))
 		return -EFAULT;
 
 	if (!expected_nop_sequence(ip, op[0], op[1])) {
@@ -592,7 +592,7 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 	unsigned long ip = rec->ip;
 
 	/* read where this goes */
-	if (probe_kernel_read_inst(&op, (void *)ip))
+	if (copy_inst_from_kernel_nofault(&op, (void *)ip))
[PATCH v3 4/4] powerpc: Move copy_inst_from_kernel_nofault()
When probe_kernel_read_inst() was created, there was no good place to put it, so a file called lib/inst.c was dedicated to it.

Since then, probe_kernel_read_inst() has been renamed copy_inst_from_kernel_nofault(). And mm/maccess.c didn't exist at that time. Today, mm/maccess.c is related to copy_from_kernel_nofault().

Move copy_inst_from_kernel_nofault() into mm/maccess.c

Signed-off-by: Christophe Leroy
---
v2: Remove inst.o from Makefile
---
 arch/powerpc/lib/Makefile |  2 +-
 arch/powerpc/lib/inst.c   | 26 --------------------------
 arch/powerpc/mm/maccess.c | 21 +++++++++++++++++++++
 3 files changed, 22 insertions(+), 27 deletions(-)
 delete mode 100644 arch/powerpc/lib/inst.c

diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index d4efc182662a..f2c690ee75d1 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -16,7 +16,7 @@ CFLAGS_code-patching.o += -DDISABLE_BRANCH_PROFILING
 CFLAGS_feature-fixups.o += -DDISABLE_BRANCH_PROFILING
 endif
 
-obj-y += alloc.o code-patching.o feature-fixups.o pmem.o inst.o test_code-patching.o
+obj-y += alloc.o code-patching.o feature-fixups.o pmem.o test_code-patching.o
 
 ifndef CONFIG_KASAN
 obj-y	+=	string.o memcmp_$(BITS).o
diff --git a/arch/powerpc/lib/inst.c b/arch/powerpc/lib/inst.c
deleted file mode 100644
index e554d1357f2f..000000000000
--- a/arch/powerpc/lib/inst.c
+++ /dev/null
@@ -1,26 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * Copyright 2020, IBM Corporation.
- */
-
-#include <linux/uaccess.h>
-#include <asm/disassemble.h>
-#include <asm/inst.h>
-#include <asm/ppc-opcode.h>
-
-int copy_inst_from_kernel_nofault(struct ppc_inst *inst, struct ppc_inst *src)
-{
-	unsigned int val, suffix;
-	int err;
-
-	err = copy_from_kernel_nofault(&val, src, sizeof(val));
-	if (err)
-		return err;
-	if (IS_ENABLED(CONFIG_PPC64) && get_op(val) == OP_PREFIX) {
-		err = copy_from_kernel_nofault(&suffix, (void *)src + 4, 4);
-		*inst = ppc_inst_prefix(val, suffix);
-	} else {
-		*inst = ppc_inst(val);
-	}
-	return err;
-}
diff --git a/arch/powerpc/mm/maccess.c b/arch/powerpc/mm/maccess.c
index fa9a7a718fc6..a3c30a884076 100644
--- a/arch/powerpc/mm/maccess.c
+++ b/arch/powerpc/mm/maccess.c
@@ -3,7 +3,28 @@
 #include <linux/uaccess.h>
 #include <linux/kernel.h>
 
+#include <asm/disassemble.h>
+#include <asm/inst.h>
+#include <asm/ppc-opcode.h>
+
 bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
 {
 	return is_kernel_addr((unsigned long)unsafe_src);
 }
+
+int copy_inst_from_kernel_nofault(struct ppc_inst *inst, struct ppc_inst *src)
+{
+	unsigned int val, suffix;
+	int err;
+
+	err = copy_from_kernel_nofault(&val, src, sizeof(val));
+	if (err)
+		return err;
+	if (IS_ENABLED(CONFIG_PPC64) && get_op(val) == OP_PREFIX) {
+		err = copy_from_kernel_nofault(&suffix, (void *)src + 4, 4);
+		*inst = ppc_inst_prefix(val, suffix);
+	} else {
+		*inst = ppc_inst(val);
+	}
+	return err;
+}
--
2.25.0
[PATCH v3 2/4] powerpc: Make probe_kernel_read_inst() common to PPC32 and PPC64
We have two independent versions of probe_kernel_read_inst(), one for PPC32 and one for PPC64.

The PPC32 version is identical to the first part of the PPC64 version. The remaining part of the PPC64 version is not relevant for PPC32, but not contradictory either, so we can easily have a common function with the PPC64-only part opted out via an IS_ENABLED(CONFIG_PPC64) check.

The only need is to add a version of ppc_inst_prefix() for PPC32.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/inst.h |  2 ++
 arch/powerpc/lib/inst.c         | 17 +----------------
 2 files changed, 3 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h
index 2902d4e6a363..a40c3913a4a3 100644
--- a/arch/powerpc/include/asm/inst.h
+++ b/arch/powerpc/include/asm/inst.h
@@ -102,6 +102,8 @@ static inline bool ppc_inst_equal(struct ppc_inst x, struct ppc_inst y)
 
 #define ppc_inst(x) ((struct ppc_inst){ .val = x })
 
+#define ppc_inst_prefix(x, y) ppc_inst(x)
+
 static inline bool ppc_inst_prefixed(struct ppc_inst x)
 {
 	return false;
diff --git a/arch/powerpc/lib/inst.c b/arch/powerpc/lib/inst.c
index c57b3548de37..0dff3ac2d45f 100644
--- a/arch/powerpc/lib/inst.c
+++ b/arch/powerpc/lib/inst.c
@@ -8,7 +8,6 @@
 #include <asm/inst.h>
 #include <asm/ppc-opcode.h>
 
-#ifdef CONFIG_PPC64
 int probe_kernel_read_inst(struct ppc_inst *inst,
 			   struct ppc_inst *src)
 {
@@ -18,7 +17,7 @@ int probe_kernel_read_inst(struct ppc_inst *inst,
 	err = copy_from_kernel_nofault(&val, src, sizeof(val));
 	if (err)
 		return err;
-	if (get_op(val) == OP_PREFIX) {
+	if (IS_ENABLED(CONFIG_PPC64) && get_op(val) == OP_PREFIX) {
 		err = copy_from_kernel_nofault(&suffix, (void *)src + 4, 4);
 		*inst = ppc_inst_prefix(val, suffix);
 	} else {
@@ -26,17 +25,3 @@ int probe_kernel_read_inst(struct ppc_inst *inst,
 	}
 	return err;
 }
-#else /* !CONFIG_PPC64 */
-int probe_kernel_read_inst(struct ppc_inst *inst,
-			   struct ppc_inst *src)
-{
-	unsigned int val;
-	int err;
-
-	err = copy_from_kernel_nofault(&val, src, sizeof(val));
-	if (!err)
-		*inst = ppc_inst(val);
-
-	return err;
-}
-#endif /* CONFIG_PPC64 */
--
2.25.0
[PATCH v3 1/4] powerpc: Remove probe_user_read_inst()
Its name comes from the former probe_user_read() function. That function is now called copy_from_user_nofault().

probe_user_read_inst() uses copy_from_user_nofault() to read only a few bytes. It is suboptimal. It does the same as get_user_inst() but in addition disables page faults.

But on the other hand, it is not used for the time being. So remove it for now. If one day it is really needed, we can give it a new name more in line with today's naming, and implement it using get_user_inst().

Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/inst.h |  3 ---
 arch/powerpc/lib/inst.c         | 31 -------------------------------
 2 files changed, 34 deletions(-)

diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h
index 19e18af2fac9..2902d4e6a363 100644
--- a/arch/powerpc/include/asm/inst.h
+++ b/arch/powerpc/include/asm/inst.h
@@ -175,9 +175,6 @@ static inline char *__ppc_inst_as_str(char str[PPC_INST_STR_LEN], struct ppc_ins
 	__str;				\
 })
 
-int probe_user_read_inst(struct ppc_inst *inst,
-			 struct ppc_inst __user *nip);
-
 int probe_kernel_read_inst(struct ppc_inst *inst,
 			   struct ppc_inst *src);
 
diff --git a/arch/powerpc/lib/inst.c b/arch/powerpc/lib/inst.c
index 9cc17eb62462..c57b3548de37 100644
--- a/arch/powerpc/lib/inst.c
+++ b/arch/powerpc/lib/inst.c
@@ -9,24 +9,6 @@
 #include <asm/ppc-opcode.h>
 
 #ifdef CONFIG_PPC64
-int probe_user_read_inst(struct ppc_inst *inst,
-			 struct ppc_inst __user *nip)
-{
-	unsigned int val, suffix;
-	int err;
-
-	err = copy_from_user_nofault(&val, nip, sizeof(val));
-	if (err)
-		return err;
-	if (get_op(val) == OP_PREFIX) {
-		err = copy_from_user_nofault(&suffix, (void __user *)nip + 4, 4);
-		*inst = ppc_inst_prefix(val, suffix);
-	} else {
-		*inst = ppc_inst(val);
-	}
-	return err;
-}
-
 int probe_kernel_read_inst(struct ppc_inst *inst,
 			   struct ppc_inst *src)
 {
@@ -45,19 +27,6 @@ int probe_kernel_read_inst(struct ppc_inst *inst,
 	return err;
 }
 #else /* !CONFIG_PPC64 */
-int probe_user_read_inst(struct ppc_inst *inst,
-			 struct ppc_inst __user *nip)
-{
-	unsigned int val;
-	int err;
-
-	err = copy_from_user_nofault(&val, nip, sizeof(val));
-	if (!err)
-		*inst = ppc_inst(val);
-
-	return err;
-}
-
Re: [PATCH v2 3/4] powerpc: Rename probe_kernel_read_inst()
On 14/04/2021 at 07:23, Aneesh Kumar K.V wrote:
> Christophe Leroy writes:
>> When probe_kernel_read_inst() was created, it was to mimic the
>> probe_kernel_read() function. Since then, probe_kernel_read() has been
>> renamed copy_from_kernel_nofault().
>>
>> Rename probe_kernel_read_inst() into copy_from_kernel_nofault_inst().
>
> At first glance I read it as "copy from kernel nofault instruction".
> How about copy_inst_from_kernel_nofault()?

Yes, good idea.

Christophe
Re: [PATCH v1 1/2] powerpc/bitops: Use immediate operand when possible
On 14/04/2021 at 14:24, Segher Boessenkool wrote:
> On Wed, Apr 14, 2021 at 12:01:21PM +1000, Nicholas Piggin wrote:
>> Would be nice if we could let the compiler deal with it all...
>>
>>   static inline unsigned long lr(unsigned long *mem)
>>   {
>>           unsigned long val;
>>
>>           /*
>>            * This doesn't clobber memory but want to avoid memory operations
>>            * moving ahead of it
>>            */
>>           asm volatile("ldarx %0, %y1" : "=r"(val) : "Z"(*mem) : "memory");
>>
>>           return val;
>>   }
>
> (etc.)
>
> That can not work reliably: the compiler can put random instructions
> between the larx and stcx. this way, and you then do not have guaranteed
> forward progress anymore. It can put the two in different routines
> (after inlining and other interprocedural optimisations), duplicate
> them, make a different number of copies of them, etc.
>
> Nothing of that is okay if you want to guarantee forward progress on all
> implementations, and also not if you want to have good performance
> everywhere (or anywhere even). Unfortunately you have to write all
> larx/stcx. loops as one block of assembler, so that you know exactly
> what instructions will end up in your binary.
>
> If you don't, it will fail mysteriously after random recompilations, or
> have performance degradations, etc. You don't want to go there :-)

Could the kernel use GCC builtin atomic functions instead?

https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
Re: [PATCH] init: consolidate trap_init()
On 14/04/2021 at 10:58, Jisheng Zhang wrote:
> Many architectures implement trap_init() as a NOP. Since there is no
> default for trap_init(), this empty stub is duplicated among these
> architectures. Provide a generic but weak NOP implementation to drop
> the empty stubs of trap_init() in these architectures.

You define the weak function in the __init section.

Most, but not all, architectures had it in the __init section, and the remaining ones may not expect it to be in __init. For instance, look at the one in the alpha architecture.

Have you checked that this is not a problem? It would be good to say something about it in the commit description.

> Signed-off-by: Jisheng Zhang
> ---
>  arch/arc/kernel/traps.c      |  5 -----
>  arch/arm/kernel/traps.c      |  5 -----
>  arch/h8300/kernel/traps.c    | 13 -------------
>  arch/hexagon/kernel/traps.c  |  4 ----
>  arch/nds32/kernel/traps.c    |  5 -----
>  arch/nios2/kernel/traps.c    |  5 -----
>  arch/openrisc/kernel/traps.c |  5 -----
>  arch/parisc/kernel/traps.c   |  4 ----
>  arch/powerpc/kernel/traps.c  |  5 -----
>  arch/riscv/kernel/traps.c    |  5 -----
>  arch/um/kernel/trap.c        |  4 ----
>  init/main.c                  |  2 ++
>  12 files changed, 2 insertions(+), 60 deletions(-)
>
> diff --git a/init/main.c b/init/main.c
> index 53b278845b88..4bdbe2928530 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -790,6 +790,8 @@ static inline void initcall_debug_enable(void)
>  }
>  #endif
>
> +void __init __weak trap_init(void) { }
> +

I think in a C file we don't try to save space as much as in a header file. I would prefer something like:

	void __init __weak trap_init(void)
	{
	}

>  /* Report memory auto-initialization states for this boot. */
>  static void __init report_meminit(void)
>  {
Re: [PATCH] mm: Define ARCH_HAS_FIRST_USER_ADDRESS
Le 14/04/2021 à 07:59, Anshuman Khandual a écrit : On 4/14/21 10:52 AM, Christophe Leroy wrote: Le 14/04/2021 à 04:54, Anshuman Khandual a écrit : Currently most platforms define FIRST_USER_ADDRESS as 0UL duplicating the same code all over. Instead define a new option ARCH_HAS_FIRST_USER_ADDRESS for those platforms which would override generic default FIRST_USER_ADDRESS value 0UL. This makes it much cleaner with reduced code. Cc: linux-al...@vger.kernel.org Cc: linux-snps-...@lists.infradead.org Cc: linux-arm-ker...@lists.infradead.org Cc: linux-c...@vger.kernel.org Cc: linux-hexa...@vger.kernel.org Cc: linux-i...@vger.kernel.org Cc: linux-m...@lists.linux-m68k.org Cc: linux-m...@vger.kernel.org Cc: openr...@lists.librecores.org Cc: linux-par...@vger.kernel.org Cc: linuxppc-...@lists.ozlabs.org Cc: linux-ri...@lists.infradead.org Cc: linux-s...@vger.kernel.org Cc: linux...@vger.kernel.org Cc: sparcli...@vger.kernel.org Cc: linux...@lists.infradead.org Cc: linux-xte...@linux-xtensa.org Cc: x...@kernel.org Cc: linux...@kvack.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual --- arch/alpha/include/asm/pgtable.h | 1 - arch/arc/include/asm/pgtable.h | 6 -- arch/arm/Kconfig | 1 + arch/arm64/include/asm/pgtable.h | 2 -- arch/csky/include/asm/pgtable.h | 1 - arch/hexagon/include/asm/pgtable.h | 3 --- arch/ia64/include/asm/pgtable.h | 1 - arch/m68k/include/asm/pgtable_mm.h | 1 - arch/microblaze/include/asm/pgtable.h | 2 -- arch/mips/include/asm/pgtable-32.h | 1 - arch/mips/include/asm/pgtable-64.h | 1 - arch/nds32/Kconfig | 1 + arch/nios2/include/asm/pgtable.h | 2 -- arch/openrisc/include/asm/pgtable.h | 1 - arch/parisc/include/asm/pgtable.h | 2 -- arch/powerpc/include/asm/book3s/pgtable.h | 1 - arch/powerpc/include/asm/nohash/32/pgtable.h | 1 - arch/powerpc/include/asm/nohash/64/pgtable.h | 2 -- arch/riscv/include/asm/pgtable.h | 2 -- arch/s390/include/asm/pgtable.h | 2 -- arch/sh/include/asm/pgtable.h | 2 -- arch/sparc/include/asm/pgtable_32.h | 1 - 
arch/sparc/include/asm/pgtable_64.h | 3 --- arch/um/include/asm/pgtable-2level.h | 1 - arch/um/include/asm/pgtable-3level.h | 1 - arch/x86/include/asm/pgtable_types.h | 2 -- arch/xtensa/include/asm/pgtable.h | 1 - include/linux/mm.h | 4 mm/Kconfig | 4 29 files changed, 10 insertions(+), 43 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 8ba434287387..47098ccd715e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -46,6 +46,10 @@ extern int sysctl_page_lock_unfairness; void init_mm_internals(void); +#ifndef ARCH_HAS_FIRST_USER_ADDRESS I guess you didn't test it . :) In fact I did :) Though just booted it on arm64 and cross compiled on multiple others platforms. should be #ifndef CONFIG_ARCH_HAS_FIRST_USER_ADDRESS Right, meant that instead. +#define FIRST_USER_ADDRESS 0UL +#endif But why do we need a config option at all for that ? Why not just: #ifndef FIRST_USER_ADDRESS #define FIRST_USER_ADDRESS 0UL #endif This sounds simpler. But just wondering, would not there be any possibility of build problems due to compilation sequence between arch and generic code ? For sure it has to be addresses carefully, but there are already a lot of stuff like that around pgtables.h For instance, pte_offset_kernel() has a generic definition in linux/pgtables.h based on whether it is already defined or not. Taking into account that FIRST_USER_ADDRESS is today in the architectures's asm/pgtables.h, I think putting the fallback definition in linux/pgtable.h would do the trick.
Re: [PATCH] mm: Define ARCH_HAS_FIRST_USER_ADDRESS
Le 14/04/2021 à 04:54, Anshuman Khandual a écrit : Currently most platforms define FIRST_USER_ADDRESS as 0UL duplicating the same code all over. Instead define a new option ARCH_HAS_FIRST_USER_ADDRESS for those platforms which would override generic default FIRST_USER_ADDRESS value 0UL. This makes it much cleaner with reduced code. Cc: linux-al...@vger.kernel.org Cc: linux-snps-...@lists.infradead.org Cc: linux-arm-ker...@lists.infradead.org Cc: linux-c...@vger.kernel.org Cc: linux-hexa...@vger.kernel.org Cc: linux-i...@vger.kernel.org Cc: linux-m...@lists.linux-m68k.org Cc: linux-m...@vger.kernel.org Cc: openr...@lists.librecores.org Cc: linux-par...@vger.kernel.org Cc: linuxppc-...@lists.ozlabs.org Cc: linux-ri...@lists.infradead.org Cc: linux-s...@vger.kernel.org Cc: linux...@vger.kernel.org Cc: sparcli...@vger.kernel.org Cc: linux...@lists.infradead.org Cc: linux-xte...@linux-xtensa.org Cc: x...@kernel.org Cc: linux...@kvack.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual --- arch/alpha/include/asm/pgtable.h | 1 - arch/arc/include/asm/pgtable.h | 6 -- arch/arm/Kconfig | 1 + arch/arm64/include/asm/pgtable.h | 2 -- arch/csky/include/asm/pgtable.h | 1 - arch/hexagon/include/asm/pgtable.h | 3 --- arch/ia64/include/asm/pgtable.h | 1 - arch/m68k/include/asm/pgtable_mm.h | 1 - arch/microblaze/include/asm/pgtable.h| 2 -- arch/mips/include/asm/pgtable-32.h | 1 - arch/mips/include/asm/pgtable-64.h | 1 - arch/nds32/Kconfig | 1 + arch/nios2/include/asm/pgtable.h | 2 -- arch/openrisc/include/asm/pgtable.h | 1 - arch/parisc/include/asm/pgtable.h| 2 -- arch/powerpc/include/asm/book3s/pgtable.h| 1 - arch/powerpc/include/asm/nohash/32/pgtable.h | 1 - arch/powerpc/include/asm/nohash/64/pgtable.h | 2 -- arch/riscv/include/asm/pgtable.h | 2 -- arch/s390/include/asm/pgtable.h | 2 -- arch/sh/include/asm/pgtable.h| 2 -- arch/sparc/include/asm/pgtable_32.h | 1 - arch/sparc/include/asm/pgtable_64.h | 3 --- arch/um/include/asm/pgtable-2level.h | 1 - 
arch/um/include/asm/pgtable-3level.h | 1 - arch/x86/include/asm/pgtable_types.h | 2 -- arch/xtensa/include/asm/pgtable.h| 1 - include/linux/mm.h | 4 mm/Kconfig | 4 29 files changed, 10 insertions(+), 43 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 8ba434287387..47098ccd715e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -46,6 +46,10 @@ extern int sysctl_page_lock_unfairness; void init_mm_internals(void); +#ifndef ARCH_HAS_FIRST_USER_ADDRESS I guess you didn't test it . :) should be #ifndef CONFIG_ARCH_HAS_FIRST_USER_ADDRESS +#define FIRST_USER_ADDRESS 0UL +#endif But why do we need a config option at all for that ? Why not just: #ifndef FIRST_USER_ADDRESS #define FIRST_USER_ADDRESS 0UL #endif + #ifndef CONFIG_NEED_MULTIPLE_NODES/* Don't use mapnrs, do it properly */ extern unsigned long max_mapnr; diff --git a/mm/Kconfig b/mm/Kconfig index 24c045b24b95..373fbe377075 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -806,6 +806,10 @@ config VMAP_PFN config ARCH_USES_HIGH_VMA_FLAGS bool + +config ARCH_HAS_FIRST_USER_ADDRESS + bool + config ARCH_HAS_PKEYS bool
[PATCH v2 3/4] powerpc: Rename probe_kernel_read_inst()
When probe_kernel_read_inst() was created, it was to mimic probe_kernel_read() function. Since then, probe_kernel_read() has been renamed copy_from_kernel_nofault(). Rename probe_kernel_read_inst() into copy_from_kernel_nofault_inst(). Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/inst.h| 3 +-- arch/powerpc/kernel/align.c| 2 +- arch/powerpc/kernel/trace/ftrace.c | 22 +++--- arch/powerpc/lib/inst.c| 3 +-- 4 files changed, 14 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h index a40c3913a4a3..a8ab0715f50e 100644 --- a/arch/powerpc/include/asm/inst.h +++ b/arch/powerpc/include/asm/inst.h @@ -177,7 +177,6 @@ static inline char *__ppc_inst_as_str(char str[PPC_INST_STR_LEN], struct ppc_ins __str; \ }) -int probe_kernel_read_inst(struct ppc_inst *inst, - struct ppc_inst *src); +int copy_from_kernel_nofault_inst(struct ppc_inst *inst, struct ppc_inst *src); #endif /* _ASM_POWERPC_INST_H */ diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c index a97d5f1a3905..df3b55fec27d 100644 --- a/arch/powerpc/kernel/align.c +++ b/arch/powerpc/kernel/align.c @@ -311,7 +311,7 @@ int fix_alignment(struct pt_regs *regs) CHECK_FULL_REGS(regs); if (is_kernel_addr(regs->nip)) - r = probe_kernel_read_inst(, (void *)regs->nip); + r = copy_from_kernel_nofault_inst(, (void *)regs->nip); else r = __get_user_instr(instr, (void __user *)regs->nip); diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c index 42761ebec9f7..9daa4eb812ce 100644 --- a/arch/powerpc/kernel/trace/ftrace.c +++ b/arch/powerpc/kernel/trace/ftrace.c @@ -68,7 +68,7 @@ ftrace_modify_code(unsigned long ip, struct ppc_inst old, struct ppc_inst new) */ /* read the text we want to modify */ - if (probe_kernel_read_inst(, (void *)ip)) + if (copy_from_kernel_nofault_inst(, (void *)ip)) return -EFAULT; /* Make sure it is what we expect it to be */ @@ -130,7 +130,7 @@ __ftrace_make_nop(struct module *mod, 
struct ppc_inst op, pop; /* read where this goes */ - if (probe_kernel_read_inst(, (void *)ip)) { + if (copy_from_kernel_nofault_inst(, (void *)ip)) { pr_err("Fetching opcode failed.\n"); return -EFAULT; } @@ -164,7 +164,7 @@ __ftrace_make_nop(struct module *mod, /* When using -mkernel_profile there is no load to jump over */ pop = ppc_inst(PPC_INST_NOP); - if (probe_kernel_read_inst(, (void *)(ip - 4))) { + if (copy_from_kernel_nofault_inst(, (void *)(ip - 4))) { pr_err("Fetching instruction at %lx failed.\n", ip - 4); return -EFAULT; } @@ -197,7 +197,7 @@ __ftrace_make_nop(struct module *mod, * Check what is in the next instruction. We can see ld r2,40(r1), but * on first pass after boot we will see mflr r0. */ - if (probe_kernel_read_inst(, (void *)(ip + 4))) { + if (copy_from_kernel_nofault_inst(, (void *)(ip + 4))) { pr_err("Fetching op failed.\n"); return -EFAULT; } @@ -349,7 +349,7 @@ static int setup_mcount_compiler_tramp(unsigned long tramp) return -1; /* New trampoline -- read where this goes */ - if (probe_kernel_read_inst(, (void *)tramp)) { + if (copy_from_kernel_nofault_inst(, (void *)tramp)) { pr_debug("Fetching opcode failed.\n"); return -1; } @@ -399,7 +399,7 @@ static int __ftrace_make_nop_kernel(struct dyn_ftrace *rec, unsigned long addr) struct ppc_inst op; /* Read where this goes */ - if (probe_kernel_read_inst(, (void *)ip)) { + if (copy_from_kernel_nofault_inst(, (void *)ip)) { pr_err("Fetching opcode failed.\n"); return -EFAULT; } @@ -526,10 +526,10 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr) struct module *mod = rec->arch.mod; /* read where this goes */ - if (probe_kernel_read_inst(op, ip)) + if (copy_from_kernel_nofault_inst(op, ip)) return -EFAULT; - if (probe_kernel_read_inst(op + 1, ip + 4)) + if (copy_from_kernel_nofault_inst(op + 1, ip + 4)) return -EFAULT; if (!expected_nop_sequence(ip, op[0], op[1])) { @@ -592,7 +592,7 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr) unsigned long ip = rec->ip; 
/* read where this goes */ - if (probe_kernel_read_inst(, (void *)ip)) + if (copy_from_kernel_nofault_inst(, (void *)ip)) return -EFAULT; /* It should be pointing
[PATCH v2 4/4] powerpc: Move copy_from_kernel_nofault_inst()
When probe_kernel_read_inst() was created, there was no good place to put it, so a file called lib/inst.c was dedicated for it. Since then, probe_kernel_read_inst() has been renamed copy_from_kernel_nofault_inst(). And mm/maccess.h didn't exist at that time. Today, mm/maccess.h is related to copy_from_kernel_nofault(). Move copy_from_kernel_nofault_inst() into mm/maccess.c Signed-off-by: Christophe Leroy --- v2: Remove inst.o from Makefile --- arch/powerpc/lib/Makefile | 2 +- arch/powerpc/lib/inst.c | 26 -- arch/powerpc/mm/maccess.c | 21 + 3 files changed, 22 insertions(+), 27 deletions(-) delete mode 100644 arch/powerpc/lib/inst.c diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile index d4efc182662a..f2c690ee75d1 100644 --- a/arch/powerpc/lib/Makefile +++ b/arch/powerpc/lib/Makefile @@ -16,7 +16,7 @@ CFLAGS_code-patching.o += -DDISABLE_BRANCH_PROFILING CFLAGS_feature-fixups.o += -DDISABLE_BRANCH_PROFILING endif -obj-y += alloc.o code-patching.o feature-fixups.o pmem.o inst.o test_code-patching.o +obj-y += alloc.o code-patching.o feature-fixups.o pmem.o test_code-patching.o ifndef CONFIG_KASAN obj-y += string.o memcmp_$(BITS).o diff --git a/arch/powerpc/lib/inst.c b/arch/powerpc/lib/inst.c deleted file mode 100644 index ec7f6bae8b3c.. --- a/arch/powerpc/lib/inst.c +++ /dev/null @@ -1,26 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-or-later -/* - * Copyright 2020, IBM Corporation. 
- */ - -#include -#include -#include -#include - -int copy_from_kernel_nofault_inst(struct ppc_inst *inst, struct ppc_inst *src) -{ - unsigned int val, suffix; - int err; - - err = copy_from_kernel_nofault(, src, sizeof(val)); - if (err) - return err; - if (IS_ENABLED(CONFIG_PPC64) && get_op(val) == OP_PREFIX) { - err = copy_from_kernel_nofault(, (void *)src + 4, 4); - *inst = ppc_inst_prefix(val, suffix); - } else { - *inst = ppc_inst(val); - } - return err; -} diff --git a/arch/powerpc/mm/maccess.c b/arch/powerpc/mm/maccess.c index fa9a7a718fc6..e75e74c52a8a 100644 --- a/arch/powerpc/mm/maccess.c +++ b/arch/powerpc/mm/maccess.c @@ -3,7 +3,28 @@ #include #include +#include +#include +#include + bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size) { return is_kernel_addr((unsigned long)unsafe_src); } + +int copy_from_kernel_nofault_inst(struct ppc_inst *inst, struct ppc_inst *src) +{ + unsigned int val, suffix; + int err; + + err = copy_from_kernel_nofault(, src, sizeof(val)); + if (err) + return err; + if (IS_ENABLED(CONFIG_PPC64) && get_op(val) == OP_PREFIX) { + err = copy_from_kernel_nofault(, (void *)src + 4, 4); + *inst = ppc_inst_prefix(val, suffix); + } else { + *inst = ppc_inst(val); + } + return err; +} -- 2.25.0
[PATCH v2 2/4] powerpc: Make probe_kernel_read_inst() common to PPC32 and PPC64
We have two independent versions of probe_kernel_read_inst(), one for PPC32 and one for PPC64. The PPC32 version is identical to the first part of the PPC64 version. The remaining part of the PPC64 version is not relevant for PPC32, but not contradictory, so we can easily have a common function with the PPC64 part opted out via an IS_ENABLED(CONFIG_PPC64) check. The only thing needed in addition is a PPC32 version of ppc_inst_prefix(). Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/inst.h | 2 ++ arch/powerpc/lib/inst.c | 17 + 2 files changed, 3 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h index 2902d4e6a363..a40c3913a4a3 100644 --- a/arch/powerpc/include/asm/inst.h +++ b/arch/powerpc/include/asm/inst.h @@ -102,6 +102,8 @@ static inline bool ppc_inst_equal(struct ppc_inst x, struct ppc_inst y) #define ppc_inst(x) ((struct ppc_inst){ .val = x }) +#define ppc_inst_prefix(x, y) ppc_inst(x) + static inline bool ppc_inst_prefixed(struct ppc_inst x) { return false; diff --git a/arch/powerpc/lib/inst.c b/arch/powerpc/lib/inst.c index c57b3548de37..0dff3ac2d45f 100644 --- a/arch/powerpc/lib/inst.c +++ b/arch/powerpc/lib/inst.c @@ -8,7 +8,6 @@ #include #include -#ifdef CONFIG_PPC64 int probe_kernel_read_inst(struct ppc_inst *inst, struct ppc_inst *src) { @@ -18,7 +17,7 @@ int probe_kernel_read_inst(struct ppc_inst *inst, err = copy_from_kernel_nofault(&val, src, sizeof(val)); if (err) return err; - if (get_op(val) == OP_PREFIX) { + if (IS_ENABLED(CONFIG_PPC64) && get_op(val) == OP_PREFIX) { err = copy_from_kernel_nofault(&suffix, (void *)src + 4, 4); *inst = ppc_inst_prefix(val, suffix); } else { @@ -26,17 +25,3 @@ int probe_kernel_read_inst(struct ppc_inst *inst, } return err; } -#else /* !CONFIG_PPC64 */ -int probe_kernel_read_inst(struct ppc_inst *inst, - struct ppc_inst *src) -{ - unsigned int val; - int err; - - err = copy_from_kernel_nofault(&val, src, sizeof(val)); - if (!err) - *inst = ppc_inst(val); - - return err; -} -#endif /*
CONFIG_PPC64 */ -- 2.25.0
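The unified flow above can be sketched as a small userspace model. This is an illustrative sketch, not the kernel code: `IS_PPC64` stands in for `IS_ENABLED(CONFIG_PPC64)`, `memcpy()` stands in for `copy_from_kernel_nofault()`, and the two-word struct is a simplification of `struct ppc_inst`.

```c
#include <stdint.h>
#include <string.h>

#define IS_PPC64  1   /* stand-in for IS_ENABLED(CONFIG_PPC64) */
#define OP_PREFIX 1   /* primary opcode of prefixed instructions */

struct ppc_inst { uint32_t val; uint32_t suffix; };

static unsigned int get_op(uint32_t insn)
{
	return insn >> 26;   /* primary opcode lives in the top 6 bits */
}

/* One function for both PPC32 and PPC64: when IS_PPC64 is 0 the suffix
 * read is dead code the compiler can drop, just like the IS_ENABLED()
 * form in the patch.  memcpy() models copy_from_kernel_nofault(). */
static int read_inst(struct ppc_inst *inst, const uint32_t *src)
{
	uint32_t val, suffix = 0;

	memcpy(&val, src, sizeof(val));
	if (IS_PPC64 && get_op(val) == OP_PREFIX)
		memcpy(&suffix, src + 1, sizeof(suffix));
	inst->val = val;
	inst->suffix = suffix;
	return 0;
}
```

Relying on a compile-time constant instead of `#ifdef` keeps both branches visible to the compiler, so the PPC64-only path is type-checked even in PPC32 builds.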
[PATCH v2 1/4] powerpc: Remove probe_user_read_inst()
Its name comes from the former probe_user_read() function. That function is now called copy_from_user_nofault(). probe_user_read_inst() uses copy_from_user_nofault() to read only a few bytes. This is suboptimal. It does the same as get_user_inst() but in addition disables page faults. But on the other hand, it is not used for the time being. So remove it for now. If one day it is really needed, we can give it a new name more in line with today's naming, and implement it using get_user_inst(). Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/inst.h | 3 --- arch/powerpc/lib/inst.c | 31 --- 2 files changed, 34 deletions(-) diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h index 19e18af2fac9..2902d4e6a363 100644 --- a/arch/powerpc/include/asm/inst.h +++ b/arch/powerpc/include/asm/inst.h @@ -175,9 +175,6 @@ static inline char *__ppc_inst_as_str(char str[PPC_INST_STR_LEN], struct ppc_ins __str; \ }) -int probe_user_read_inst(struct ppc_inst *inst, -struct ppc_inst __user *nip); - int probe_kernel_read_inst(struct ppc_inst *inst, struct ppc_inst *src); diff --git a/arch/powerpc/lib/inst.c b/arch/powerpc/lib/inst.c index 9cc17eb62462..c57b3548de37 100644 --- a/arch/powerpc/lib/inst.c +++ b/arch/powerpc/lib/inst.c @@ -9,24 +9,6 @@ #include #ifdef CONFIG_PPC64 -int probe_user_read_inst(struct ppc_inst *inst, -struct ppc_inst __user *nip) -{ - unsigned int val, suffix; - int err; - - err = copy_from_user_nofault(&val, nip, sizeof(val)); - if (err) - return err; - if (get_op(val) == OP_PREFIX) { - err = copy_from_user_nofault(&suffix, (void __user *)nip + 4, 4); - *inst = ppc_inst_prefix(val, suffix); - } else { - *inst = ppc_inst(val); - } - return err; -} - int probe_kernel_read_inst(struct ppc_inst *inst, struct ppc_inst *src) { @@ -45,19 +27,6 @@ int probe_kernel_read_inst(struct ppc_inst *inst, return err; } #else /* !CONFIG_PPC64 */ -int probe_user_read_inst(struct ppc_inst *inst, -struct ppc_inst __user *nip) -{ - unsigned int val; - int err; -
err = copy_from_user_nofault(&val, nip, sizeof(val)); - if (!err) - *inst = ppc_inst(val); - - return err; -} - int probe_kernel_read_inst(struct ppc_inst *inst, struct ppc_inst *src) { -- 2.25.0
[PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
Using asm goto in __WARN_FLAGS() and WARN_ON() allows more flexibility to GCC. For that, add an entry to the exception table so that program_check_exception() knows where to resume execution after a WARNING. Here are two examples. The first one is done on PPC32 (which benefits from the previous patch), the second is on PPC64. unsigned long test(struct pt_regs *regs) { int ret; WARN_ON(regs->msr & MSR_PR); return regs->gpr[3]; } unsigned long test9w(unsigned long a, unsigned long b) { if (WARN_ON(!b)) return 0; return a / b; } Before the patch: 03a8 <test>: 3a8: 81 23 00 84 lwz r9,132(r3) 3ac: 71 29 40 00 andi. r9,r9,16384 3b0: 40 82 00 0c bne 3bc 3b4: 80 63 00 0c lwz r3,12(r3) 3b8: 4e 80 00 20 blr 3bc: 0f e0 00 00 twui r0,0 3c0: 80 63 00 0c lwz r3,12(r3) 3c4: 4e 80 00 20 blr 0bf0 <.test9w>: bf0: 7c 89 00 74 cntlzd r9,r4 bf4: 79 29 d1 82 rldicl r9,r9,58,6 bf8: 0b 09 00 00 tdnei r9,0 bfc: 2c 24 00 00 cmpdi r4,0 c00: 41 82 00 0c beq c0c <.test9w+0x1c> c04: 7c 63 23 92 divdu r3,r3,r4 c08: 4e 80 00 20 blr c0c: 38 60 00 00 li r3,0 c10: 4e 80 00 20 blr After the patch: 03a8 <test>: 3a8: 81 23 00 84 lwz r9,132(r3) 3ac: 71 29 40 00 andi. r9,r9,16384 3b0: 40 82 00 0c bne 3bc 3b4: 80 63 00 0c lwz r3,12(r3) 3b8: 4e 80 00 20 blr 3bc: 0f e0 00 00 twui r0,0 0c50 <.test9w>: c50: 7c 89 00 74 cntlzd r9,r4 c54: 79 29 d1 82 rldicl r9,r9,58,6 c58: 0b 09 00 00 tdnei r9,0 c5c: 7c 63 23 92 divdu r3,r3,r4 c60: 4e 80 00 20 blr c70: 38 60 00 00 li r3,0 c74: 4e 80 00 20 blr In the first example, we see GCC doesn't need to duplicate what happens after the trap. In the second example, we see that GCC doesn't need to emit a test and a branch in the likely path in addition to the trap. We've got some WARN_ON() in the .softirqentry.text section so it needs to be added to OTHER_TEXT_SECTIONS in modpost.c. Signed-off-by: Christophe Leroy --- v2: Fix build failure when CONFIG_BUG is not selected.
--- arch/powerpc/include/asm/book3s/64/kup.h | 2 +- arch/powerpc/include/asm/bug.h | 54 arch/powerpc/include/asm/extable.h | 14 ++ arch/powerpc/include/asm/ppc_asm.h | 11 + arch/powerpc/kernel/entry_64.S | 2 +- arch/powerpc/kernel/exceptions-64e.S | 2 +- arch/powerpc/kernel/misc_32.S| 2 +- arch/powerpc/kernel/traps.c | 9 +++- scripts/mod/modpost.c| 2 +- 9 files changed, 72 insertions(+), 26 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h index 9700da3a4093..a22839cba32e 100644 --- a/arch/powerpc/include/asm/book3s/64/kup.h +++ b/arch/powerpc/include/asm/book3s/64/kup.h @@ -90,7 +90,7 @@ /* Prevent access to userspace using any key values */ LOAD_REG_IMMEDIATE(\gpr2, AMR_KUAP_BLOCKED) 999: tdne\gpr1, \gpr2 - EMIT_BUG_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | BUGFLAG_ONCE) + EMIT_WARN_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | BUGFLAG_ONCE) END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_BOOK3S_KUAP, 67) #endif .endm diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h index 101dea4eec8d..e22dc503fb2f 100644 --- a/arch/powerpc/include/asm/bug.h +++ b/arch/powerpc/include/asm/bug.h @@ -4,6 +4,7 @@ #ifdef __KERNEL__ #include +#include #ifdef CONFIG_BUG @@ -30,6 +31,11 @@ .endm #endif /* verbose */ +.macro EMIT_WARN_ENTRY addr,file,line,flags + EX_TABLE(\addr,\addr+4) + EMIT_BUG_ENTRY \addr,\file,\line,\flags +.endm + #else /* !__ASSEMBLY__ */ /* _EMIT_BUG_ENTRY expects args %0,%1,%2,%3 to be FILE, LINE, flags and sizeof(struct bug_entry), respectively */ @@ -58,6 +64,16 @@ "i" (sizeof(struct bug_entry)), \ ##__VA_ARGS__) +#define WARN_ENTRY(insn, flags, label, ...)\ + asm_volatile_goto( \ + "1: " insn "\n" \ + EX_TABLE(1b, %l[label]) \ + _EMIT_BUG_ENTRY \ + : : "i" (__FILE__), "i" (__LINE__), \ + "i&
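The recovery path relies on the kernel exception table: when the WARN trap fires, program_check_exception() looks the trapping address up and, if an entry exists, resumes at the recorded fixup. A simplified userspace model of that lookup (the real powerpc entries store 32-bit relative offsets and the table is sorted so the kernel can binary-search it; absolute addresses, a linear scan, and the sample addresses are illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model of the kernel's struct exception_table_entry. */
struct exception_table_entry {
	uintptr_t insn;    /* address of the trapping instruction */
	uintptr_t fixup;   /* where to resume execution afterwards */
};

/* Return the fixup address for a trap, or 0 if the trap is not
 * covered by the table (i.e. it is a real, unrecoverable fault). */
static uintptr_t search_extable(const struct exception_table_entry *tbl,
				size_t n, uintptr_t trap_addr)
{
	for (size_t i = 0; i < n; i++)   /* kernel: sorted table + bsearch */
		if (tbl[i].insn == trap_addr)
			return tbl[i].fixup;
	return 0;
}
```

The EX_TABLE(\addr,\addr+4) line in the diff adds exactly one such entry per WARN trap, pointing at the instruction that follows it.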
[PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
powerpc BUG_ON() and WARN_ON() are based on using the twnei instruction. For catching simple conditions like a variable having value 0, this is efficient because it does the test and the trap at the same time. But most conditions used with BUG_ON or WARN_ON are more complex and force GCC to format the condition into a 0 or 1 value in a register. This will usually require 2 to 3 instructions. The most efficient solution would be to use __builtin_trap() because GCC is able to optimise the use of the different trap instructions based on the requested condition, but this is complex if not impossible for the following reasons: - __builtin_trap() is a non-recoverable instruction, so it can't be used for WARN_ON - Knowing which line of code generated the trap would require the analysis of DWARF information. This is not a feature we have today. As mentioned in commit 8d4fbcfbe0a4 ("Fix WARN_ON() on bitfield ops") the way WARN_ON() is implemented is suboptimal. That commit also mentions an issue with 'long long' conditions. It fixed it for WARN_ON() but the same problem still exists today with BUG_ON() on PPC32. It will be fixed by using the generic implementation. By using the generic implementation, gcc will naturally generate a branch to the unconditional trap generated by BUG(). As modern powerpc cores implement zero-cycle branches, that's even more efficient. And for the functions that use the return value of WARN_ON(), the test on that return value now also serves as the WARN_ON() test itself. On PPC64 we don't want it because we want to be able to use the CFAR register to track how we entered the code that trapped. The CFAR register would be clobbered by the branch.
A simple test function: unsigned long test9w(unsigned long a, unsigned long b) { if (WARN_ON(!b)) return 0; return a / b; } Before the patch: 046c <test9w>: 46c: 7c 89 00 34 cntlzw r9,r4 470: 55 29 d9 7e rlwinm r9,r9,27,5,31 474: 0f 09 00 00 twnei r9,0 478: 2c 04 00 00 cmpwi r4,0 47c: 41 82 00 0c beq 488 480: 7c 63 23 96 divwu r3,r3,r4 484: 4e 80 00 20 blr 488: 38 60 00 00 li r3,0 48c: 4e 80 00 20 blr After the patch: 0468 <test9w>: 468: 2c 04 00 00 cmpwi r4,0 46c: 41 82 00 0c beq 478 470: 7c 63 23 96 divwu r3,r3,r4 474: 4e 80 00 20 blr 478: 0f e0 00 00 twui r0,0 47c: 38 60 00 00 li r3,0 480: 4e 80 00 20 blr So we see that before the patch we need 3 instructions on the likely path to handle the WARN_ON(). With the patch the trap goes on the unlikely path. See below the difference at the entry of system_call_exception where we have several BUG_ON(), although less impressive. With the patch: <system_call_exception>: 0: 81 6a 00 84 lwz r11,132(r10) 4: 90 6a 00 88 stw r3,136(r10) 8: 71 60 00 02 andi. r0,r11,2 c: 41 82 00 70 beq 7c 10: 71 60 40 00 andi. r0,r11,16384 14: 41 82 00 6c beq 80 18: 71 6b 80 00 andi. r11,r11,32768 1c: 41 82 00 68 beq 84 20: 94 21 ff e0 stwu r1,-32(r1) 24: 93 e1 00 1c stw r31,28(r1) 28: 7d 8c 42 e6 mftb r12 ...
7c: 0f e0 00 00 twui r0,0 80: 0f e0 00 00 twui r0,0 84: 0f e0 00 00 twui r0,0 Without the patch: <system_call_exception>: 0: 94 21 ff e0 stwu r1,-32(r1) 4: 93 e1 00 1c stw r31,28(r1) 8: 90 6a 00 88 stw r3,136(r10) c: 81 6a 00 84 lwz r11,132(r10) 10: 69 60 00 02 xori r0,r11,2 14: 54 00 ff fe rlwinm r0,r0,31,31,31 18: 0f 00 00 00 twnei r0,0 1c: 69 60 40 00 xori r0,r11,16384 20: 54 00 97 fe rlwinm r0,r0,18,31,31 24: 0f 00 00 00 twnei r0,0 28: 69 6b 80 00 xori r11,r11,32768 2c: 55 6b 8f fe rlwinm r11,r11,17,31,31 30: 0f 0b 00 00 twnei r11,0 34: 7d 8c 42 e6 mftb r12 Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/bug.h | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h index d1635ffbb179..101dea4eec8d 100644 --- a/arch/powerpc/include/asm/bug.h +++ b/arch/powerpc/include/asm/bug.h @@ -68,7 +68,11 @@ BUG_ENTRY("twi 31, 0, 0", 0); \ unreachable(); \ } while (0) +#define HAVE_ARCH_BUG + +#define __WARN_FLAGS(flags) BUG_ENTRY("twi 3
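The generic fallback this patch switches PPC32 to boils down to "test, then branch to one unconditional trap". A runnable stand-in for that shape (my_trap() replaces the twui trap so the sketch can run in userspace; names prefixed MY_/my_ are illustrative, not kernel API):

```c
/* Stand-in for taking the trap: just count it instead of faulting. */
static int trap_count;
static void my_trap(void) { trap_count++; }

#define my_unlikely(x) __builtin_expect(!!(x), 0)

/* Generic WARN_ON shape: materialise the condition once, branch to a
 * single unconditional trap on the unlikely path, return the value so
 * callers can reuse the same test. */
#define MY_WARN_ON(cond) ({				\
	int __ret_warn_on = my_unlikely(cond);		\
	if (__ret_warn_on)				\
		my_trap();				\
	__ret_warn_on;					\
})

/* Same shape as the test function from the commit message. */
static unsigned long test9w(unsigned long a, unsigned long b)
{
	if (MY_WARN_ON(!b))
		return 0;
	return a / b;
}
```

Because the caller already branches on the returned value, the compiler can fold the WARN_ON test and the `if` into one compare-and-branch, which is exactly what the "After the patch" listing shows.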
[PATCH v2 3/3] powerpc/atomics: Remove atomic_inc()/atomic_dec() and friends
Now that atomic_add() and atomic_sub() handle immediate operands, atomic_inc() and atomic_dec() have no added value compared to the generic fallback which calls atomic_add(1) and atomic_sub(1). Also remove atomic_inc_not_zero() which falls back to atomic_add_unless() which itself falls back to atomic_fetch_add_unless() which now handles immediate operands. Signed-off-by: Christophe Leroy --- v2: New --- arch/powerpc/include/asm/atomic.h | 95 --- 1 file changed, 95 deletions(-) diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h index eb1bdf14f67c..00ba5d9e837b 100644 --- a/arch/powerpc/include/asm/atomic.h +++ b/arch/powerpc/include/asm/atomic.h @@ -118,71 +118,6 @@ ATOMIC_OPS(xor, xor, "", K) #undef ATOMIC_OP_RETURN_RELAXED #undef ATOMIC_OP -static __inline__ void atomic_inc(atomic_t *v) -{ - int t; - - __asm__ __volatile__( -"1:lwarx %0,0,%2 # atomic_inc\n\ - addic %0,%0,1\n" -" stwcx. %0,0,%2 \n\ - bne-1b" - : "=&r" (t), "+m" (v->counter) - : "r" (&v->counter) - : "cc", "xer"); -} -#define atomic_inc atomic_inc - -static __inline__ int atomic_inc_return_relaxed(atomic_t *v) -{ - int t; - - __asm__ __volatile__( -"1:lwarx %0,0,%2 # atomic_inc_return_relaxed\n" -" addic %0,%0,1\n" -" stwcx. %0,0,%2\n" -" bne-1b" - : "=&r" (t), "+m" (v->counter) - : "r" (&v->counter) - : "cc", "xer"); - - return t; -} - -static __inline__ void atomic_dec(atomic_t *v) -{ - int t; - - __asm__ __volatile__( -"1:lwarx %0,0,%2 # atomic_dec\n\ - addic %0,%0,-1\n" -" stwcx. %0,0,%2\n\ - bne-1b" - : "=&r" (t), "+m" (v->counter) - : "r" (&v->counter) - : "cc", "xer"); -} -#define atomic_dec atomic_dec - -static __inline__ int atomic_dec_return_relaxed(atomic_t *v) -{ - int t; - - __asm__ __volatile__( -"1:lwarx %0,0,%2 # atomic_dec_return_relaxed\n" -" addic %0,%0,-1\n" -" stwcx.
%0,0,%2\n" -" bne-1b" - : "=" (t), "+m" (v->counter) - : "r" (>counter) - : "cc", "xer"); - - return t; -} - -#define atomic_inc_return_relaxed atomic_inc_return_relaxed -#define atomic_dec_return_relaxed atomic_dec_return_relaxed - #define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n))) #define atomic_cmpxchg_relaxed(v, o, n) \ cmpxchg_relaxed(&((v)->counter), (o), (n)) @@ -252,36 +187,6 @@ static __inline__ int atomic_fetch_add_unless(atomic_t *v, int a, int u) } #define atomic_fetch_add_unless atomic_fetch_add_unless -/** - * atomic_inc_not_zero - increment unless the number is zero - * @v: pointer of type atomic_t - * - * Atomically increments @v by 1, so long as @v is non-zero. - * Returns non-zero if @v was non-zero, and zero otherwise. - */ -static __inline__ int atomic_inc_not_zero(atomic_t *v) -{ - int t1, t2; - - __asm__ __volatile__ ( - PPC_ATOMIC_ENTRY_BARRIER -"1:lwarx %0,0,%2 # atomic_inc_not_zero\n\ - cmpwi 0,%0,0\n\ - beq-2f\n\ - addic %1,%0,1\n" -" stwcx. %1,0,%2\n\ - bne-1b\n" - PPC_ATOMIC_EXIT_BARRIER - "\n\ -2:" - : "=" (t1), "=" (t2) - : "r" (>counter) - : "cc", "xer", "memory"); - - return t1; -} -#define atomic_inc_not_zero(v) atomic_inc_not_zero((v)) - /* * Atomically test *v and decrement if it is greater than 0. * The function returns the old value of *v minus 1, even if -- 2.25.0
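The generic fallback chain named above can be sketched with C11 atomics: atomic_inc_not_zero() reduces to add_unless(v, 1, 0), which reduces to a compare-exchange loop in fetch_add_unless(). The my_ names are illustrative; this shows the generic shape being relied upon, not the powerpc assembly it replaces:

```c
#include <stdatomic.h>

/* Add 'a' to *v unless *v == u; return the old value either way. */
static int my_fetch_add_unless(atomic_int *v, int a, int u)
{
	int c = atomic_load(v);

	do {
		if (c == u)
			break;   /* leave *v untouched */
	} while (!atomic_compare_exchange_weak(v, &c, c + a));

	return c;
}

/* Generic-fallback shape: inc_not_zero() is fetch_add_unless(v, 1, 0),
 * returning whether the increment actually happened. */
static int my_inc_not_zero(atomic_int *v)
{
	return my_fetch_add_unless(v, 1, 0) != 0;
}
```

Note that atomic_compare_exchange_weak() reloads the current value into `c` on failure, so the loop needs no explicit re-read.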
[PATCH v2 1/3] powerpc/bitops: Use immediate operand when possible
Today we get the following code generation for bitops like set or clear bit: c0009fe0: 39 40 08 00 li r10,2048 c0009fe4: 7c e0 40 28 lwarx r7,0,r8 c0009fe8: 7c e7 53 78 or r7,r7,r10 c0009fec: 7c e0 41 2d stwcx. r7,0,r8 c000d568: 39 00 18 00 li r8,6144 c000d56c: 7c c0 38 28 lwarx r6,0,r7 c000d570: 7c c6 40 78 andc r6,r6,r8 c000d574: 7c c0 39 2d stwcx. r6,0,r7 Most set-bit masks are constants fitting in the lower 16 bits, so the operation can easily be replaced by the "immediate" version. Allow GCC to choose between the normal or immediate form. For clear bits, on 32 bits 'rlwinm' can be used instead of 'andc' when all bits to be cleared are consecutive. For this we detect the number of transitions from 0 to 1 in the mask. This is done by anding the mask with its complement rotated left 1 bit. If this operation provides a number which is a power of 2, it means there is only one transition from 0 to 1 in the number, so all 1 bits are consecutive. Can't use rol32() which is not defined yet, so do a raw ((x << 1) | (x >> 31)). For the power of 2, can't use is_power_of_2() for the same reason, but it can also be easily encoded as (mask & (mask - 1)), and even the 0 case, which is not a power of two, is acceptable for us. On 64 bits we don't have any equivalent single operation, we'd need two 'rldicl', so it is not worth it. With this patch we get: c0009fe0: 7d 00 50 28 lwarx r8,0,r10 c0009fe4: 61 08 08 00 ori r8,r8,2048 c0009fe8: 7d 00 51 2d stwcx. r8,0,r10 c000d558: 7c e0 40 28 lwarx r7,0,r8 c000d55c: 54 e7 05 64 rlwinm r7,r7,0,21,18 c000d560: 7c e0 41 2d stwcx. r7,0,r8 On pmac32_defconfig, it reduces the text by approx 10 kbytes.
Signed-off-by: Christophe Leroy --- v2: - Use "n" instead of "i" as constraint for the rlwinm mask - Improve mask verification to handle more than single bit masks --- arch/powerpc/include/asm/bitops.h | 85 --- 1 file changed, 77 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/include/asm/bitops.h b/arch/powerpc/include/asm/bitops.h index 299ab33505a6..baa7666c1094 100644 --- a/arch/powerpc/include/asm/bitops.h +++ b/arch/powerpc/include/asm/bitops.h @@ -71,19 +71,57 @@ static inline void fn(unsigned long mask, \ __asm__ __volatile__ ( \ prefix \ "1:" PPC_LLARX(%0,0,%3,0) "\n" \ - stringify_in_c(op) "%0,%0,%2\n" \ + #op "%I2 %0,%0,%2\n"\ PPC_STLCX "%0,0,%3\n" \ "bne- 1b\n" \ : "=" (old), "+m" (*p)\ - : "r" (mask), "r" (p) \ + : "rK" (mask), "r" (p) \ : "cc", "memory"); \ } DEFINE_BITOP(set_bits, or, "") -DEFINE_BITOP(clear_bits, andc, "") -DEFINE_BITOP(clear_bits_unlock, andc, PPC_RELEASE_BARRIER) DEFINE_BITOP(change_bits, xor, "") +static __always_inline bool is_rlwinm_mask_valid(unsigned long x) +{ + x = x & ~((x << 1) | (x >> 31));/* Flag transitions from 0 to 1 */ + + return !(x & (x - 1)); /* Is there only one transition */ +} + +#define DEFINE_CLROP(fn, prefix) \ +static inline void fn(unsigned long mask, volatile unsigned long *_p) \ +{ \ + unsigned long old; \ + unsigned long *p = (unsigned long *)_p; \ + \ + if (IS_ENABLED(CONFIG_PPC32) && \ + __builtin_constant_p(mask) && is_rlwinm_mask_valid(mask)) { \ + asm volatile ( \ + prefix \ + "1:""lwarx %0,0,%3\n" \ + "rlwinm %0,%0,0,%2\n" \ + "stwcx. %0,0,%3\n" \ + "bne- 1b\n" \ + : "=" (old), "+m" (*p)\ + : "n" (~mask), "r" (p) \ +
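The transition-counting trick can be exercised in plain C; the sketch below mirrors is_rlwinm_mask_valid() from the diff (uint32_t stands in for unsigned long on PPC32; wrap-around runs are also accepted, which rlwinm masks can indeed encode):

```c
#include <stdbool.h>
#include <stdint.h>

static bool is_rlwinm_mask_valid(uint32_t x)
{
	/* Keep a 1 at each 0 -> 1 transition (scanning with wrap-around):
	 * and the mask with the complement of itself rotated left 1 bit. */
	x = x & ~((x << 1) | (x >> 31));

	/* Zero or a single power of two means at most one transition,
	 * i.e. the 1 bits form one contiguous (possibly wrapped) run. */
	return !(x & (x - 1));
}
```

For example, 0xf0f0 has two separate runs of ones (two 0-to-1 transitions), so it cannot be cleared with a single rlwinm, while 0x80000001 is a single run that wraps around bit 31 and is accepted.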
[PATCH v2 2/3] powerpc/atomics: Use immediate operand when possible
Today we get the following code generation for atomic operations: c001bb2c: 39 20 00 01 li r9,1 c001bb30: 7d 40 18 28 lwarx r10,0,r3 c001bb34: 7d 09 50 50 subfr8,r9,r10 c001bb38: 7d 00 19 2d stwcx. r8,0,r3 c001c7a8: 39 40 00 01 li r10,1 c001c7ac: 7d 00 18 28 lwarx r8,0,r3 c001c7b0: 7c ea 42 14 add r7,r10,r8 c001c7b4: 7c e0 19 2d stwcx. r7,0,r3 By allowing GCC to choose between immediate or regular operation, we get: c001bb2c: 7d 20 18 28 lwarx r9,0,r3 c001bb30: 39 49 ff ff addir10,r9,-1 c001bb34: 7d 40 19 2d stwcx. r10,0,r3 -- c001c7a4: 7d 40 18 28 lwarx r10,0,r3 c001c7a8: 39 0a 00 01 addir8,r10,1 c001c7ac: 7d 00 19 2d stwcx. r8,0,r3 For "and", the dot form has to be used because "andi" doesn't exist. For logical operations we use unsigned 16 bits immediate. For arithmetic operations we use signed 16 bits immediate. On pmac32_defconfig, it reduces the text by approx another 8 kbytes. Signed-off-by: Christophe Leroy Acked-by: Segher Boessenkool --- v2: Use "addc/addic" --- arch/powerpc/include/asm/atomic.h | 56 +++ 1 file changed, 28 insertions(+), 28 deletions(-) diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h index 61c6e8b200e8..eb1bdf14f67c 100644 --- a/arch/powerpc/include/asm/atomic.h +++ b/arch/powerpc/include/asm/atomic.h @@ -37,62 +37,62 @@ static __inline__ void atomic_set(atomic_t *v, int i) __asm__ __volatile__("stw%U0%X0 %1,%0" : "=m"UPD_CONSTR(v->counter) : "r"(i)); } -#define ATOMIC_OP(op, asm_op) \ +#define ATOMIC_OP(op, asm_op, suffix, sign, ...) \ static __inline__ void atomic_##op(int a, atomic_t *v) \ { \ int t; \ \ __asm__ __volatile__( \ "1:lwarx %0,0,%3 # atomic_" #op "\n" \ - #asm_op " %0,%2,%0\n" \ + #asm_op "%I2" suffix " %0,%0,%2\n" \ " stwcx. 
%0,0,%3 \n" \ " bne-1b\n" \ : "=" (t), "+m" (v->counter) \ - : "r" (a), "r" (>counter)\ - : "cc");\ + : "r"#sign (a), "r" (>counter) \ + : "cc", ##__VA_ARGS__); \ } \ -#define ATOMIC_OP_RETURN_RELAXED(op, asm_op) \ +#define ATOMIC_OP_RETURN_RELAXED(op, asm_op, suffix, sign, ...) \ static inline int atomic_##op##_return_relaxed(int a, atomic_t *v) \ { \ int t; \ \ __asm__ __volatile__( \ "1:lwarx %0,0,%3 # atomic_" #op "_return_relaxed\n" \ - #asm_op " %0,%2,%0\n" \ + #asm_op "%I2" suffix " %0,%0,%2\n" \ " stwcx. %0,0,%3\n" \ " bne-1b\n" \ : "=" (t), "+m" (v->counter) \ - : "r" (a), "r" (>counter)\ - : "cc");\ + : "r"#sign (a), "r" (>counter) \ + : "cc", ##__VA_ARGS__); \ \ return t; \ } -#define ATOMIC_FETCH_OP_RELAXED(op, asm_op)\ +#de
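The "%I2" output modifier together with the "rI"/"rK" constraints lets GCC pick the immediate form (addi, ori, ...) whenever the constant operand fits, as the commit message describes: signed 16-bit immediates for arithmetic, unsigned 16-bit for logical operations. The range checks behind those two PowerPC constraint letters can be sketched as (a model of the compiler's fit test, not kernel code):

```c
#include <stdbool.h>
#include <stdint.h>

/* "I": signed 16-bit immediate, as accepted by addi/addic. */
static bool fits_constraint_I(int32_t v)
{
	return v >= -0x8000 && v <= 0x7fff;
}

/* "K": unsigned 16-bit immediate, as accepted by ori/xori/andi. */
static bool fits_constraint_K(uint32_t v)
{
	return v <= 0xffff;
}
```

When the operand fails the check, the "r" alternative of the constraint still matches, so GCC simply falls back to loading the constant into a register first, exactly as before the patch.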
Re: [PATCH v1 2/2] powerpc/atomics: Use immediate operand when possible
Le 13/04/2021 à 00:08, Segher Boessenkool a écrit : Hi! On Thu, Apr 08, 2021 at 03:33:45PM +, Christophe Leroy wrote: +#define ATOMIC_OP(op, asm_op, dot, sign) \ static __inline__ void atomic_##op(int a, atomic_t *v) \ { \ int t; \ \ __asm__ __volatile__( \ "1: lwarx %0,0,%3 # atomic_" #op "\n" \ - #asm_op " %0,%2,%0\n" \ + #asm_op "%I2" dot " %0,%0,%2\n" \ "stwcx. %0,0,%3 \n"\ "bne-1b\n" \ - : "=" (t), "+m" (v->counter) \ - : "r" (a), "r" (>counter) \ + : "=" (t), "+m" (v->counter) \ + : "r"#sign (a), "r" (>counter)\ : "cc"); \ } \ You need "b" (instead of "r") only for "addi". You can use "addic" instead, which clobbers XER[CA], but *all* inline asm does, so that is not a downside here (it is also not slower on any CPU that matters). @@ -238,14 +238,14 @@ static __inline__ int atomic_fetch_add_unless(atomic_t *v, int a, int u) "1: lwarx %0,0,%1 # atomic_fetch_add_unless\n\ cmpw0,%0,%3 \n\ beq 2f \n\ - add %0,%2,%0 \n" + add%I2 %0,%0,%2 \n" "stwcx. %0,0,%1 \n\ bne-1b \n" PPC_ATOMIC_EXIT_BARRIER -" subf%0,%2,%0 \n\ +" sub%I2 %0,%0,%2 \n\ 2:" - : "=" (t) - : "r" (>counter), "r" (a), "r" (u) + : "=" (t) + : "r" (>counter), "rI" (a), "r" (u) : "cc", "memory"); Same here. Yes, I thought about addic, I didn't find an early solution because I forgot the matching 'addc'. Now with the couple addc/addic it works well. Thanks Nice patches! Acked-by: Segher Boessenkool Christophe
Re: [PATCH v1 1/2] powerpc/bitops: Use immediate operand when possible
Le 12/04/2021 à 23:54, Segher Boessenkool a écrit : Hi! On Thu, Apr 08, 2021 at 03:33:44PM +, Christophe Leroy wrote: For clear bits, on 32 bits 'rlwinm' can be used instead or 'andc' for when all bits to be cleared are consecutive. Also on 64-bits, as long as both the top and bottom bits are in the low 32-bit half (for 32 bit mode, it can wrap as well). Yes. But here we are talking about clearing a few bits, all other ones must remain unchanged. An rlwinm on PPC64 will always clear the upper part, which is unlikely what we want. For the time being only handle the single bit case, which we detect by checking whether the mask is a power of two. You could look at rs6000_is_valid_mask in GCC: <https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/rs6000/rs6000.c;h=48b8efd732b251c059628096314848305deb0c0b;hb=HEAD#l11148> used by rs6000_is_valid_and_mask immediately after it. You probably want to allow only rlwinm in your case, and please note this checks if something is a valid mask, not the inverse of a valid mask (as you want here). This check looks more complex than what I need. It is used for both rlw... and rld..., and it calculates the operants. The only thing I need is to validate the mask. I found a way: By anding the mask with the complement of itself rotated by left bits to 1, we identify the transitions from 0 to 1. If the result is a power of 2, it means there's only one transition so the mask is as expected. So I did that in v2. So yes this is pretty involved :-) Your patch looks good btw. But please use "n", not "i", as constraint? Done. Christophe
[PATCH 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
Using asm goto in __WARN_FLAGS() and WARN_ON() allows more flexibility to GCC. For that, add an entry to the exception table so that program_check_exception() knows where to resume execution after a WARNING. Here are two examples. The first one is done on PPC32 (which benefits from the previous patch), the second is on PPC64. unsigned long test(struct pt_regs *regs) { int ret; WARN_ON(regs->msr & MSR_PR); return regs->gpr[3]; } unsigned long test9w(unsigned long a, unsigned long b) { if (WARN_ON(!b)) return 0; return a / b; } Before the patch: 03a8 <test>: 3a8: 81 23 00 84 lwz r9,132(r3) 3ac: 71 29 40 00 andi. r9,r9,16384 3b0: 40 82 00 0c bne 3bc 3b4: 80 63 00 0c lwz r3,12(r3) 3b8: 4e 80 00 20 blr 3bc: 0f e0 00 00 twui r0,0 3c0: 80 63 00 0c lwz r3,12(r3) 3c4: 4e 80 00 20 blr 0bf0 <.test9w>: bf0: 7c 89 00 74 cntlzd r9,r4 bf4: 79 29 d1 82 rldicl r9,r9,58,6 bf8: 0b 09 00 00 tdnei r9,0 bfc: 2c 24 00 00 cmpdi r4,0 c00: 41 82 00 0c beq c0c <.test9w+0x1c> c04: 7c 63 23 92 divdu r3,r3,r4 c08: 4e 80 00 20 blr c0c: 38 60 00 00 li r3,0 c10: 4e 80 00 20 blr After the patch: 03a8 <test>: 3a8: 81 23 00 84 lwz r9,132(r3) 3ac: 71 29 40 00 andi. r9,r9,16384 3b0: 40 82 00 0c bne 3bc 3b4: 80 63 00 0c lwz r3,12(r3) 3b8: 4e 80 00 20 blr 3bc: 0f e0 00 00 twui r0,0 0c50 <.test9w>: c50: 7c 89 00 74 cntlzd r9,r4 c54: 79 29 d1 82 rldicl r9,r9,58,6 c58: 0b 09 00 00 tdnei r9,0 c5c: 7c 63 23 92 divdu r3,r3,r4 c60: 4e 80 00 20 blr c70: 38 60 00 00 li r3,0 c74: 4e 80 00 20 blr In the first example, we see GCC doesn't need to duplicate what happens after the trap. In the second example, we see that GCC doesn't need to emit a test and a branch in the likely path in addition to the trap.
We've got some WARN_ON() in .softirqentry.text section so it needs to be added in the OTHER_TEXT_SECTIONS in modpost.c Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/book3s/64/kup.h | 2 +- arch/powerpc/include/asm/bug.h | 51 +++- arch/powerpc/include/asm/extable.h | 14 +++ arch/powerpc/include/asm/ppc_asm.h | 11 + arch/powerpc/kernel/entry_64.S | 2 +- arch/powerpc/kernel/exceptions-64e.S | 2 +- arch/powerpc/kernel/misc_32.S| 2 +- arch/powerpc/kernel/traps.c | 9 - scripts/mod/modpost.c| 2 +- 9 files changed, 69 insertions(+), 26 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h index 9700da3a4093..a22839cba32e 100644 --- a/arch/powerpc/include/asm/book3s/64/kup.h +++ b/arch/powerpc/include/asm/book3s/64/kup.h @@ -90,7 +90,7 @@ /* Prevent access to userspace using any key values */ LOAD_REG_IMMEDIATE(\gpr2, AMR_KUAP_BLOCKED) 999: tdne\gpr1, \gpr2 - EMIT_BUG_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | BUGFLAG_ONCE) + EMIT_WARN_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | BUGFLAG_ONCE) END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_BOOK3S_KUAP, 67) #endif .endm diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h index 101dea4eec8d..d92afdbd4449 100644 --- a/arch/powerpc/include/asm/bug.h +++ b/arch/powerpc/include/asm/bug.h @@ -4,6 +4,7 @@ #ifdef __KERNEL__ #include +#include #ifdef CONFIG_BUG @@ -30,6 +31,11 @@ .endm #endif /* verbose */ +.macro EMIT_WARN_ENTRY addr,file,line,flags + EX_TABLE(\addr,\addr+4) + EMIT_BUG_ENTRY \addr,\file,\line,\flags +.endm + #else /* !__ASSEMBLY__ */ /* _EMIT_BUG_ENTRY expects args %0,%1,%2,%3 to be FILE, LINE, flags and sizeof(struct bug_entry), respectively */ @@ -58,6 +64,16 @@ "i" (sizeof(struct bug_entry)), \ ##__VA_ARGS__) +#define WARN_ENTRY(insn, flags, label, ...)\ + asm_volatile_goto( \ + "1: " insn "\n" \ + EX_TABLE(1b, %l[label]) \ + _EMIT_BUG_ENTRY \ + : : "i" (__FILE__), "i" (__LINE__), \ + "i" (flags), \ + "i&quo
[PATCH 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
powerpc BUG_ON() and WARN_ON() are based on using the twnei instruction. For catching simple conditions like a variable having value 0, this is efficient because it does the test and the trap at the same time. But most conditions used with BUG_ON or WARN_ON are more complex and force GCC to format the condition into a 0 or 1 value in a register. This will usually require 2 to 3 instructions. The most efficient solution would be to use __builtin_trap() because GCC is able to optimise the use of the different trap instructions based on the requested condition, but this is complex if not impossible for the following reasons: - __builtin_trap() is a non-recoverable instruction, so it can't be used for WARN_ON - Knowing which line of code generated the trap would require the analysis of DWARF information. This is not a feature we have today. As mentioned in commit 8d4fbcfbe0a4 ("Fix WARN_ON() on bitfield ops") the way WARN_ON() is implemented is suboptimal. That commit also mentions an issue with 'long long' conditions. It fixed it for WARN_ON() but the same problem still exists today with BUG_ON() on PPC32. It will be fixed by using the generic implementation. By using the generic implementation, gcc will naturally generate a branch to the unconditional trap generated by BUG(). As modern powerpc cores implement zero-cycle branches, that's even more efficient. And for the functions that use the return value of WARN_ON(), the test on that return value now also serves as the WARN_ON() test itself. On PPC64 we don't want it because we want to be able to use the CFAR register to track how we entered the code that trapped. The CFAR register would be clobbered by the branch.
A simple test function: unsigned long test9w(unsigned long a, unsigned long b) { if (WARN_ON(!b)) return 0; return a / b; } Before the patch: 046c <test9w>: 46c: 7c 89 00 34 cntlzw r9,r4 470: 55 29 d9 7e rlwinm r9,r9,27,5,31 474: 0f 09 00 00 twnei r9,0 478: 2c 04 00 00 cmpwi r4,0 47c: 41 82 00 0c beq 488 480: 7c 63 23 96 divwu r3,r3,r4 484: 4e 80 00 20 blr 488: 38 60 00 00 li r3,0 48c: 4e 80 00 20 blr After the patch: 0468 <test9w>: 468: 2c 04 00 00 cmpwi r4,0 46c: 41 82 00 0c beq 478 470: 7c 63 23 96 divwu r3,r3,r4 474: 4e 80 00 20 blr 478: 0f e0 00 00 twui r0,0 47c: 38 60 00 00 li r3,0 480: 4e 80 00 20 blr So we see that before the patch we need 3 instructions on the likely path to handle the WARN_ON(). With the patch the trap goes on the unlikely path. See below the difference at the entry of system_call_exception where we have several BUG_ON(), although less impressive. With the patch: <system_call_exception>: 0: 81 6a 00 84 lwz r11,132(r10) 4: 90 6a 00 88 stw r3,136(r10) 8: 71 60 00 02 andi. r0,r11,2 c: 41 82 00 70 beq 7c 10: 71 60 40 00 andi. r0,r11,16384 14: 41 82 00 6c beq 80 18: 71 6b 80 00 andi. r11,r11,32768 1c: 41 82 00 68 beq 84 20: 94 21 ff e0 stwu r1,-32(r1) 24: 93 e1 00 1c stw r31,28(r1) 28: 7d 8c 42 e6 mftb r12 ...
7c: 0f e0 00 00 twui r0,0 80: 0f e0 00 00 twui r0,0 84: 0f e0 00 00 twui r0,0 Without the patch: <system_call_exception>: 0: 94 21 ff e0 stwu r1,-32(r1) 4: 93 e1 00 1c stw r31,28(r1) 8: 90 6a 00 88 stw r3,136(r10) c: 81 6a 00 84 lwz r11,132(r10) 10: 69 60 00 02 xori r0,r11,2 14: 54 00 ff fe rlwinm r0,r0,31,31,31 18: 0f 00 00 00 twnei r0,0 1c: 69 60 40 00 xori r0,r11,16384 20: 54 00 97 fe rlwinm r0,r0,18,31,31 24: 0f 00 00 00 twnei r0,0 28: 69 6b 80 00 xori r11,r11,32768 2c: 55 6b 8f fe rlwinm r11,r11,17,31,31 30: 0f 0b 00 00 twnei r11,0 34: 7d 8c 42 e6 mftb r12 Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/bug.h | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h index d1635ffbb179..101dea4eec8d 100644 --- a/arch/powerpc/include/asm/bug.h +++ b/arch/powerpc/include/asm/bug.h @@ -68,7 +68,11 @@ BUG_ENTRY("twi 31, 0, 0", 0); \ unreachable(); \ } while (0) +#define HAVE_ARCH_BUG + +#define __WARN_FLAGS(flags) BUG_ENTRY("twi 3
[PATCH 3/3] powerpc/ebpf32: Use standard function call for functions within 32M distance
If the target of a function call is within 32 Mbytes distance, use a standard function call with 'bl' instead of the 'lis/ori/mtlr/blrl' sequence. In the first pass, no memory has been allocated yet and the code position is not known yet (image pointer is NULL). This pass is there to calculate the amount of memory to allocate for the EBPF code, so assume the 4-instruction sequence is required, so that enough memory is allocated. Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/ppc-opcode.h | 1 + arch/powerpc/net/bpf_jit.h| 3 +++ arch/powerpc/net/bpf_jit_comp32.c | 16 +++- 3 files changed, 15 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index 5b60020dc1f4..ac41776661e9 100644 --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -265,6 +265,7 @@ #define PPC_INST_ORI 0x60000000 #define PPC_INST_ORIS 0x64000000 #define PPC_INST_BRANCH 0x48000000 +#define PPC_INST_BL 0x48000001 #define PPC_INST_BRANCH_COND 0x40800000 /* Prefixes */ diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h index 776abef4d2a0..99fad093f43e 100644 --- a/arch/powerpc/net/bpf_jit.h +++ b/arch/powerpc/net/bpf_jit.h @@ -26,6 +26,9 @@ /* Long jump; (unconditional 'branch') */ #define PPC_JMP(dest) EMIT(PPC_INST_BRANCH |\ (((dest) - (ctx->idx * 4)) & 0x03fffffc)) +/* blr; (unconditional 'branch' with link) to absolute address */ +#define PPC_BL_ABS(dest) EMIT(PPC_INST_BL |\ +(((dest) - (unsigned long)(image + ctx->idx)) & 0x03fffffc)) /* "cond" here covers BO:BI fields.
*/ #define PPC_BCC_SHORT(cond, dest) EMIT(PPC_INST_BRANCH_COND | \ (((cond) & 0x3ff) << 16) | \ diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c index ef21b09df76e..bbb16099e8c7 100644 --- a/arch/powerpc/net/bpf_jit_comp32.c +++ b/arch/powerpc/net/bpf_jit_comp32.c @@ -187,11 +187,17 @@ void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx) void bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 func) { - /* Load function address into r0 */ - EMIT(PPC_RAW_LIS(__REG_R0, IMM_H(func))); - EMIT(PPC_RAW_ORI(__REG_R0, __REG_R0, IMM_L(func))); - EMIT(PPC_RAW_MTLR(__REG_R0)); - EMIT(PPC_RAW_BLRL()); + s32 rel = (s32)func - (s32)(image + ctx->idx); + + if (image && rel < 0x2000000 && rel >= -0x2000000) { + PPC_BL_ABS(func); + } else { + /* Load function address into r0 */ + EMIT(PPC_RAW_LIS(__REG_R0, IMM_H(func))); + EMIT(PPC_RAW_ORI(__REG_R0, __REG_R0, IMM_L(func))); + EMIT(PPC_RAW_MTLR(__REG_R0)); + EMIT(PPC_RAW_BLRL()); + } } static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 out) -- 2.25.0
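The reach test and the instruction encoding can be modelled in plain C: 'bl' carries a signed 26-bit, word-aligned displacement, which is where the +/- 32 MB window in the subject comes from. A sketch under that assumption (constants follow the Power ISA I-form branch encoding; this mirrors the idea of PPC_BL_ABS, not the kernel macro itself):

```c
#include <stdbool.h>
#include <stdint.h>

#define PPC_INST_BL 0x48000001u   /* I-form branch, AA=0, LK=1 ('bl') */

/* 'bl' encodes a signed 26-bit, word-aligned displacement: +/- 32 MB. */
static bool bl_in_range(int64_t rel)
{
	return rel >= -0x2000000 && rel < 0x2000000;
}

/* Fold the displacement into the LI field (bits 6..29, low 2 bits 0).
 * 'rel' must already be word-aligned and within range. */
static uint32_t emit_bl(int32_t rel)
{
	return PPC_INST_BL | ((uint32_t)rel & 0x03fffffc);
}
```

A backward branch simply relies on the masked two's-complement displacement: the sign bits beyond bit 25 are dropped and the CPU sign-extends the LI field again.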
[PATCH 2/3] powerpc/ebpf32: Rework 64 bits shifts to avoid tests and branches
Re-implement BPF_ALU64 | BPF_{LSH/RSH/ARSH} | BPF_X with the branchless implementation copied from misc_32.S.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/net/bpf_jit_comp32.c | 39 +++++++++++++++++--------------------
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index ca6fe1583460..ef21b09df76e 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -548,16 +548,15 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 			EMIT(PPC_RAW_SLW(dst_reg, dst_reg, src_reg));
 			break;
 		case BPF_ALU64 | BPF_LSH | BPF_X: /* dst <<= src; */
-			EMIT(PPC_RAW_ADDIC_DOT(__REG_R0, src_reg, -32));
-			PPC_BCC_SHORT(COND_LT, (ctx->idx + 4) * 4);
-			EMIT(PPC_RAW_SLW(dst_reg_h, dst_reg, __REG_R0));
-			EMIT(PPC_RAW_LI(dst_reg, 0));
-			PPC_JMP((ctx->idx + 6) * 4);
+			bpf_set_seen_register(ctx, tmp_reg);
 			EMIT(PPC_RAW_SUBFIC(__REG_R0, src_reg, 32));
 			EMIT(PPC_RAW_SLW(dst_reg_h, dst_reg_h, src_reg));
+			EMIT(PPC_RAW_ADDI(tmp_reg, src_reg, 32));
 			EMIT(PPC_RAW_SRW(__REG_R0, dst_reg, __REG_R0));
-			EMIT(PPC_RAW_SLW(dst_reg, dst_reg, src_reg));
+			EMIT(PPC_RAW_SLW(tmp_reg, dst_reg, tmp_reg));
 			EMIT(PPC_RAW_OR(dst_reg_h, dst_reg_h, __REG_R0));
+			EMIT(PPC_RAW_SLW(dst_reg, dst_reg, src_reg));
+			EMIT(PPC_RAW_OR(dst_reg_h, dst_reg_h, tmp_reg));
 			break;
 		case BPF_ALU | BPF_LSH | BPF_K: /* (u32) dst <<= (u32) imm */
 			if (!imm)
@@ -585,16 +584,15 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 			EMIT(PPC_RAW_SRW(dst_reg, dst_reg, src_reg));
 			break;
 		case BPF_ALU64 | BPF_RSH | BPF_X: /* dst >>= src */
-			EMIT(PPC_RAW_ADDIC_DOT(__REG_R0, src_reg, -32));
-			PPC_BCC_SHORT(COND_LT, (ctx->idx + 4) * 4);
-			EMIT(PPC_RAW_SRW(dst_reg, dst_reg_h, __REG_R0));
-			EMIT(PPC_RAW_LI(dst_reg_h, 0));
-			PPC_JMP((ctx->idx + 6) * 4);
-			EMIT(PPC_RAW_SUBFIC(0, src_reg, 32));
+			bpf_set_seen_register(ctx, tmp_reg);
+			EMIT(PPC_RAW_SUBFIC(__REG_R0, src_reg, 32));
 			EMIT(PPC_RAW_SRW(dst_reg, dst_reg, src_reg));
+			EMIT(PPC_RAW_ADDI(tmp_reg, src_reg, 32));
 			EMIT(PPC_RAW_SLW(__REG_R0, dst_reg_h, __REG_R0));
-			EMIT(PPC_RAW_SRW(dst_reg_h, dst_reg_h, src_reg));
+			EMIT(PPC_RAW_SRW(tmp_reg, dst_reg_h, tmp_reg));
 			EMIT(PPC_RAW_OR(dst_reg, dst_reg, __REG_R0));
+			EMIT(PPC_RAW_SRW(dst_reg_h, dst_reg_h, src_reg));
+			EMIT(PPC_RAW_OR(dst_reg, dst_reg, tmp_reg));
 			break;
 		case BPF_ALU | BPF_RSH | BPF_K: /* (u32) dst >>= (u32) imm */
 			if (!imm)
@@ -622,16 +620,17 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 			EMIT(PPC_RAW_SRAW(dst_reg_h, dst_reg, src_reg));
 			break;
 		case BPF_ALU64 | BPF_ARSH | BPF_X: /* (s64) dst >>= src */
-			EMIT(PPC_RAW_ADDIC_DOT(__REG_R0, src_reg, -32));
-			PPC_BCC_SHORT(COND_LT, (ctx->idx + 4) * 4);
-			EMIT(PPC_RAW_SRAW(dst_reg, dst_reg_h, __REG_R0));
-			EMIT(PPC_RAW_SRAWI(dst_reg_h, dst_reg_h, 31));
-			PPC_JMP((ctx->idx + 6) * 4);
-			EMIT(PPC_RAW_SUBFIC(0, src_reg, 32));
+			bpf_set_seen_register(ctx, tmp_reg);
+			EMIT(PPC_RAW_SUBFIC(__REG_R0, src_reg, 32));
 			EMIT(PPC_RAW_SRW(dst_reg, dst_reg, src_reg));
 			EMIT(PPC_RAW_SLW(__REG_R0, dst_reg_h, __REG_R0));
-			EMIT(PPC_RAW_SRAW(dst_reg_h, dst_reg_h, src_reg));
+			EMIT(PPC_RAW_ADDI(tmp_reg, src_reg, 32));
 			EMIT(PPC_RAW_OR(dst_reg, dst_reg, __REG_R0));
+			EMIT(PPC_RAW_RLWINM(__REG_R0, tmp_reg, 0, 26, 26));
+			EMIT(PPC_RAW_SRAW(tmp_reg, dst_reg_h, tmp_reg));
+			EMIT(PPC_RAW_SRAW(dst_reg_h, dst_reg_h, src_reg));
+			EMIT(PPC_RAW_SLW(tmp_reg, tmp_reg, __REG_R0));
+			EMIT(PPC_RAW_OR(dst_reg, dst_reg, tmp_reg));
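The branchless trick is easier to verify in portable C than in emitted assembly. The sketch below is not the JIT code itself: slw()/srw() are illustrative helpers that emulate PowerPC's 32-bit shift semantics (a 6-bit count where values 32..63 produce 0), which is exactly the property the subfic/addi recipe relies on to avoid the `count >= 32` test:

```c
#include <stdint.h>

/* Emulate PPC slw/srw: 6-bit shift amount, counts 32..63 yield 0. */
static uint32_t slw(uint32_t x, uint32_t n) { n &= 63; return (n & 32) ? 0 : x << n; }
static uint32_t srw(uint32_t x, uint32_t n) { n &= 63; return (n & 32) ? 0 : x >> n; }

/* Branchless 64-bit left shift built from 32-bit halves, following the
 * same recipe as the patch: r0 = 32 - n (subfic), tmp = n + 32 (addi).
 * For n < 32 the slw(lo, tmp) term is 0; for n >= 32 the other two
 * cross terms vanish instead, so no compare/branch is needed. */
static uint64_t lsh64(uint32_t hi, uint32_t lo, uint32_t n)
{
	uint32_t r0 = 32 - n;
	uint32_t tmp = n + 32;

	hi = slw(hi, n) | srw(lo, r0) | slw(lo, tmp);
	lo = slw(lo, n);
	return ((uint64_t)hi << 32) | lo;
}
```

The RSH and ARSH variants in the patch are the mirror image of this, with the extra rlwinm in the ARSH case masking the sign-extension term.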
[PATCH 1/3] powerpc/ebpf32: Fix comment on BPF_ALU{64} | BPF_LSH | BPF_K
Replace <<== by <<=

Signed-off-by: Christophe Leroy
---
 arch/powerpc/net/bpf_jit_comp32.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index 003843273b43..ca6fe1583460 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -559,12 +559,12 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 			EMIT(PPC_RAW_SLW(dst_reg, dst_reg, src_reg));
 			EMIT(PPC_RAW_OR(dst_reg_h, dst_reg_h, __REG_R0));
 			break;
-		case BPF_ALU | BPF_LSH | BPF_K: /* (u32) dst <<== (u32) imm */
+		case BPF_ALU | BPF_LSH | BPF_K: /* (u32) dst <<= (u32) imm */
 			if (!imm)
 				break;
 			EMIT(PPC_RAW_SLWI(dst_reg, dst_reg, imm));
 			break;
-		case BPF_ALU64 | BPF_LSH | BPF_K: /* dst <<== imm */
+		case BPF_ALU64 | BPF_LSH | BPF_K: /* dst <<= imm */
 			if (imm < 0)
 				return -EINVAL;
 			if (!imm)
-- 
2.25.0
[PATCH v1 4/4] powerpc: Move copy_from_kernel_nofault_inst()
When probe_kernel_read_inst() was created, there was no good place to put it, so a file called lib/inst.c was dedicated for it.

Since then, probe_kernel_read_inst() has been renamed copy_from_kernel_nofault_inst(). And mm/maccess.c didn't exist at that time. Today, mm/maccess.c is related to copy_from_kernel_nofault().

Move copy_from_kernel_nofault_inst() into mm/maccess.c.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/lib/inst.c   | 26 --------------------------
 arch/powerpc/mm/maccess.c | 21 +++++++++++++++++++++
 2 files changed, 21 insertions(+), 26 deletions(-)
 delete mode 100644 arch/powerpc/lib/inst.c

diff --git a/arch/powerpc/lib/inst.c b/arch/powerpc/lib/inst.c
deleted file mode 100644
index ec7f6bae8b3c..000000000000
--- a/arch/powerpc/lib/inst.c
+++ /dev/null
@@ -1,26 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * Copyright 2020, IBM Corporation.
- */
-
-#include
-#include
-#include
-#include
-
-int copy_from_kernel_nofault_inst(struct ppc_inst *inst, struct ppc_inst *src)
-{
-	unsigned int val, suffix;
-	int err;
-
-	err = copy_from_kernel_nofault(&val, src, sizeof(val));
-	if (err)
-		return err;
-	if (IS_ENABLED(CONFIG_PPC64) && get_op(val) == OP_PREFIX) {
-		err = copy_from_kernel_nofault(&suffix, (void *)src + 4, 4);
-		*inst = ppc_inst_prefix(val, suffix);
-	} else {
-		*inst = ppc_inst(val);
-	}
-	return err;
-}
diff --git a/arch/powerpc/mm/maccess.c b/arch/powerpc/mm/maccess.c
index fa9a7a718fc6..e75e74c52a8a 100644
--- a/arch/powerpc/mm/maccess.c
+++ b/arch/powerpc/mm/maccess.c
@@ -3,7 +3,28 @@
 #include
 #include
 
+#include
+#include
+#include
+
 bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
 {
 	return is_kernel_addr((unsigned long)unsafe_src);
 }
+
+int copy_from_kernel_nofault_inst(struct ppc_inst *inst, struct ppc_inst *src)
+{
+	unsigned int val, suffix;
+	int err;
+
+	err = copy_from_kernel_nofault(&val, src, sizeof(val));
+	if (err)
+		return err;
+	if (IS_ENABLED(CONFIG_PPC64) && get_op(val) == OP_PREFIX) {
+		err = copy_from_kernel_nofault(&suffix, (void *)src + 4, 4);
+		*inst = ppc_inst_prefix(val, suffix);
+	} else {
+		*inst = ppc_inst(val);
+	}
+	return err;
+}
-- 
2.25.0
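For reference, the logic of the moved function can be modelled in user space as follows. read_inst() is an illustrative stand-in, not the kernel function: memcpy() replaces copy_from_kernel_nofault() (so there is no fault handling here), and the prefixed instruction is packed into a plain u64 instead of struct ppc_inst:

```c
#include <stdint.h>
#include <string.h>

#define OP_PREFIX 1	/* primary opcode 1 marks a 64-bit prefixed instruction */

/* Primary opcode is the top 6 bits of the instruction word */
static unsigned int get_op(uint32_t val) { return val >> 26; }

/* Fetch one instruction word; when it is a prefix, fetch the suffix word
 * too and combine them.  Real code uses copy_from_kernel_nofault(), which
 * can return -EFAULT; this sketch always succeeds. */
static int read_inst(uint64_t *inst, const uint32_t *src)
{
	uint32_t val, suffix;

	memcpy(&val, src, sizeof(val));
	if (get_op(val) == OP_PREFIX) {
		memcpy(&suffix, src + 1, sizeof(suffix));
		*inst = ((uint64_t)val << 32) | suffix;	/* cf. ppc_inst_prefix() */
	} else {
		*inst = val;
	}
	return 0;
}
```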
[PATCH v1 3/4] powerpc: Rename probe_kernel_read_inst()
When probe_kernel_read_inst() was created, it was meant to mimic the probe_kernel_read() function. Since then, probe_kernel_read() has been renamed copy_from_kernel_nofault().

Rename probe_kernel_read_inst() into copy_from_kernel_nofault_inst().

Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/inst.h    |  3 +--
 arch/powerpc/kernel/align.c        |  2 +-
 arch/powerpc/kernel/trace/ftrace.c | 22 +++++++++++-----------
 arch/powerpc/lib/inst.c            |  3 +--
 4 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h
index a40c3913a4a3..a8ab0715f50e 100644
--- a/arch/powerpc/include/asm/inst.h
+++ b/arch/powerpc/include/asm/inst.h
@@ -177,7 +177,6 @@ static inline char *__ppc_inst_as_str(char str[PPC_INST_STR_LEN], struct ppc_ins
 	__str;				\
 })
 
-int probe_kernel_read_inst(struct ppc_inst *inst,
-			   struct ppc_inst *src);
+int copy_from_kernel_nofault_inst(struct ppc_inst *inst, struct ppc_inst *src);
 
 #endif /* _ASM_POWERPC_INST_H */
diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c
index a97d5f1a3905..df3b55fec27d 100644
--- a/arch/powerpc/kernel/align.c
+++ b/arch/powerpc/kernel/align.c
@@ -311,7 +311,7 @@ int fix_alignment(struct pt_regs *regs)
 	CHECK_FULL_REGS(regs);
 
 	if (is_kernel_addr(regs->nip))
-		r = probe_kernel_read_inst(&instr, (void *)regs->nip);
+		r = copy_from_kernel_nofault_inst(&instr, (void *)regs->nip);
 	else
 		r = __get_user_instr(instr, (void __user *)regs->nip);
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 42761ebec9f7..9daa4eb812ce 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -68,7 +68,7 @@ ftrace_modify_code(unsigned long ip, struct ppc_inst old, struct ppc_inst new)
 	 */
 
 	/* read the text we want to modify */
-	if (probe_kernel_read_inst(&replaced, (void *)ip))
+	if (copy_from_kernel_nofault_inst(&replaced, (void *)ip))
 		return -EFAULT;
 
 	/* Make sure it is what we expect it to be */
@@ -130,7 +130,7 @@ __ftrace_make_nop(struct module *mod,
 	struct ppc_inst op, pop;
 
 	/* read where this goes */
-	if (probe_kernel_read_inst(&op, (void *)ip)) {
+	if (copy_from_kernel_nofault_inst(&op, (void *)ip)) {
 		pr_err("Fetching opcode failed.\n");
 		return -EFAULT;
 	}
@@ -164,7 +164,7 @@ __ftrace_make_nop(struct module *mod,
 	/* When using -mkernel_profile there is no load to jump over */
 	pop = ppc_inst(PPC_INST_NOP);
 
-	if (probe_kernel_read_inst(&op, (void *)(ip - 4))) {
+	if (copy_from_kernel_nofault_inst(&op, (void *)(ip - 4))) {
 		pr_err("Fetching instruction at %lx failed.\n", ip - 4);
 		return -EFAULT;
 	}
@@ -197,7 +197,7 @@ __ftrace_make_nop(struct module *mod,
 	 * Check what is in the next instruction. We can see ld r2,40(r1), but
 	 * on first pass after boot we will see mflr r0.
 	 */
-	if (probe_kernel_read_inst(&op, (void *)(ip + 4))) {
+	if (copy_from_kernel_nofault_inst(&op, (void *)(ip + 4))) {
 		pr_err("Fetching op failed.\n");
 		return -EFAULT;
 	}
@@ -349,7 +349,7 @@ static int setup_mcount_compiler_tramp(unsigned long tramp)
 		return -1;
 
 	/* New trampoline -- read where this goes */
-	if (probe_kernel_read_inst(&op, (void *)tramp)) {
+	if (copy_from_kernel_nofault_inst(&op, (void *)tramp)) {
 		pr_debug("Fetching opcode failed.\n");
 		return -1;
 	}
@@ -399,7 +399,7 @@ static int __ftrace_make_nop_kernel(struct dyn_ftrace *rec, unsigned long addr)
 	struct ppc_inst op;
 
 	/* Read where this goes */
-	if (probe_kernel_read_inst(&op, (void *)ip)) {
+	if (copy_from_kernel_nofault_inst(&op, (void *)ip)) {
 		pr_err("Fetching opcode failed.\n");
 		return -EFAULT;
 	}
@@ -526,10 +526,10 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 	struct module *mod = rec->arch.mod;
 
 	/* read where this goes */
-	if (probe_kernel_read_inst(op, ip))
+	if (copy_from_kernel_nofault_inst(op, ip))
 		return -EFAULT;
 
-	if (probe_kernel_read_inst(op + 1, ip + 4))
+	if (copy_from_kernel_nofault_inst(op + 1, ip + 4))
 		return -EFAULT;
 
 	if (!expected_nop_sequence(ip, op[0], op[1])) {
@@ -592,7 +592,7 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 	unsigned long ip = rec->ip;
 
 	/* read where this goes */
-	if (probe_kernel_read_inst(&op, (void *)ip))
+	if (copy_from_kernel_nofault_inst(&op, (void *)ip))
 		return -EFAULT;
 
 	/* It should be pointing
[PATCH v1 2/4] powerpc: Make probe_kernel_read_inst() common to PPC32 and PPC64
We have two independent versions of probe_kernel_read_inst(), one for PPC32 and one for PPC64.

The PPC32 version is identical to the first part of the PPC64 version. The remaining part of the PPC64 version is not relevant for PPC32, but not contradictory, so we can easily have a common function with the PPC64-only part opted out via an IS_ENABLED(CONFIG_PPC64) check.

The only need is to add a version of ppc_inst_prefix() for PPC32.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/inst.h |  2 ++
 arch/powerpc/lib/inst.c         | 17 +----------------
 2 files changed, 3 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h
index 2902d4e6a363..a40c3913a4a3 100644
--- a/arch/powerpc/include/asm/inst.h
+++ b/arch/powerpc/include/asm/inst.h
@@ -102,6 +102,8 @@ static inline bool ppc_inst_equal(struct ppc_inst x, struct ppc_inst y)
 
 #define ppc_inst(x) ((struct ppc_inst){ .val = x })
 
+#define ppc_inst_prefix(x, y) ppc_inst(x)
+
 static inline bool ppc_inst_prefixed(struct ppc_inst x)
 {
 	return false;
diff --git a/arch/powerpc/lib/inst.c b/arch/powerpc/lib/inst.c
index c57b3548de37..0dff3ac2d45f 100644
--- a/arch/powerpc/lib/inst.c
+++ b/arch/powerpc/lib/inst.c
@@ -8,7 +8,6 @@
 #include
 #include
 
-#ifdef CONFIG_PPC64
 int probe_kernel_read_inst(struct ppc_inst *inst,
 			   struct ppc_inst *src)
 {
@@ -18,7 +17,7 @@ int probe_kernel_read_inst(struct ppc_inst *inst,
 	err = copy_from_kernel_nofault(&val, src, sizeof(val));
 	if (err)
 		return err;
-	if (get_op(val) == OP_PREFIX) {
+	if (IS_ENABLED(CONFIG_PPC64) && get_op(val) == OP_PREFIX) {
 		err = copy_from_kernel_nofault(&suffix, (void *)src + 4, 4);
 		*inst = ppc_inst_prefix(val, suffix);
 	} else {
@@ -26,17 +25,3 @@ int probe_kernel_read_inst(struct ppc_inst *inst,
 		*inst = ppc_inst(val);
 	}
 	return err;
 }
-#else /* !CONFIG_PPC64 */
-int probe_kernel_read_inst(struct ppc_inst *inst,
-			   struct ppc_inst *src)
-{
-	unsigned int val;
-	int err;
-
-	err = copy_from_kernel_nofault(&val, src, sizeof(val));
-	if (!err)
-		*inst = ppc_inst(val);
-
-	return err;
-}
-#endif /* CONFIG_PPC64 */
-- 
2.25.0
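A small user-space illustration of why IS_ENABLED() is preferable to #ifdef here: both branches stay visible to the compiler on every build (so they are always parsed and type-checked), while the disabled one is eliminated as dead code at -O1 and above. IS_ENABLED() and CONFIG_PPC64_ENABLED below are simplified stand-ins for the real <linux/kconfig.h> machinery, which also handles undefined and module symbols:

```c
/* Pretend we build for PPC32: the config switch is off. */
#define CONFIG_PPC64_ENABLED 0
#define IS_ENABLED(x) (x)	/* the kernel's version is more elaborate */

#define OP_PREFIX 1

/* Mirrors: if (IS_ENABLED(CONFIG_PPC64) && get_op(val) == OP_PREFIX)
 * Returns the number of 32-bit words the instruction occupies.
 * The prefixed-instruction branch is compiled but unreachable when the
 * switch is 0 -- unlike an #ifdef'ed-out block, a typo in it would
 * still break every build, not just PPC64 builds. */
static int inst_words(unsigned int val)
{
	if (IS_ENABLED(CONFIG_PPC64_ENABLED) && (val >> 26) == OP_PREFIX)
		return 2;	/* prefix + suffix */
	return 1;		/* plain word instruction */
}
```

With CONFIG_PPC64_ENABLED defined to 1 instead, inst_words() would return 2 for a word whose primary opcode is 1.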