Re: [PATCH v6 0/3] Introduce 64b relocatable kernel
On Thu, 17 Jun 2021 06:33:48 PDT (-0700), a...@ghiti.fr wrote:

> On 18/05/2021 at 12:12, Alexandre Ghiti wrote:
>
> After multiple attempts, this patchset is now based on the fact that the
> 64b kernel mapping was moved outside the linear mapping. The first patch
> allows to build relocatable kernels but is not selected by default. That
> patch should ease KASLR implementation a lot. The second and third
> patches take advantage of an already existing powerpc script that checks
> relocations at compile-time, and use it for riscv.
>
> @Palmer, any thought about that? There are no users for now, do you want
> to wait for a KASLR implementation to use it before merging this? If so,
> I can work on a KASLR implementation based on an older implementation
> from Zong.

Sorry, I must have missed this patch set the first time through. I don't
see any reason to wait for KASLR before taking support for relocatable
kernels, as relocatable kernels are useful on their own. I'm not sure
I'll have time to look at this for this cycle, but I'll try to find some
time given that it was posted a while ago.

Thanks,

> This patchset was tested on:
>
> * kernel:
>   - rv32: OK
>   - rv64 with RELOCATABLE: OK and checked that "suspicious" relocations are caught.
>   - rv64 without RELOCATABLE: OK
>   - powerpc: build only and checked that "suspicious" relocations are caught.
> * xipkernel:
>   - rv32: build only
>   - rv64: OK
> * nommukernel:
>   - rv64: build only
>
> Changes in v6:
> * Remove the kernel move to vmalloc zone
> * Rebased on top of for-next
> * Remove relocatable property from 32b kernel as the kernel is mapped in
>   the linear mapping and would then need to be copied physically too
> * CONFIG_RELOCATABLE depends on !XIP_KERNEL
> * Remove Reviewed-by from first patch as it changed a bit
>
> Changes in v5:
> * Add "static __init" to create_kernel_page_table function as reported
>   by Kbuild test robot
> * Add reviewed-by from Zong
> * Rebase onto v5.7
>
> Changes in v4:
> * Fix BPF region that overlapped with kernel's as suggested by Zong
> * Fix end of module region that could be larger than 2GB as suggested by Zong
> * Fix the size of the vm area reserved for the kernel as we could lose
>   PMD_SIZE if the size was already aligned on PMD_SIZE
> * Split compile time relocations check patch into 2 patches as suggested by Anup
> * Applied Reviewed-by from Zong and Anup
>
> Changes in v3:
> * Move kernel mapping to vmalloc
>
> Changes in v2:
> * Make RELOCATABLE depend on MMU as suggested by Anup
> * Rename kernel_load_addr into kernel_virt_addr as suggested by Anup
> * Use __pa_symbol instead of __pa, as suggested by Zong
> * Rebased on top of v5.6-rc3
> * Tested with sv48 patchset
> * Add Reviewed/Tested-by from Zong and Anup
>
> Alexandre Ghiti (3):
>   riscv: Introduce CONFIG_RELOCATABLE
>   powerpc: Move script to check relocations at compile time in scripts/
>   riscv: Check relocations at compile time
>
>  arch/powerpc/tools/relocs_check.sh | 18 ++
>  arch/riscv/Kconfig                 | 12 +++
>  arch/riscv/Makefile                |  5 ++-
>  arch/riscv/Makefile.postlink       | 36
>  arch/riscv/kernel/vmlinux.lds.S    |  6
>  arch/riscv/mm/Makefile             |  4 +++
>  arch/riscv/mm/init.c               | 53 +-
>  arch/riscv/tools/relocs_check.sh   | 26 +++
>  scripts/relocs_check.sh            | 20 +++
>  9 files changed, 162 insertions(+), 18 deletions(-)
>  create mode 100644 arch/riscv/Makefile.postlink
>  create mode 100755 arch/riscv/tools/relocs_check.sh
>  create mode 100755 scripts/relocs_check.sh
Re: [RESEND PATCH v4 05/11] powerpc/64s: Add ability to skip SLB preload
On Wed Jun 30, 2021 at 11:15 PM CDT, Nicholas Piggin wrote: > Excerpts from Christopher M. Riedl's message of July 1, 2021 1:48 pm: > > On Sun Jun 20, 2021 at 10:13 PM CDT, Daniel Axtens wrote: > >> "Christopher M. Riedl" writes: > >> > >> > Switching to a different mm with Hash translation causes SLB entries to > >> > be preloaded from the current thread_info. This reduces SLB faults, for > >> > example when threads share a common mm but operate on different address > >> > ranges. > >> > > >> > Preloading entries from the thread_info struct may not always be > >> > appropriate - such as when switching to a temporary mm. Introduce a new > >> > boolean in mm_context_t to skip the SLB preload entirely. Also move the > >> > SLB preload code into a separate function since switch_slb() is already > >> > quite long. The default behavior (preloading SLB entries from the > >> > current thread_info struct) remains unchanged. > >> > > >> > Signed-off-by: Christopher M. Riedl > >> > > >> > --- > >> > > >> > v4: * New to series. 
> >> > --- > >> > arch/powerpc/include/asm/book3s/64/mmu.h | 3 ++ > >> > arch/powerpc/include/asm/mmu_context.h | 13 ++ > >> > arch/powerpc/mm/book3s64/mmu_context.c | 2 + > >> > arch/powerpc/mm/book3s64/slb.c | 56 ++-- > >> > 4 files changed, 50 insertions(+), 24 deletions(-) > >> > > >> > diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h > >> > b/arch/powerpc/include/asm/book3s/64/mmu.h > >> > index eace8c3f7b0a1..b23a9dcdee5af 100644 > >> > --- a/arch/powerpc/include/asm/book3s/64/mmu.h > >> > +++ b/arch/powerpc/include/asm/book3s/64/mmu.h > >> > @@ -130,6 +130,9 @@ typedef struct { > >> > u32 pkey_allocation_map; > >> > s16 execute_only_pkey; /* key holding execute-only protection */ > >> > #endif > >> > + > >> > +/* Do not preload SLB entries from thread_info during > >> > switch_slb() */ > >> > +bool skip_slb_preload; > >> > } mm_context_t; > >> > > >> > static inline u16 mm_ctx_user_psize(mm_context_t *ctx) > >> > diff --git a/arch/powerpc/include/asm/mmu_context.h > >> > b/arch/powerpc/include/asm/mmu_context.h > >> > index 4bc45d3ed8b0e..264787e90b1a1 100644 > >> > --- a/arch/powerpc/include/asm/mmu_context.h > >> > +++ b/arch/powerpc/include/asm/mmu_context.h > >> > @@ -298,6 +298,19 @@ static inline int arch_dup_mmap(struct mm_struct > >> > *oldmm, > >> > return 0; > >> > } > >> > > >> > +#ifdef CONFIG_PPC_BOOK3S_64 > >> > + > >> > +static inline void skip_slb_preload_mm(struct mm_struct *mm) > >> > +{ > >> > +mm->context.skip_slb_preload = true; > >> > +} > >> > + > >> > +#else > >> > + > >> > +static inline void skip_slb_preload_mm(struct mm_struct *mm) {} > >> > + > >> > +#endif /* CONFIG_PPC_BOOK3S_64 */ > >> > + > >> > #include > >> > > >> > #endif /* __KERNEL__ */ > >> > diff --git a/arch/powerpc/mm/book3s64/mmu_context.c > >> > b/arch/powerpc/mm/book3s64/mmu_context.c > >> > index c10fc8a72fb37..3479910264c59 100644 > >> > --- a/arch/powerpc/mm/book3s64/mmu_context.c > >> > +++ b/arch/powerpc/mm/book3s64/mmu_context.c > >> > @@ -202,6 +202,8 
@@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
> >> > 	atomic_set(&mm->context.active_cpus, 0);
> >> > 	atomic_set(&mm->context.copros, 0);
> >> >
> >> > +	mm->context.skip_slb_preload = false;
> >> > +
> >> > 	return 0;
> >> > }
> >> >
> >> > diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
> >> > index c91bd85eb90e3..da0836cb855af 100644
> >> > --- a/arch/powerpc/mm/book3s64/slb.c
> >> > +++ b/arch/powerpc/mm/book3s64/slb.c
> >> > @@ -441,10 +441,39 @@ static void slb_cache_slbie_user(unsigned int index)
> >> > 	asm volatile("slbie %0" : : "r" (slbie_data));
> >> > }
> >> >
> >> > +static void preload_slb_entries(struct task_struct *tsk, struct mm_struct *mm)
> >>
> >> Should this be explicitly inline or even __always_inline? I'm thinking
> >> switch_slb is probably a fairly hot path on hash?
> >
> > Yes absolutely. I'll make this change in v5.
> >
> >>
> >> > +{
> >> > +	struct thread_info *ti = task_thread_info(tsk);
> >> > +	unsigned char i;
> >> > +
> >> > +	/*
> >> > +	 * We gradually age out SLBs after a number of context switches to
> >> > +	 * reduce reload overhead of unused entries (like we do with FP/VEC
> >> > +	 * reload). Each time we wrap 256 switches, take an entry out of the
> >> > +	 * SLB preload cache.
> >> > +	 */
> >> > +	tsk->thread.load_slb++;
> >> > +	if (!tsk->thread.load_slb) {
> >> > +		unsigned long pc = KSTK_EIP(tsk);
> >> > +
> >> > +		preload_age(ti);
> >> > +		preload_add(ti, pc);
> >> > +	}
> >> > +
> >> > +	for (i = 0; i < ti->slb_preload_nr; i++) {
> >> > +		unsigned char idx;
> >> > +
Re: [RESEND PATCH v4 08/11] powerpc: Initialize and use a temporary mm for patching
On Sun Jun 20, 2021 at 10:19 PM CDT, Daniel Axtens wrote:
> Hi Chris,
>
> > +	/*
> > +	 * Choose a randomized, page-aligned address from the range:
> > +	 * [PAGE_SIZE, DEFAULT_MAP_WINDOW - PAGE_SIZE]
> > +	 * The lower address bound is PAGE_SIZE to avoid the zero-page.
> > +	 * The upper address bound is DEFAULT_MAP_WINDOW - PAGE_SIZE to stay
> > +	 * under DEFAULT_MAP_WINDOW with the Book3s64 Hash MMU.
> > +	 */
> > +	patching_addr = PAGE_SIZE + ((get_random_long() & PAGE_MASK)
> > +			% (DEFAULT_MAP_WINDOW - 2 * PAGE_SIZE));
>
> I checked and poking_init() comes after the functions that init the RNG,
> so this should be fine. The maths - while a bit fiddly to reason about -
> does check out.

Thanks for double checking.

> > +
> > +	/*
> > +	 * PTE allocation uses GFP_KERNEL which means we need to pre-allocate
> > +	 * the PTE here. We cannot do the allocation during patching with IRQs
> > +	 * disabled (ie. "atomic" context).
> > +	 */
> > +	ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
> > +	BUG_ON(!ptep);
> > +	pte_unmap_unlock(ptep, ptl);
> > +}
> >
> >  #if IS_BUILTIN(CONFIG_LKDTM)
> >  unsigned long read_cpu_patching_addr(unsigned int cpu)
> >  {
> > -	return (unsigned long)(per_cpu(text_poke_area, cpu))->addr;
> > +	return patching_addr;
> >  }
> >  #endif
> >
> > -static int text_area_cpu_up(unsigned int cpu)
> > +struct patch_mapping {
> > +	spinlock_t *ptl; /* for protecting pte table */
> > +	pte_t *ptep;
> > +	struct temp_mm temp_mm;
> > +};
> > +
> > +#ifdef CONFIG_PPC_BOOK3S_64
> > +
> > +static inline int hash_prefault_mapping(pgprot_t pgprot)
> >  {
> > -	struct vm_struct *area;
> > +	int err;
> >
> > -	area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> > -	if (!area) {
> > -		WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> > -			cpu);
> > -		return -1;
> > -	}
> > -	this_cpu_write(text_poke_area, area);
> > +	if (radix_enabled())
> > +		return 0;
> >
> > -	return 0;
> > -}
> > +	err = slb_allocate_user(patching_mm, patching_addr);
> > +	if (err)
> > +		pr_warn("map patch: failed to allocate slb entry\n");
> >
>
> Here if slb_allocate_user() fails, you'll print a warning and then fall
> through to the rest of the function. You do return err, but there's a
> later call to hash_page_mm() that also sets err. Can slb_allocate_user()
> fail while hash_page_mm() succeeds, and would that be a problem?

Hmm, yes I think this is a problem. If slb_allocate_user() fails then we
could potentially mask that error until the actual patching
fails/miscompares later (and that *will* certainly fail in this case). I
will return the error and exit the function early in v5 of the series.
Thanks!

>
> > -static int text_area_cpu_down(unsigned int cpu)
> > -{
> > -	free_vm_area(this_cpu_read(text_poke_area));
> > -	return 0;
> > +	err = hash_page_mm(patching_mm, patching_addr, pgprot_val(pgprot), 0,
> > +			   HPTE_USE_KERNEL_KEY);
> > +	if (err)
> > +		pr_warn("map patch: failed to insert hashed page\n");
> > +
> > +	/* See comment in switch_slb() in mm/book3s64/slb.c */
> > +	isync();
> > +
>
> The comment reads:
>
> /*
>  * Synchronize slbmte preloads with possible subsequent user memory
>  * address accesses by the kernel (user mode won't happen until
>  * rfid, which is safe).
>  */
> isync();
>
> I have to say having read the description of isync I'm not 100% sure why
> that's enough (don't we also need stores to complete?) but I'm happy to
> take commit 5434ae74629a ("powerpc/64s/hash: Add a SLB preload cache")
> on trust here!
>
> I think it does make sense for you to have that barrier here: you are
> potentially about to start poking at the memory mapped through that SLB
> entry so you should make sure you're fully synchronised.
>
> > +	return err;
> > }
> >
> > +	init_temp_mm(&patch_mapping->temp_mm, patching_mm);
> > +	use_temporary_mm(&patch_mapping->temp_mm);
> >
> > -	pmdp = pmd_offset(pudp, addr);
> > -	if (unlikely(!pmdp))
> > -		return -EINVAL;
> > +	/*
> > +	 * On Book3s64 with the Hash MMU we have to manually insert the SLB
> > +	 * entry and HPTE to prevent taking faults on the patching_addr later.
> > +	 */
> > +	return(hash_prefault_mapping(pgprot));
>
> hmm, `return hash_prefault_mapping(pgprot);` or
> `return (hash_prefault_mapping((pgprot));` maybe?

Yeah, I noticed I left the extra parentheses here after the RESEND. I
think this is left-over when I had another wrapper here... anyway, I'll
clean it up for v5.

>
> Kind regards,
> Daniel
[PATCH v2 1/2] sched/topology: Skip updating masks for non-online nodes
Currently the scheduler doesn't check if a node is online before adding
CPUs to the node mask. However, on some architectures the node distance
is only available for nodes that are online. It is unclear how much the
node distance can be relied upon when one of the nodes is offline. If
said node distance is fake (since one of the nodes is offline) and the
actual node distance is different, then the cpumask of such nodes will
be wrong once the nodes become online. This can cause topology_span_sane
to throw up a warning message and the rest of the topology to not be
updated properly.

Resolve this by skipping the cpumask update for nodes that are not
online. However, by skipping, relevant CPUs may not be set when nodes
are onlined; i.e. when coming up with NUMA masks at a certain NUMA
distance, CPUs that are part of other, already-online nodes will not be
part of the NUMA mask. Hence, the first time a CPU is added to the newly
onlined node, add the other CPUs to the numa_mask.

Cc: LKML
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Nathan Lynch
Cc: Michael Ellerman
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Valentin Schneider
Cc: Gautham R Shenoy
Cc: Dietmar Eggemann
Cc: Mel Gorman
Cc: Vincent Guittot
Cc: Rik van Riel
Cc: Geetika Moolchandani
Cc: Laurent Dufour
Reported-by: Geetika Moolchandani
Signed-off-by: Srikar Dronamraju
---
Changelog v1->v2:
v1 link: http://lore.kernel.org/lkml/20210520154427.1041031-4-sri...@linux.vnet.ibm.com/t/#u
Update the NUMA masks, whenever 1st CPU is added to cpuless node

 kernel/sched/topology.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index b77ad49dc14f..f25dbcab4fd2 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1833,6 +1833,9 @@ void sched_init_numa(void)
 			sched_domains_numa_masks[i][j] = mask;
 
 			for_each_node(k) {
+				if (!node_online(j))
+					continue;
+
 				if (sched_debug() && (node_distance(j, k) != node_distance(k, j)))
 					sched_numa_warn("Node-distance 
not symmetric");
 
@@ -1891,12 +1894,30 @@ void sched_init_numa(void)
 void sched_domains_numa_masks_set(unsigned int cpu)
 {
 	int node = cpu_to_node(cpu);
-	int i, j;
+	int i, j, empty;
 
+	empty = cpumask_empty(sched_domains_numa_masks[0][node]);
 	for (i = 0; i < sched_domains_numa_levels; i++) {
 		for (j = 0; j < nr_node_ids; j++) {
-			if (node_distance(j, node) <= sched_domains_numa_distance[i])
+			if (!node_online(j))
+				continue;
+
+			if (node_distance(j, node) <= sched_domains_numa_distance[i]) {
 				cpumask_set_cpu(cpu, sched_domains_numa_masks[i][j]);
+
+				/*
+				 * We skip updating numa_masks for offline
+				 * nodes. However now that the node is
+				 * finally online, CPUs that were added
+				 * earlier, should now be accommodated into
+				 * newly onlined node's numa mask.
+				 */
+				if (node != j && empty) {
+					cpumask_or(sched_domains_numa_masks[i][node],
+						   sched_domains_numa_masks[i][node],
+						   sched_domains_numa_masks[0][j]);
+				}
+			}
 		}
 	}
 }
-- 
2.27.0
[PATCH v2 2/2] powerpc/numa: Fill distance_lookup_table for offline nodes
Currently the scheduler populates the distance map by looking at the
distance of each node from all other nodes. This should work for most
architectures and platforms. The scheduler expects the number of unique
node distances to be available at boot, and uses the node distances to
compute it. On Power Servers, node distances for offline nodes are not
available. However, Power Servers already know the possible unique node
distances. Fake the offline nodes' distance_lookup_table entries so that
all possible node distances are updated.

For example, distance info from numactl on a fully populated 8-node
system at boot may look like this:

node distances:
node   0   1   2   3   4   5   6   7
  0:  10  20  40  40  40  40  40  40
  1:  20  10  40  40  40  40  40  40
  2:  40  40  10  20  40  40  40  40
  3:  40  40  20  10  40  40  40  40
  4:  40  40  40  40  10  20  40  40
  5:  40  40  40  40  20  10  40  40
  6:  40  40  40  40  40  40  10  20
  7:  40  40  40  40  40  40  20  10

However, when the same system has only two nodes online at boot,
distance info from numactl will look like:

node distances:
node   0   1
  0:  10  20
  1:  20  10

What node_distance(0,3) returns, where node 0 is online and node 3 is
offline, may be implementation dependent. In Power Servers' case, it
returns LOCAL_DISTANCE (10). So at boot the scheduler would assume that
the maximum distance between nodes is 20; however that would not be
true. When nodes are onlined and CPUs from those nodes are hotplugged,
the maximum node distance would be 40.

This faking only needs to be done if the number of unique node distances
that can be computed for online nodes is less than the number of
possible unique node distances as represented by
distance_ref_points_depth. When the node is actually onlined,
distance_lookup_table will be updated with actual entries.
Cc: LKML
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Nathan Lynch
Cc: Michael Ellerman
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Valentin Schneider
Cc: Gautham R Shenoy
Cc: Dietmar Eggemann
Cc: Mel Gorman
Cc: Vincent Guittot
Cc: Rik van Riel
Cc: Geetika Moolchandani
Cc: Laurent Dufour
Reported-by: Geetika Moolchandani
Signed-off-by: Srikar Dronamraju
---
Changelog v1->v2:
Move to a powerpc-specific solution as suggested by Peter and Valentin

 arch/powerpc/mm/numa.c | 70 ++
 1 file changed, 70 insertions(+)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index f2bf98bdcea2..6d0d89127190 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -860,6 +860,75 @@ void __init dump_numa_cpu_topology(void)
 	}
 }
 
+/*
+ * The scheduler expects the number of unique node distances to be
+ * available at boot, and uses the node distances to compute it. On
+ * POWER, node distances for offline nodes are not available. However,
+ * POWER already knows the possible unique node distances. Fake the
+ * offline nodes' distance_lookup_table entries so that all possible
+ * node distances are updated.
+ */
+void __init fake_update_distance_lookup_table(void)
+{
+	unsigned long distance_map;
+	int i, nr_levels, nr_depth, node;
+
+	if (!numa_enabled)
+		return;
+
+	if (!form1_affinity)
+		return;
+
+	/*
+	 * distance_ref_points_depth lists the unique numa domains
+	 * available. However, it ignores LOCAL_DISTANCE. So add +1
+	 * to get the actual number of unique distances.
+	 */
+	nr_depth = distance_ref_points_depth + 1;
+
+	WARN_ON(nr_depth > sizeof(distance_map));
+
+	bitmap_zero(&distance_map, nr_depth);
+	bitmap_set(&distance_map, 0, 1);
+
+	for_each_online_node(node) {
+		int nd, distance = LOCAL_DISTANCE;
+
+		if (node == first_online_node)
+			continue;
+
+		nd = __node_distance(node, first_online_node);
+		for (i = 0; i < nr_depth; i++, distance *= 2) {
+			if (distance == nd) {
+				bitmap_set(&distance_map, i, 1);
+				break;
+			}
+		}
+		nr_levels = bitmap_weight(&distance_map, nr_depth);
+		if (nr_levels == nr_depth)
+			return;
+	}
+
+	for_each_node(node) {
+		if (node_online(node))
+			continue;
+
+		i = find_first_zero_bit(&distance_map, nr_depth);
+		if (i >= nr_depth || i == 0) {
+			pr_warn("Levels(%d) not matching levels(%d)", nr_levels, nr_depth);
+			return;
+		}
+
+		bitmap_set(&distance_map, i, 1);
+		while (i--)
+			distance_lookup_table[node][i] = node;
+
+		nr_levels = bitmap_weight(&distance_map, nr_depth);
+		if (nr_levels == nr_depth)
+			return;
+	}
+}
+
 /* Initialize NODE_DATA for a node on the local memory
[PATCH v2 0/2] Skip numa distance for offline nodes
Changelog v1->v2:
v1: http://lore.kernel.org/lkml/20210520154427.1041031-1-sri...@linux.vnet.ibm.com/t/#u
- Update the numa masks, whenever 1st CPU is added to cpuless node
- Populate all possible nodes distances in boot in a powerpc specific function

Geetika reported yet another trace while doing a dlpar CPU add
operation. This was true even on top of a recent commit 6980d13f0dd1
("powerpc/smp: Set numa node before updating mask") which fixed a
similar trace.

WARNING: CPU: 40 PID: 2954 at kernel/sched/topology.c:2088 build_sched_domains+0x6e8/0x1540
Modules linked in: nft_counter nft_compat rpadlpar_io rpaphp mptcp_diag xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag bonding tls nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set rfkill nf_tables nfnetlink dm_multipath pseries_rng xts vmx_crypto binfmt_misc ip_tables xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod fuse
CPU: 40 PID: 2954 Comm: kworker/40:0 Not tainted 5.13.0-rc1+ #19
Workqueue: events cpuset_hotplug_workfn
NIP: c01de588 LR: c01de584 CTR: 006cd36c
REGS: c0002772b250 TRAP: 0700 Not tainted (5.12.0-rc5-master+)
MSR: 80029033 CR: 28828422 XER: 000d
CFAR: c020c2f8 IRQMASK: 0
GPR00: c01de584 c0002772b4f0 c1f55400 0036
GPR04: c063c6368010 c063c63f0a00 0027 c063c6368018
GPR08: 0023 c063c636ef48 0063c4de c063bfe9ffe8
GPR12: 28828424 c063fe68fe80 0417
GPR16: 0028 c740dcd8 c205db68 c1a3a4a0
GPR20: c00091ed7d20 c00091ed8520 0001
GPR24: c000113a9600 0190 0028 c10e3ac0
GPR28: c740dd00 c000317b5900 0190
NIP [c01de588] build_sched_domains+0x6e8/0x1540
LR [c01de584] build_sched_domains+0x6e4/0x1540
Call Trace:
[c0002772b4f0] [c01de584] build_sched_domains+0x6e4/0x1540 (unreliable)
[c0002772b640] [c01e08dc] partition_sched_domains_locked+0x3ec/0x530
[c0002772b6e0] 
[c02a2144] rebuild_sched_domains_locked+0x524/0xbf0
[c0002772b7e0] [c02a5620] rebuild_sched_domains+0x40/0x70
[c0002772b810] [c02a58e4] cpuset_hotplug_workfn+0x294/0xe20
[c0002772bc30] [c0187510] process_one_work+0x300/0x670
[c0002772bd10] [c01878f8] worker_thread+0x78/0x520
[c0002772bda0] [c01937f0] kthread+0x1a0/0x1b0
[c0002772be10] [c000d6ec] ret_from_kernel_thread+0x5c/0x70
Instruction dump:
7ee5bb78 7f0ac378 7f29cb78 7f68db78 7f46d378 7f84e378 f8610068 3c62ff19
fbe10060 3863e558 4802dd31 6000 <0fe0> 3920fff4 f9210080 e86100b0

Detailed analysis of the failing scenario showed that the span in
question belongs to the NODE domain, and further that the cpumasks for
some CPUs in NODE overlapped. There are two possible reasons why we
ended up here:

(1) The numa node was offline or blank with no CPUs or memory. Hence the
sched_max_numa_distance could not be set correctly, or the
sched_domains_numa_distance happened to be partially populated.

(2) Relying on a bogus node_distance of an offline node to populate
cpumasks is the issue. On the POWER platform, the node_distance is
correctly available only for an online node which has some CPU or memory
resource associated with it.

For example, distance info from numactl on a fully populated 8-node
system at boot may look like this:

node distances:
node   0   1   2   3   4   5   6   7
  0:  10  20  40  40  40  40  40  40
  1:  20  10  40  40  40  40  40  40
  2:  40  40  10  20  40  40  40  40
  3:  40  40  20  10  40  40  40  40
  4:  40  40  40  40  10  20  40  40
  5:  40  40  40  40  20  10  40  40
  6:  40  40  40  40  40  40  10  20
  7:  40  40  40  40  40  40  20  10

However, when the same system has only two nodes online at boot, the
numa topology will look like:

node distances:
node   0   1
  0:  10  20
  1:  20  10

This series tries to fix both these problems.
Note: These problems are now visible thanks to commit ccf74128d66c
("sched/topology: Assert non-NUMA topology masks don't (partially)
overlap").

Cc: LKML
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Nathan Lynch
Cc: Michael Ellerman
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Valentin Schneider
Cc: Gautham R Shenoy
Cc: Dietmar Eggemann
Cc: Mel Gorman
Cc: Vincent Guittot
Cc: Rik van Riel
Cc: Geetika Moolchandani
Cc: Laurent Dufour

Srikar Dronamraju (2):
  sched/topology: Skip updating masks for non-online nodes
  powerpc/numa: Fill distance_lookup_table for offline nodes

 arch/powerpc/mm/numa.c  | 70 +
 kernel/sched/topology.c | 25
Re: [RESEND PATCH v4 05/11] powerpc/64s: Add ability to skip SLB preload
Excerpts from Christopher M. Riedl's message of July 1, 2021 1:48 pm: > On Sun Jun 20, 2021 at 10:13 PM CDT, Daniel Axtens wrote: >> "Christopher M. Riedl" writes: >> >> > Switching to a different mm with Hash translation causes SLB entries to >> > be preloaded from the current thread_info. This reduces SLB faults, for >> > example when threads share a common mm but operate on different address >> > ranges. >> > >> > Preloading entries from the thread_info struct may not always be >> > appropriate - such as when switching to a temporary mm. Introduce a new >> > boolean in mm_context_t to skip the SLB preload entirely. Also move the >> > SLB preload code into a separate function since switch_slb() is already >> > quite long. The default behavior (preloading SLB entries from the >> > current thread_info struct) remains unchanged. >> > >> > Signed-off-by: Christopher M. Riedl >> > >> > --- >> > >> > v4: * New to series. >> > --- >> > arch/powerpc/include/asm/book3s/64/mmu.h | 3 ++ >> > arch/powerpc/include/asm/mmu_context.h | 13 ++ >> > arch/powerpc/mm/book3s64/mmu_context.c | 2 + >> > arch/powerpc/mm/book3s64/slb.c | 56 ++-- >> > 4 files changed, 50 insertions(+), 24 deletions(-) >> > >> > diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h >> > b/arch/powerpc/include/asm/book3s/64/mmu.h >> > index eace8c3f7b0a1..b23a9dcdee5af 100644 >> > --- a/arch/powerpc/include/asm/book3s/64/mmu.h >> > +++ b/arch/powerpc/include/asm/book3s/64/mmu.h >> > @@ -130,6 +130,9 @@ typedef struct { >> >u32 pkey_allocation_map; >> >s16 execute_only_pkey; /* key holding execute-only protection */ >> > #endif >> > + >> > + /* Do not preload SLB entries from thread_info during switch_slb() */ >> > + bool skip_slb_preload; >> > } mm_context_t; >> > >> > static inline u16 mm_ctx_user_psize(mm_context_t *ctx) >> > diff --git a/arch/powerpc/include/asm/mmu_context.h >> > b/arch/powerpc/include/asm/mmu_context.h >> > index 4bc45d3ed8b0e..264787e90b1a1 100644 >> > --- 
a/arch/powerpc/include/asm/mmu_context.h
>> > +++ b/arch/powerpc/include/asm/mmu_context.h
>> > @@ -298,6 +298,19 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm,
>> > 	return 0;
>> > }
>> >
>> > +#ifdef CONFIG_PPC_BOOK3S_64
>> > +
>> > +static inline void skip_slb_preload_mm(struct mm_struct *mm)
>> > +{
>> > +	mm->context.skip_slb_preload = true;
>> > +}
>> > +
>> > +#else
>> > +
>> > +static inline void skip_slb_preload_mm(struct mm_struct *mm) {}
>> > +
>> > +#endif /* CONFIG_PPC_BOOK3S_64 */
>> > +
>> > #include
>> >
>> > #endif /* __KERNEL__ */
>> > diff --git a/arch/powerpc/mm/book3s64/mmu_context.c
>> > b/arch/powerpc/mm/book3s64/mmu_context.c
>> > index c10fc8a72fb37..3479910264c59 100644
>> > --- a/arch/powerpc/mm/book3s64/mmu_context.c
>> > +++ b/arch/powerpc/mm/book3s64/mmu_context.c
>> > @@ -202,6 +202,8 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
>> > 	atomic_set(&mm->context.active_cpus, 0);
>> > 	atomic_set(&mm->context.copros, 0);
>> >
>> > +	mm->context.skip_slb_preload = false;
>> > +
>> > 	return 0;
>> > }
>> >
>> > diff --git a/arch/powerpc/mm/book3s64/slb.c
>> > b/arch/powerpc/mm/book3s64/slb.c
>> > index c91bd85eb90e3..da0836cb855af 100644
>> > --- a/arch/powerpc/mm/book3s64/slb.c
>> > +++ b/arch/powerpc/mm/book3s64/slb.c
>> > @@ -441,10 +441,39 @@ static void slb_cache_slbie_user(unsigned int index)
>> > 	asm volatile("slbie %0" : : "r" (slbie_data));
>> > }
>> >
>> > +static void preload_slb_entries(struct task_struct *tsk, struct mm_struct *mm)
>>
>> Should this be explicitly inline or even __always_inline? I'm thinking
>> switch_slb is probably a fairly hot path on hash?
>
> Yes absolutely. I'll make this change in v5.
>
>>
>> > +{
>> > +	struct thread_info *ti = task_thread_info(tsk);
>> > +	unsigned char i;
>> > +
>> > +	/*
>> > +	 * We gradually age out SLBs after a number of context switches to
>> > +	 * reduce reload overhead of unused entries (like we do with FP/VEC
>> > +	 * reload). 
Each time we wrap 256 switches, take an entry out of the >> > + * SLB preload cache. >> > + */ >> > + tsk->thread.load_slb++; >> > + if (!tsk->thread.load_slb) { >> > + unsigned long pc = KSTK_EIP(tsk); >> > + >> > + preload_age(ti); >> > + preload_add(ti, pc); >> > + } >> > + >> > + for (i = 0; i < ti->slb_preload_nr; i++) { >> > + unsigned char idx; >> > + unsigned long ea; >> > + >> > + idx = (ti->slb_preload_tail + i) % SLB_PRELOAD_NR; >> > + ea = (unsigned long)ti->slb_preload_esid[idx] << SID_SHIFT; >> > + >> > + slb_allocate_user(mm, ea); >> > + } >> > +} >> > + >> > /* Flush all user entries from the segment table of the current >> > processor. */ >> > void switch_slb(struct task_struct *tsk, struct mm_struct *mm) >> > { >> > - struct thread_info *ti = task_thread_info(tsk); >> >unsigned char i; >> > >> >/* >> >
Re: [RESEND PATCH v4 05/11] powerpc/64s: Add ability to skip SLB preload
On Sun Jun 20, 2021 at 10:13 PM CDT, Daniel Axtens wrote: > "Christopher M. Riedl" writes: > > > Switching to a different mm with Hash translation causes SLB entries to > > be preloaded from the current thread_info. This reduces SLB faults, for > > example when threads share a common mm but operate on different address > > ranges. > > > > Preloading entries from the thread_info struct may not always be > > appropriate - such as when switching to a temporary mm. Introduce a new > > boolean in mm_context_t to skip the SLB preload entirely. Also move the > > SLB preload code into a separate function since switch_slb() is already > > quite long. The default behavior (preloading SLB entries from the > > current thread_info struct) remains unchanged. > > > > Signed-off-by: Christopher M. Riedl > > > > --- > > > > v4: * New to series. > > --- > > arch/powerpc/include/asm/book3s/64/mmu.h | 3 ++ > > arch/powerpc/include/asm/mmu_context.h | 13 ++ > > arch/powerpc/mm/book3s64/mmu_context.c | 2 + > > arch/powerpc/mm/book3s64/slb.c | 56 ++-- > > 4 files changed, 50 insertions(+), 24 deletions(-) > > > > diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h > > b/arch/powerpc/include/asm/book3s/64/mmu.h > > index eace8c3f7b0a1..b23a9dcdee5af 100644 > > --- a/arch/powerpc/include/asm/book3s/64/mmu.h > > +++ b/arch/powerpc/include/asm/book3s/64/mmu.h > > @@ -130,6 +130,9 @@ typedef struct { > > u32 pkey_allocation_map; > > s16 execute_only_pkey; /* key holding execute-only protection */ > > #endif > > + > > + /* Do not preload SLB entries from thread_info during switch_slb() */ > > + bool skip_slb_preload; > > } mm_context_t; > > > > static inline u16 mm_ctx_user_psize(mm_context_t *ctx) > > diff --git a/arch/powerpc/include/asm/mmu_context.h > > b/arch/powerpc/include/asm/mmu_context.h > > index 4bc45d3ed8b0e..264787e90b1a1 100644 > > --- a/arch/powerpc/include/asm/mmu_context.h > > +++ b/arch/powerpc/include/asm/mmu_context.h > > @@ -298,6 +298,19 @@ static inline int 
arch_dup_mmap(struct mm_struct *oldmm,
> > 	return 0;
> > }
> >
> > +#ifdef CONFIG_PPC_BOOK3S_64
> > +
> > +static inline void skip_slb_preload_mm(struct mm_struct *mm)
> > +{
> > +	mm->context.skip_slb_preload = true;
> > +}
> > +
> > +#else
> > +
> > +static inline void skip_slb_preload_mm(struct mm_struct *mm) {}
> > +
> > +#endif /* CONFIG_PPC_BOOK3S_64 */
> > +
> > #include
> >
> > #endif /* __KERNEL__ */
> > diff --git a/arch/powerpc/mm/book3s64/mmu_context.c
> > b/arch/powerpc/mm/book3s64/mmu_context.c
> > index c10fc8a72fb37..3479910264c59 100644
> > --- a/arch/powerpc/mm/book3s64/mmu_context.c
> > +++ b/arch/powerpc/mm/book3s64/mmu_context.c
> > @@ -202,6 +202,8 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
> > 	atomic_set(&mm->context.active_cpus, 0);
> > 	atomic_set(&mm->context.copros, 0);
> >
> > +	mm->context.skip_slb_preload = false;
> > +
> > 	return 0;
> > }
> >
> > diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
> > index c91bd85eb90e3..da0836cb855af 100644
> > --- a/arch/powerpc/mm/book3s64/slb.c
> > +++ b/arch/powerpc/mm/book3s64/slb.c
> > @@ -441,10 +441,39 @@ static void slb_cache_slbie_user(unsigned int index)
> > 	asm volatile("slbie %0" : : "r" (slbie_data));
> > }
> >
> > +static void preload_slb_entries(struct task_struct *tsk, struct mm_struct *mm)
>
> Should this be explicitly inline or even __always_inline? I'm thinking
> switch_slb is probably a fairly hot path on hash?

Yes absolutely. I'll make this change in v5.

>
> > +{
> > +	struct thread_info *ti = task_thread_info(tsk);
> > +	unsigned char i;
> > +
> > +	/*
> > +	 * We gradually age out SLBs after a number of context switches to
> > +	 * reduce reload overhead of unused entries (like we do with FP/VEC
> > +	 * reload). Each time we wrap 256 switches, take an entry out of the
> > +	 * SLB preload cache.
> > +*/ > > + tsk->thread.load_slb++; > > + if (!tsk->thread.load_slb) { > > + unsigned long pc = KSTK_EIP(tsk); > > + > > + preload_age(ti); > > + preload_add(ti, pc); > > + } > > + > > + for (i = 0; i < ti->slb_preload_nr; i++) { > > + unsigned char idx; > > + unsigned long ea; > > + > > + idx = (ti->slb_preload_tail + i) % SLB_PRELOAD_NR; > > + ea = (unsigned long)ti->slb_preload_esid[idx] << SID_SHIFT; > > + > > + slb_allocate_user(mm, ea); > > + } > > +} > > + > > /* Flush all user entries from the segment table of the current processor. > > */ > > void switch_slb(struct task_struct *tsk, struct mm_struct *mm) > > { > > - struct thread_info *ti = task_thread_info(tsk); > > unsigned char i; > > > > /* > > @@ -502,29 +531,8 @@ void switch_slb(struct task_struct *tsk, struct > > mm_struct *mm) > > > > copy_mm_to_paca(mm); > > > > - /* > > -* We gradually age out SLBs
[powerpc:merge] BUILD SUCCESS 086d9878e1092e7e69a69676ee9ec792690abb1d
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge
branch HEAD: 086d9878e1092e7e69a69676ee9ec792690abb1d  Automatic merge of 'master' into merge (2021-06-30 23:05)

elapsed time: 727m
configs tested: 101
configs skipped: 3

The following configs have been built successfully. More configs may be tested in the coming days.

gcc tested configs:
arm defconfig
arm64 allyesconfig
arm64 defconfig
arm allyesconfig
arm allmodconfig
sh urquell_defconfig
arm pcm027_defconfig
arc vdk_hs38_smp_defconfig
arc nsimosci_defconfig
arm vt8500_v6_v7_defconfig
powerpc tqm8xx_defconfig
powerpc mpc866_ads_defconfig
powerpc tqm8548_defconfig
sh se7619_defconfig
ia64 allmodconfig
openrisc defconfig
powerpc katmai_defconfig
powerpc eiger_defconfig
mips e55_defconfig
powerpc pseries_defconfig
mips bmips_be_defconfig
powerpc currituck_defconfig
mips ath25_defconfig
sh j2_defconfig
ia64 tiger_defconfig
xtensa audio_kc705_defconfig
arm mv78xx0_defconfig
um defconfig
x86_64 allnoconfig
ia64 defconfig
ia64 allyesconfig
m68k allmodconfig
m68k defconfig
m68k allyesconfig
nios2 defconfig
arc allyesconfig
nds32 allnoconfig
nds32 defconfig
nios2 allyesconfig
csky defconfig
alpha defconfig
alpha allyesconfig
xtensa allyesconfig
h8300 allyesconfig
arc defconfig
sh allmodconfig
parisc defconfig
s390 allyesconfig
s390 allmodconfig
parisc allyesconfig
s390 defconfig
i386 allyesconfig
sparc allyesconfig
sparc defconfig
i386 defconfig
mips allyesconfig
mips allmodconfig
powerpc allyesconfig
powerpc allmodconfig
powerpc allnoconfig
x86_64 randconfig-a002-20210630
x86_64 randconfig-a001-20210630
x86_64 randconfig-a004-20210630
x86_64 randconfig-a005-20210630
x86_64 randconfig-a006-20210630
x86_64 randconfig-a003-20210630
i386 randconfig-a004-20210630
i386 randconfig-a001-20210630
i386 randconfig-a003-20210630
i386 randconfig-a002-20210630
i386 randconfig-a005-20210630
i386 randconfig-a006-20210630
i386 randconfig-a014-20210630
i386 randconfig-a011-20210630
i386 randconfig-a016-20210630
i386 randconfig-a012-20210630
i386 randconfig-a013-20210630
i386 randconfig-a015-20210630
riscv nommu_k210_defconfig
riscv allyesconfig
riscv nommu_virt_defconfig
riscv allnoconfig
riscv defconfig
riscv rv32_defconfig
riscv allmodconfig
x86_64 rhel-8.3-kselftests
um x86_64_defconfig
um i386_defconfig
um kunit_defconfig
x86_64 allyesconfig
x86_64 defconfig
x86_64 rhel-8.3
x86_64 rhel-8.3-kbuiltin
x86_64 kexec

clang tested configs:
x86_64 randconfig-b001-20210630
x86_64 randconfig-a012-20210630
x86_64 randconfig-a015-20210630
x86_64 randconfig-a016-20210630
x86_64 randconfig-a013-20210630
x86_64 randconfig-a011
[powerpc:next] BUILD REGRESSION 91fc46eced0f70526d74468ac6c932c90a8585b3
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
branch HEAD: 91fc46eced0f70526d74468ac6c932c90a8585b3  powerpc/64s: move ret_from_fork etc above __end_soft_masked

Error/Warning in current branch:

arch/powerpc/lib/restart_table.c:14:15: error: no previous prototype for function 'search_kernel_restart_table' [-Werror,-Wmissing-prototypes]
arch/powerpc/lib/restart_table.c:14:15: warning: no previous prototype for function 'search_kernel_restart_table' [-Wmissing-prototypes]
arch/powerpc/lib/restart_table.c:22:6: error: no previous prototype for function 'search_kernel_soft_mask_table' [-Werror,-Wmissing-prototypes]
arch/powerpc/lib/restart_table.c:22:6: warning: no previous prototype for function 'search_kernel_soft_mask_table' [-Wmissing-prototypes]

possible Error/Warning in current branch:

arch/powerpc/platforms/pseries/vas.c:186:13: warning: no previous prototype for 'pseries_vas_fault_thread_fn' [-Wmissing-prototypes]

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
`-- powerpc64-randconfig-r034-20210630
    `-- arch-powerpc-platforms-pseries-vas.c:warning:no-previous-prototype-for-pseries_vas_fault_thread_fn
clang_recent_errors
|-- powerpc-buildonly-randconfig-r003-20210630
|   |-- arch-powerpc-lib-restart_table.c:error:no-previous-prototype-for-function-search_kernel_restart_table-Werror-Wmissing-prototypes
|   `-- arch-powerpc-lib-restart_table.c:error:no-previous-prototype-for-function-search_kernel_soft_mask_table-Werror-Wmissing-prototypes
`-- powerpc-randconfig-r012-20210630
    |-- arch-powerpc-lib-restart_table.c:warning:no-previous-prototype-for-function-search_kernel_restart_table
    `-- arch-powerpc-lib-restart_table.c:warning:no-previous-prototype-for-function-search_kernel_soft_mask_table

elapsed time: 726m
configs tested: 137
configs skipped: 4

gcc tested configs:
arm defconfig
arm64 allyesconfig
arm64 defconfig
arm allyesconfig
arm allmodconfig
mips workpad_defconfig
arm at91_dt_defconfig
sh se7712_defconfig
ia64 bigsur_defconfig
mips maltasmvp_defconfig
arm aspeed_g4_defconfig
sh se7724_defconfig
arm s5pv210_defconfig
m68k m5272c3_defconfig
ia64 alldefconfig
m68k m5307c3_defconfig
sh sdk7786_defconfig
sh dreamcast_defconfig
sh edosk7705_defconfig
arm zeus_defconfig
powerpc maple_defconfig
powerpc mpc834x_mds_defconfig
arm collie_defconfig
sh se7721_defconfig
powerpc mpc8540_ads_defconfig
parisc alldefconfig
mips db1xxx_defconfig
sh r7785rp_defconfig
mips xway_defconfig
powerpc sbc8548_defconfig
arc axs103_defconfig
sh rsk7203_defconfig
mips qi_lb60_defconfig
powerpc warp_defconfig
arm h5000_defconfig
mips sb1250_swarm_defconfig
s390 allyesconfig
arm hackkit_defconfig
powerpc bluestone_defconfig
mips capcella_defconfig
arm pxa255-idp_defconfig
powerpc pseries_defconfig
mips bmips_be_defconfig
powerpc currituck_defconfig
mips ath25_defconfig
mips rt305x_defconfig
sh se7705_defconfig
powerpc fsp2_defconfig
ia64 tiger_defconfig
sh rts7751r2dplus_defconfig
powerpc mpc836x_mds_defconfig
arm pxa_defconfig
sh se7206_defconfig
mips mpc30x_defconfig
arc nsimosci_hs_defconfig
sh se7722_defconfig
powerpc canyonlands_defconfig
sh se7619_defconfig
sh sh7785lcr_defconfig
powerpc mpc885_ads_defconfig
powerpc tqm8xx_defconfig
nios2 alldefconfig
mips loongson1c_defconfig
sh se7343_defconfig
x86_64 allnoconfig
ia64 allmodconfig
ia64 defconfig
ia64
Re: [PATCH v3 3/9] powerpc/64e: remove implicit soft-masking and interrupt exit restart logic
Excerpts from Christophe Leroy's message of June 30, 2021 5:56 pm: > > > Le 30/06/2021 à 09:46, Nicholas Piggin a écrit : >> The implicit soft-masking to speed up interrupt return was going to be >> used by 64e as well, but it has not been extensively tested on that >> platform and is not considered ready. It was intended to be disabled >> before merge. Disable it for now. >> >> Most of the restart code is common with 64s, so with more correctness >> and performance testing this could be re-enabled again by adding the >> extra soft-mask checks to interrupt handlers and flipping >> exit_must_hard_disable(). >> >> Fixes: 9d1988ca87dd ("powerpc/64: treat low kernel text as irqs soft-masked") >> Signed-off-by: Nicholas Piggin >> --- >> arch/powerpc/include/asm/interrupt.h | 33 >> arch/powerpc/kernel/exceptions-64e.S | 12 +- >> arch/powerpc/kernel/interrupt.c | 2 +- >> arch/powerpc/kernel/interrupt_64.S | 16 -- >> 4 files changed, 40 insertions(+), 23 deletions(-) >> >> diff --git a/arch/powerpc/include/asm/interrupt.h >> b/arch/powerpc/include/asm/interrupt.h >> index 8b4b1e84e110..f13c93b033c7 100644 >> --- a/arch/powerpc/include/asm/interrupt.h >> +++ b/arch/powerpc/include/asm/interrupt.h >> @@ -73,20 +73,34 @@ >> #include >> #include >> >> -#ifdef CONFIG_PPC64 >> +#ifdef CONFIG_PPC_BOOK3S_64 > > Can we avoid that ifdef and use IS_ENABLED(CONFIG_PPC_BOOK3S_64) below ? Hey Christophe, Thanks for the review, sorry it was a bit rushed to get these fixes in before the pull. I agree with this and there's a few other cleanups we might do as well. Something to look at next. 
> >> extern char __end_soft_masked[]; >> unsigned long search_kernel_restart_table(unsigned long addr); >> -#endif >> >> -#ifdef CONFIG_PPC_BOOK3S_64 >> DECLARE_STATIC_KEY_FALSE(interrupt_exit_not_reentrant); >> >> +static inline bool is_implicit_soft_masked(struct pt_regs *regs) >> +{ >> +if (regs->msr & MSR_PR) >> +return false; >> + >> +if (regs->nip >= (unsigned long)__end_soft_masked) >> +return false; >> + >> +return true; >> +} >> + >> static inline void srr_regs_clobbered(void) >> { >> local_paca->srr_valid = 0; >> local_paca->hsrr_valid = 0; >> } >> #else >> +static inline bool is_implicit_soft_masked(struct pt_regs *regs) >> +{ >> +return false; >> +} >> + >> static inline void srr_regs_clobbered(void) >> { >> } >> @@ -150,11 +164,13 @@ static inline void interrupt_enter_prepare(struct >> pt_regs *regs, struct interrup >> */ >> if (TRAP(regs) != INTERRUPT_PROGRAM) { >> CT_WARN_ON(ct_state() != CONTEXT_KERNEL); >> -BUG_ON(regs->nip < (unsigned long)__end_soft_masked); >> +BUG_ON(is_implicit_soft_masked(regs)); >> } >> +#ifdef CONFIG_PPC_BOOK3S > > Allthough we are already in a PPC64 section, wouldn't it be better to use > CONFIG_PPC_BOOK3S_64 ? > > Can we use IS_ENABLED(CONFIG_PPC_BOOK3S_64) instead ? Good question, it's a matter of preference. I have used PPC_BOOK3S in other places, but maybe in files shared by 32-bit it would be better to have the _64? On the other hand, in cases where you have an else or #else, then you still need the PPC64 context to understand that. I don't really have a preference, I would go with either. Making some convention and using it everywhere is probably a good idea though. Thanks, Nick
Re: [RFC PATCH 19/43] KVM: PPC: Book3S HV P9: Add kvmppc_stop_thread to match kvmppc_start_thread
Nicholas Piggin writes: > Small cleanup makes it a bit easier to match up entry and exit > operations. > > Signed-off-by: Nicholas Piggin Reviewed-by: Fabiano Rosas > --- > arch/powerpc/kvm/book3s_hv.c | 11 +-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index b8b0695a9312..86c85e303a6d 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -2948,6 +2948,13 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu, > struct kvmppc_vcore *vc) > kvmppc_ipi_thread(cpu); > } > > +/* Old path does this in asm */ > +static void kvmppc_stop_thread(struct kvm_vcpu *vcpu) > +{ > + vcpu->cpu = -1; > + vcpu->arch.thread_cpu = -1; > +} > + > static void kvmppc_wait_for_nap(int n_threads) > { > int cpu = smp_processor_id(); > @@ -4154,8 +4161,6 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, > u64 time_limit, > dec = (s32) dec; > tb = mftb(); > vcpu->arch.dec_expires = dec + tb; > - vcpu->cpu = -1; > - vcpu->arch.thread_cpu = -1; > > store_spr_state(vcpu); > > @@ -4627,6 +4632,8 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 > time_limit, > > guest_exit_irqoff(); > > + kvmppc_stop_thread(vcpu); > > powerpc_local_irq_pmu_restore(flags); > > cpumask_clear_cpu(pcpu, &kvm->arch.cpu_in_guest);
Re: [RFC PATCH 07/43] KVM: PPC: Book3S HV: POWER10 enable HAIL when running radix guests
Nicholas Piggin writes: > HV interrupts may be taken with the MMU enabled when radix guests are > running. Enable LPCR[HAIL] on ISA v3.1 processors for radix guests. > Make this depend on the host LPCR[HAIL] being enabled. Currently that is > always enabled, but having this test means any issue that might require > LPCR[HAIL] to be disabled in the host will not have to be duplicated in > KVM. > > -1380 cycles on P10 NULL hcall entry+exit > > Signed-off-by: Nicholas Piggin Reviewed-by: Fabiano Rosas > --- > arch/powerpc/kvm/book3s_hv.c | 29 + > 1 file changed, 25 insertions(+), 4 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index 36e1db48fccf..ed713f49fbd5 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -4896,6 +4896,8 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu > *vcpu) > */ > int kvmppc_switch_mmu_to_hpt(struct kvm *kvm) > { > + unsigned long lpcr, lpcr_mask; > + > if (nesting_enabled(kvm)) > kvmhv_release_all_nested(kvm); > kvmppc_rmap_reset(kvm); > @@ -4905,8 +4907,13 @@ int kvmppc_switch_mmu_to_hpt(struct kvm *kvm) > kvm->arch.radix = 0; > spin_unlock(&kvm->mmu_lock); > kvmppc_free_radix(kvm); > - kvmppc_update_lpcr(kvm, LPCR_VPM1, > -LPCR_VPM1 | LPCR_UPRT | LPCR_GTSE | LPCR_HR); > + > + lpcr = LPCR_VPM1; > + lpcr_mask = LPCR_VPM1 | LPCR_UPRT | LPCR_GTSE | LPCR_HR; > + if (cpu_has_feature(CPU_FTR_ARCH_31)) > + lpcr_mask |= LPCR_HAIL; > + kvmppc_update_lpcr(kvm, lpcr, lpcr_mask); > + > return 0; > } > > @@ -4916,6 +4923,7 @@ int kvmppc_switch_mmu_to_hpt(struct kvm *kvm) > */ > int kvmppc_switch_mmu_to_radix(struct kvm *kvm) > { > + unsigned long lpcr, lpcr_mask; > int err; > > err = kvmppc_init_vm_radix(kvm); > @@ -4927,8 +4935,17 @@ int kvmppc_switch_mmu_to_radix(struct kvm *kvm) > kvm->arch.radix = 1; > spin_unlock(&kvm->mmu_lock); > kvmppc_free_hpt(&kvm->arch.hpt); > - kvmppc_update_lpcr(kvm, LPCR_UPRT | LPCR_GTSE | LPCR_HR, > -LPCR_VPM1 | LPCR_UPRT | LPCR_GTSE | LPCR_HR); > +
+ lpcr = LPCR_UPRT | LPCR_GTSE | LPCR_HR; > + lpcr_mask = LPCR_VPM1 | LPCR_UPRT | LPCR_GTSE | LPCR_HR; > + if (cpu_has_feature(CPU_FTR_ARCH_31)) { > + lpcr_mask |= LPCR_HAIL; > + if (cpu_has_feature(CPU_FTR_HVMODE) && > + (kvm->arch.host_lpcr & LPCR_HAIL)) > + lpcr |= LPCR_HAIL; > + } > + kvmppc_update_lpcr(kvm, lpcr, lpcr_mask); > + > return 0; > } > > @@ -5092,6 +5109,10 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm) > kvm->arch.mmu_ready = 1; > lpcr &= ~LPCR_VPM1; > lpcr |= LPCR_UPRT | LPCR_GTSE | LPCR_HR; > + if (cpu_has_feature(CPU_FTR_HVMODE) && > + cpu_has_feature(CPU_FTR_ARCH_31) && > + (kvm->arch.host_lpcr & LPCR_HAIL)) > + lpcr |= LPCR_HAIL; > ret = kvmppc_init_vm_radix(kvm); > if (ret) { > kvmppc_free_lpid(kvm->arch.lpid);
Re: [RFC PATCH 08/43] powerpc/64s: Keep AMOR SPR a constant ~0 at runtime
Nicholas Piggin writes: > This register controls supervisor SPR modifications, and as such is only > relevant for KVM. KVM always sets AMOR to ~0 on guest entry, and never > restores it coming back out to the host, so it can be kept constant and > avoid the mtSPR in KVM guest entry. > > -21 cycles (9116) cycles POWER9 virt-mode NULL hcall > > Signed-off-by: Nicholas Piggin Reviewed-by: Fabiano Rosas > --- > arch/powerpc/kernel/cpu_setup_power.c| 8 > arch/powerpc/kernel/dt_cpu_ftrs.c| 2 ++ > arch/powerpc/kvm/book3s_hv_p9_entry.c| 2 -- > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 -- > arch/powerpc/mm/book3s64/radix_pgtable.c | 15 --- > arch/powerpc/platforms/powernv/idle.c| 8 +++- > 6 files changed, 13 insertions(+), 24 deletions(-) > > diff --git a/arch/powerpc/kernel/cpu_setup_power.c > b/arch/powerpc/kernel/cpu_setup_power.c > index 3cca88ee96d7..a29dc8326622 100644 > --- a/arch/powerpc/kernel/cpu_setup_power.c > +++ b/arch/powerpc/kernel/cpu_setup_power.c > @@ -137,6 +137,7 @@ void __setup_cpu_power7(unsigned long offset, struct > cpu_spec *t) > return; > > mtspr(SPRN_LPID, 0); > + mtspr(SPRN_AMOR, ~0); > mtspr(SPRN_PCR, PCR_MASK); > init_LPCR_ISA206(mfspr(SPRN_LPCR), LPCR_LPES1 >> LPCR_LPES_SH); > } > @@ -150,6 +151,7 @@ void __restore_cpu_power7(void) > return; > > mtspr(SPRN_LPID, 0); > + mtspr(SPRN_AMOR, ~0); > mtspr(SPRN_PCR, PCR_MASK); > init_LPCR_ISA206(mfspr(SPRN_LPCR), LPCR_LPES1 >> LPCR_LPES_SH); > } > @@ -164,6 +166,7 @@ void __setup_cpu_power8(unsigned long offset, struct > cpu_spec *t) > return; > > mtspr(SPRN_LPID, 0); > + mtspr(SPRN_AMOR, ~0); > mtspr(SPRN_PCR, PCR_MASK); > init_LPCR_ISA206(mfspr(SPRN_LPCR) | LPCR_PECEDH, 0); /* LPES = 0 */ > init_HFSCR(); > @@ -184,6 +187,7 @@ void __restore_cpu_power8(void) > return; > > mtspr(SPRN_LPID, 0); > + mtspr(SPRN_AMOR, ~0); > mtspr(SPRN_PCR, PCR_MASK); > init_LPCR_ISA206(mfspr(SPRN_LPCR) | LPCR_PECEDH, 0); /* LPES = 0 */ > init_HFSCR(); > @@ -202,6 +206,7 @@ void __setup_cpu_power9(unsigned long 
offset, struct > cpu_spec *t) > mtspr(SPRN_PSSCR, 0); > mtspr(SPRN_LPID, 0); > mtspr(SPRN_PID, 0); > + mtspr(SPRN_AMOR, ~0); > mtspr(SPRN_PCR, PCR_MASK); > init_LPCR_ISA300((mfspr(SPRN_LPCR) | LPCR_PECEDH | LPCR_PECE_HVEE |\ >LPCR_HVICE | LPCR_HEIC) & ~(LPCR_UPRT | LPCR_HR), 0); > @@ -223,6 +228,7 @@ void __restore_cpu_power9(void) > mtspr(SPRN_PSSCR, 0); > mtspr(SPRN_LPID, 0); > mtspr(SPRN_PID, 0); > + mtspr(SPRN_AMOR, ~0); > mtspr(SPRN_PCR, PCR_MASK); > init_LPCR_ISA300((mfspr(SPRN_LPCR) | LPCR_PECEDH | LPCR_PECE_HVEE |\ >LPCR_HVICE | LPCR_HEIC) & ~(LPCR_UPRT | LPCR_HR), 0); > @@ -242,6 +248,7 @@ void __setup_cpu_power10(unsigned long offset, struct > cpu_spec *t) > mtspr(SPRN_PSSCR, 0); > mtspr(SPRN_LPID, 0); > mtspr(SPRN_PID, 0); > + mtspr(SPRN_AMOR, ~0); > mtspr(SPRN_PCR, PCR_MASK); > init_LPCR_ISA300((mfspr(SPRN_LPCR) | LPCR_PECEDH | LPCR_PECE_HVEE |\ >LPCR_HVICE | LPCR_HEIC) & ~(LPCR_UPRT | LPCR_HR), 0); > @@ -264,6 +271,7 @@ void __restore_cpu_power10(void) > mtspr(SPRN_PSSCR, 0); > mtspr(SPRN_LPID, 0); > mtspr(SPRN_PID, 0); > + mtspr(SPRN_AMOR, ~0); > mtspr(SPRN_PCR, PCR_MASK); > init_LPCR_ISA300((mfspr(SPRN_LPCR) | LPCR_PECEDH | LPCR_PECE_HVEE |\ >LPCR_HVICE | LPCR_HEIC) & ~(LPCR_UPRT | LPCR_HR), 0); > diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c > b/arch/powerpc/kernel/dt_cpu_ftrs.c > index 358aee7c2d79..0a6b36b4bda8 100644 > --- a/arch/powerpc/kernel/dt_cpu_ftrs.c > +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c > @@ -80,6 +80,7 @@ static void __restore_cpu_cpufeatures(void) > mtspr(SPRN_LPCR, system_registers.lpcr); > if (hv_mode) { > mtspr(SPRN_LPID, 0); > + mtspr(SPRN_AMOR, ~0); > mtspr(SPRN_HFSCR, system_registers.hfscr); > mtspr(SPRN_PCR, system_registers.pcr); > } > @@ -216,6 +217,7 @@ static int __init feat_enable_hv(struct dt_cpu_feature *f) > } > > mtspr(SPRN_LPID, 0); > + mtspr(SPRN_AMOR, ~0); > > lpcr = mfspr(SPRN_LPCR); > lpcr &= ~LPCR_LPES0; /* HV external interrupts */ > diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c > 
b/arch/powerpc/kvm/book3s_hv_p9_entry.c > index c4f3e066fcb4..a3281f0c9214 100644 > --- a/arch/powerpc/kvm/book3s_hv_p9_entry.c > +++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c > @@ -286,8 +286,6 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 > time_limit, unsigned long lpc > mtspr(SPRN_SPRG2, vcpu->arch.shregs.sprg2); > mtspr(SPRN_SPRG3, vcpu->arch.shregs.sprg3); > > - mtspr(SPRN_AMOR, ~0UL); > - > local_paca->kvm_hstate.in_guest =
Re: [RFC PATCH 38/43] KVM: PPC: Book3S HV P9: Test dawr_enabled() before saving host DAWR SPRs
Nicholas Piggin writes: > Some of the DAWR SPR access is already predicated on dawr_enabled(), > apply this to the remainder of the accesses. > > Signed-off-by: Nicholas Piggin > --- > arch/powerpc/kvm/book3s_hv_p9_entry.c | 34 --- > 1 file changed, 20 insertions(+), 14 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c > b/arch/powerpc/kvm/book3s_hv_p9_entry.c > index 7aa72efcac6c..f305d1d6445c 100644 > --- a/arch/powerpc/kvm/book3s_hv_p9_entry.c > +++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c > @@ -638,13 +638,16 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 > time_limit, unsigned long lpc > > host_hfscr = mfspr(SPRN_HFSCR); > host_ciabr = mfspr(SPRN_CIABR); > - host_dawr0 = mfspr(SPRN_DAWR0); > - host_dawrx0 = mfspr(SPRN_DAWRX0); > host_psscr = mfspr(SPRN_PSSCR); > host_pidr = mfspr(SPRN_PID); > - if (cpu_has_feature(CPU_FTR_DAWR1)) { > - host_dawr1 = mfspr(SPRN_DAWR1); > - host_dawrx1 = mfspr(SPRN_DAWRX1); > + > + if (dawr_enabled()) { > + host_dawr0 = mfspr(SPRN_DAWR0); > + host_dawrx0 = mfspr(SPRN_DAWRX0); > + if (cpu_has_feature(CPU_FTR_DAWR1)) { > + host_dawr1 = mfspr(SPRN_DAWR1); > + host_dawrx1 = mfspr(SPRN_DAWRX1); The userspace needs to enable DAWR1 via KVM_CAP_PPC_DAWR1. That cap is not even implemented in QEMU currently, so we never allow the guest to set vcpu->arch.dawr1. If we check for kvm->arch.dawr1_enabled instead of the CPU feature, we could shave some more time here. 
> + } > } > > local_paca->kvm_hstate.host_purr = mfspr(SPRN_PURR); > @@ -951,15 +954,18 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 > time_limit, unsigned long lpc > mtspr(SPRN_HFSCR, host_hfscr); > if (vcpu->arch.ciabr != host_ciabr) > mtspr(SPRN_CIABR, host_ciabr); > - if (vcpu->arch.dawr0 != host_dawr0) > - mtspr(SPRN_DAWR0, host_dawr0); > - if (vcpu->arch.dawrx0 != host_dawrx0) > - mtspr(SPRN_DAWRX0, host_dawrx0); > - if (cpu_has_feature(CPU_FTR_DAWR1)) { > - if (vcpu->arch.dawr1 != host_dawr1) > - mtspr(SPRN_DAWR1, host_dawr1); > - if (vcpu->arch.dawrx1 != host_dawrx1) > - mtspr(SPRN_DAWRX1, host_dawrx1); > + > + if (dawr_enabled()) { > + if (vcpu->arch.dawr0 != host_dawr0) > + mtspr(SPRN_DAWR0, host_dawr0); > + if (vcpu->arch.dawrx0 != host_dawrx0) > + mtspr(SPRN_DAWRX0, host_dawrx0); > + if (cpu_has_feature(CPU_FTR_DAWR1)) { > + if (vcpu->arch.dawr1 != host_dawr1) > + mtspr(SPRN_DAWR1, host_dawr1); > + if (vcpu->arch.dawrx1 != host_dawrx1) > + mtspr(SPRN_DAWRX1, host_dawrx1); > + } > } > > if (vc->dpdes)
Re: [RFC PATCH 01/43] powerpc/64s: Remove WORT SPR from POWER9/10
Nicholas Piggin writes: > This register is not architected and not implemented in POWER9 or 10, > it just reads back zeroes for compatibility. > > -78 cycles (9255) cycles POWER9 virt-mode NULL hcall > > Signed-off-by: Nicholas Piggin Reviewed-by: Fabiano Rosas > --- > arch/powerpc/kvm/book3s_hv.c | 3 --- > arch/powerpc/platforms/powernv/idle.c | 2 -- > 2 files changed, 5 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index 9228042bd54f..97f3d6d54b61 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -3640,7 +3640,6 @@ static void load_spr_state(struct kvm_vcpu *vcpu) > mtspr(SPRN_EBBHR, vcpu->arch.ebbhr); > mtspr(SPRN_EBBRR, vcpu->arch.ebbrr); > mtspr(SPRN_BESCR, vcpu->arch.bescr); > - mtspr(SPRN_WORT, vcpu->arch.wort); > mtspr(SPRN_TIDR, vcpu->arch.tid); > mtspr(SPRN_AMR, vcpu->arch.amr); > mtspr(SPRN_UAMOR, vcpu->arch.uamor); > @@ -3667,7 +3666,6 @@ static void store_spr_state(struct kvm_vcpu *vcpu) > vcpu->arch.ebbhr = mfspr(SPRN_EBBHR); > vcpu->arch.ebbrr = mfspr(SPRN_EBBRR); > vcpu->arch.bescr = mfspr(SPRN_BESCR); > - vcpu->arch.wort = mfspr(SPRN_WORT); > vcpu->arch.tid = mfspr(SPRN_TIDR); > vcpu->arch.amr = mfspr(SPRN_AMR); > vcpu->arch.uamor = mfspr(SPRN_UAMOR); > @@ -3699,7 +3697,6 @@ static void restore_p9_host_os_sprs(struct kvm_vcpu > *vcpu, > struct p9_host_os_sprs *host_os_sprs) > { > mtspr(SPRN_PSPB, 0); > - mtspr(SPRN_WORT, 0); > mtspr(SPRN_UAMOR, 0); > > mtspr(SPRN_DSCR, host_os_sprs->dscr); > diff --git a/arch/powerpc/platforms/powernv/idle.c > b/arch/powerpc/platforms/powernv/idle.c > index 528a7e0cf83a..180baecad914 100644 > --- a/arch/powerpc/platforms/powernv/idle.c > +++ b/arch/powerpc/platforms/powernv/idle.c > @@ -667,7 +667,6 @@ static unsigned long power9_idle_stop(unsigned long psscr) > sprs.purr = mfspr(SPRN_PURR); > sprs.spurr = mfspr(SPRN_SPURR); > sprs.dscr = mfspr(SPRN_DSCR); > - sprs.wort = mfspr(SPRN_WORT); > sprs.ciabr = mfspr(SPRN_CIABR); > > 
sprs.mmcra = mfspr(SPRN_MMCRA); > @@ -785,7 +784,6 @@ static unsigned long power9_idle_stop(unsigned long psscr) > mtspr(SPRN_PURR,sprs.purr); > mtspr(SPRN_SPURR, sprs.spurr); > mtspr(SPRN_DSCR,sprs.dscr); > - mtspr(SPRN_WORT,sprs.wort); > mtspr(SPRN_CIABR, sprs.ciabr); > > mtspr(SPRN_MMCRA, sprs.mmcra);
Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
Hi Will and Claire, On Wed, Jun 30, 2021 at 12:43:48PM +0100, Will Deacon wrote: > On Wed, Jun 30, 2021 at 05:17:27PM +0800, Claire Chang wrote: > > On Wed, Jun 30, 2021 at 9:43 AM Nathan Chancellor wrote: > > > > > > On Thu, Jun 24, 2021 at 11:55:20PM +0800, Claire Chang wrote: > > > > Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and > > > > use it to determine whether to bounce the data or not. This will be > > > > useful later to allow for different pools. > > > > > > > > Signed-off-by: Claire Chang > > > > Reviewed-by: Christoph Hellwig > > > > Tested-by: Stefano Stabellini > > > > Tested-by: Will Deacon > > > > Acked-by: Stefano Stabellini > > > > > > This patch as commit af452ec1b1a3 ("swiotlb: Use is_swiotlb_force_bounce > > > for swiotlb data bouncing") causes my Ryzen 3 4300G system to fail to > > > get to an X session consistently (although not every single time), > > > presumably due to a crash in the AMDGPU driver that I see in dmesg. > > > > > > I have attached logs at af452ec1b1a3 and f127c9556a8e and I am happy > > > to provide any further information, debug, or test patches as necessary. > > > > Are you using swiotlb=force? or the swiotlb_map is called because of > > !dma_capable? > > (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/dma/direct.h#n93) > > The command line is in the dmesg: > > | Kernel command line: initrd=\amd-ucode.img > initrd=\initramfs-linux-next-llvm.img > root=PARTUUID=8680aa0c-cf09-4a69-8cf3-970478040ee7 rw intel_pstate=no_hwp > irqpoll > > but I worry that this looks _very_ similar to the issue reported by Qian > Cai which we thought we had fixed. Nathan -- is the failure deterministic? Yes, for the most part. It does not happen every single boot so when I was bisecting, I did a series of seven boots and only considered the revision good when all seven of them made it to LightDM's greeter. 
My results that I notated show most bad revisions failed anywhere from four to six times. > > `BUG: unable to handle page fault for address: 003a8290` and > > the fact it crashed at `_raw_spin_lock_irqsave` look like the memory > > (maybe dev->dma_io_tlb_mem) was corrupted? > > The dev->dma_io_tlb_mem should be set here > > (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/probe.c#n2528) > > through device_initialize. > > I'm less sure about this. 'dma_io_tlb_mem' should be pointing at > 'io_tlb_default_mem', which is a page-aligned allocation from memblock. > The spinlock is at offset 0x24 in that structure, and looking at the > register dump from the crash: > > Jun 29 18:28:42 hp-4300G kernel: RSP: 0018:adb4013db9e8 EFLAGS: 00010006 > Jun 29 18:28:42 hp-4300G kernel: RAX: 003a8290 RBX: > RCX: 8900572ad580 > Jun 29 18:28:42 hp-4300G kernel: RDX: 89005653f024 RSI: 000c > RDI: 1d17 > Jun 29 18:28:42 hp-4300G kernel: RBP: 0a20d000 R08: 000c > R09: > Jun 29 18:28:42 hp-4300G kernel: R10: 0a20d000 R11: 89005653f000 > R12: 0212 > Jun 29 18:28:42 hp-4300G kernel: R13: 1000 R14: 0002 > R15: 0020 > Jun 29 18:28:42 hp-4300G kernel: FS: 7f1f8898ea40() > GS:89005728() knlGS: > Jun 29 18:28:42 hp-4300G kernel: CS: 0010 DS: ES: CR0: > 80050033 > Jun 29 18:28:42 hp-4300G kernel: CR2: 003a8290 CR3: 0001020d > CR4: 00350ee0 > Jun 29 18:28:42 hp-4300G kernel: Call Trace: > Jun 29 18:28:42 hp-4300G kernel: _raw_spin_lock_irqsave+0x39/0x50 > Jun 29 18:28:42 hp-4300G kernel: swiotlb_tbl_map_single+0x12b/0x4c0 > > Then that correlates with R11 holding the 'dma_io_tlb_mem' pointer and > RDX pointing at the spinlock. Yet RAX is holding junk :/ > > I agree that enabling KASAN would be a good idea, but I also think we > probably need to get some more information out of swiotlb_tbl_map_single() > to see see what exactly is going wrong in there. I can certainly enable KASAN and if there is any debug print I can add or dump anything, let me know! 
Cheers, Nathan
Re: [PATCH] powerpc/32s: Fix setup_{kuap/kuep}() on SMP
On Mon, 28 Jun 2021 06:56:11 + (UTC), Christophe Leroy wrote: > On SMP, setup_kup() is also called from start_secondary(). > > start_secondary() is not an __init function. > > Remove the __init marker from setup_kuep() and setup_kuap(). Applied to powerpc/next. [1/1] powerpc/32s: Fix setup_{kuap/kuep}() on SMP https://git.kernel.org/powerpc/c/c89e632658e793fbbdcbfbe80a6c13bbf7203e9b cheers
Re: [PATCH v3 0/9] powerpc: fast interrupt exit bug and misc fixes
On Wed, 30 Jun 2021 17:46:12 +1000, Nicholas Piggin wrote: > This is a bunch of fixes for powerpc next, mostly a nasty hole in fast > interrupt exit code found by Sachin and some other bits along the way > while looking at it. > > Since v2: > - Fixed 64e patch 3 to really set exit_must_hard_disable. > - Reworded some changelogs. > > [...] Applied to powerpc/next. [1/9] powerpc/64s: fix hash page fault interrupt handler https://git.kernel.org/powerpc/c/5567b1ee29b7a83e8c01d99d34b5bbd306ce0bcf [2/9] powerpc/64e: fix CONFIG_RELOCATABLE build warnings https://git.kernel.org/powerpc/c/fce01acf830a697110ed72ecace4b0afdbcd53cb [3/9] powerpc/64e: remove implicit soft-masking and interrupt exit restart logic https://git.kernel.org/powerpc/c/9b69d48c7516a29cdaacd18d8bf5f575014a42a1 [4/9] powerpc/64s: add a table of implicit soft-masked addresses https://git.kernel.org/powerpc/c/325678fd052259e7c05ef29060a73c705ea90432 [5/9] powerpc/64s/interrupt: preserve regs->softe for NMI interrupts https://git.kernel.org/powerpc/c/1b0482229c302a3c6afd00d6b3bf0169cf279b44 [6/9] powerpc/64: enable MSR[EE] in irq replay pt_regs https://git.kernel.org/powerpc/c/2b43dd7653cca47d297756980846ebbfe8887fa1 [7/9] powerpc/64/interrupt: add missing kprobe annotations on interrupt exit symbols https://git.kernel.org/powerpc/c/98798f33c6be5a511ab61958b40835b3ef08def2 [8/9] powerpc/64s/interrupt: clean up interrupt return labels https://git.kernel.org/powerpc/c/c59458b00aec4ba580d9628d36d6c984af94d192 [9/9] powerpc/64s: move ret_from_fork etc above __end_soft_masked https://git.kernel.org/powerpc/c/91fc46eced0f70526d74468ac6c932c90a8585b3 cheers
Re: [PATCH v2] powerpc/4xx: Fix setup_kuep() on SMP
On Tue, 29 Jun 2021 12:24:21 + (UTC), Christophe Leroy wrote: > On SMP, setup_kuep() is also called from start_secondary() since > commit 86f46f343272 ("powerpc/32s: Initialise KUAP and KUEP in C"). > > start_secondary() is not an __init function. > > Remove the __init marker from setup_kuep() and bail out when > not called on the first CPU as the work is already done. Applied to powerpc/next. [1/1] powerpc/4xx: Fix setup_kuep() on SMP https://git.kernel.org/powerpc/c/fc4999864bca323f1b844fefe1b402632443c076 cheers
Re: [PATCH] crypto: DRBG - select SHA512
> On 30-Jun-2021, at 4:02 PM, Stephan Mueller wrote: > > With the switch to use HMAC(SHA-512) as the default DRBG type, the > configuration must now also select SHA-512. > > Fixes: 9b7b94683a9b "crypto: DRBG - switch to HMAC SHA512 DRBG as default DRBG" > Reported-by: Sachin Sant > Signed-off-by: Stephan Mueller > --- Thanks Stephan. This patch fixes the reported problem. Tested-by: Sachin Sant -Sachin
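For context, a sketch of what the fix amounts to in crypto/Kconfig: the DRBG HMAC option's select list has to pull in the hash that the default DRBG now uses. The option names mirror mainline Kconfig symbols, but the fragment below is abridged and the exact surrounding lines are an assumption:

```
# Abridged sketch: CRYPTO_DRBG_HMAC is the default DRBG flavour, so its
# select list must provide HMAC(SHA-512), the new default construction.
config CRYPTO_DRBG_HMAC
	bool
	default y
	select CRYPTO_HMAC
	select CRYPTO_SHA512
```

Without the `select CRYPTO_SHA512`, a kernel configured with the DRBG but without SHA-512 would instantiate the default DRBG against a missing hash, which is the failure Sachin reported.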
[PATCH] powerpc/papr_scm: Move duplicate definitions to common header files
The papr_scm uses PDSM structures like nd_papr_pdsm_health as the PDSM payload which would be reused by ndtest. Since ndtest is arch independent, move the PDSM header from arch/powerpc/include/uapi/ to the generic include/uapi/linux directory. Also, there are some #defines common between papr_scm and ndtest which are not exported to the user space. So, move them to a header file which can be shared across ndtest and papr_scm via newly introduced include/linux/papr_scm.h.

Signed-off-by: Shivaprasad G Bhat
Suggested-by: "Aneesh Kumar K.V"
---
 MAINTAINERS                               |   2
 arch/powerpc/include/uapi/asm/papr_pdsm.h | 147 -
 arch/powerpc/platforms/pseries/papr_scm.c |  43
 include/linux/papr_scm.h                  |  48 +
 include/uapi/linux/papr_pdsm.h            | 147 +
 tools/testing/nvdimm/test/ndtest.c        |   1
 tools/testing/nvdimm/test/ndtest.h        |  31 --
 7 files changed, 200 insertions(+), 219 deletions(-)
 delete mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
 create mode 100644 include/linux/papr_scm.h
 create mode 100644 include/uapi/linux/papr_pdsm.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 008fcad7ac008..5d2d8a6b5eb53 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10494,6 +10494,8 @@ F: drivers/rtc/rtc-opal.c
 F: drivers/scsi/ibmvscsi/
 F: drivers/tty/hvc/hvc_opal.c
 F: drivers/watchdog/wdrtas.c
+F: include/linux/papr_scm.h
+F: include/uapi/linux/papr_pdsm.h
 F: tools/testing/selftests/powerpc
 N: /pmac
 N: powermac

diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
deleted file mode 100644
index 82488b1e7276e..0
--- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
+++ /dev/null
@@ -1,147 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
-/*
- * PAPR nvDimm Specific Methods (PDSM) and structs for libndctl
- *
- * (C) Copyright IBM 2020
- *
- * Author: Vaibhav Jain
- */
-
-#ifndef _UAPI_ASM_POWERPC_PAPR_PDSM_H_
-#define _UAPI_ASM_POWERPC_PAPR_PDSM_H_
-
-#include
-#include
-
-/*
- * PDSM Envelope:
- *
- * The ioctl ND_CMD_CALL exchange data between
user-space and kernel via - * envelope which consists of 2 headers sections and payload sections as - * illustrated below: - * +-+---+---+ - * | 64-Bytes | 8-Bytes | Max 184-Bytes | - * +-+---+---+ - * | ND-HEADER | PDSM-HEADER | PDSM-PAYLOAD | - * +-+---+---+ - * | nd_family | | | - * | nd_size_out | cmd_status| | - * | nd_size_in | reserved | nd_pdsm_payload | - * | nd_command | payload --> | | - * | nd_fw_size | | | - * | nd_payload ---> | | | - * +---+-+---+ - * - * ND Header: - * This is the generic libnvdimm header described as 'struct nd_cmd_pkg' - * which is interpreted by libnvdimm before passed on to papr_scm. Important - * member fields used are: - * 'nd_family' : (In) NVDIMM_FAMILY_PAPR_SCM - * 'nd_size_in': (In) PDSM-HEADER + PDSM-IN-PAYLOAD (usually 0) - * 'nd_size_out': (In) PDSM-HEADER + PDSM-RETURN-PAYLOAD - * 'nd_command' : (In) One of PAPR_PDSM_XXX - * 'nd_fw_size' : (Out) PDSM-HEADER + size of actual payload returned - * - * PDSM Header: - * This is papr-scm specific header that precedes the payload. This is defined - * as nd_cmd_pdsm_pkg. Following fields aare available in this header: - * - * 'cmd_status': (Out) Errors if any encountered while servicing PDSM. - * 'reserved' : Not used, reserved for future and should be set to 0. - * 'payload': A union of all the possible payload structs - * - * PDSM Payload: - * - * The layout of the PDSM Payload is defined by various structs shared between - * papr_scm and libndctl so that contents of payload can be interpreted. As such - * its defined as a union of all possible payload structs as - * 'union nd_pdsm_payload'. Based on the value of 'nd_cmd_pkg.nd_command' - * appropriate member of the union is accessed. 
- */ - -/* Max payload size that we can handle */ -#define ND_PDSM_PAYLOAD_MAX_SIZE 184 - -/* Max payload size that we can handle */ -#define ND_PDSM_HDR_SIZE \ - (sizeof(struct nd_pkg_pdsm) - ND_PDSM_PAYLOAD_MAX_SIZE) - -/* Various nvdimm health indicators */ -#define PAPR_PDSM_DIMM_HEALTHY 0 -#define PAPR_PDSM_DIMM_UNHEALTHY 1 -#define PAPR_PDSM_DIMM_CRITICAL 2 -#define PAPR_PDSM_DIMM_FATAL 3 - -/* struct nd_papr_pdsm_health.extension_flags field
Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
On Wed, Jun 30, 2021 at 05:17:27PM +0800, Claire Chang wrote: > On Wed, Jun 30, 2021 at 9:43 AM Nathan Chancellor wrote: > > > > On Thu, Jun 24, 2021 at 11:55:20PM +0800, Claire Chang wrote: > > > Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and > > > use it to determine whether to bounce the data or not. This will be > > > useful later to allow for different pools. > > > > > > Signed-off-by: Claire Chang > > > Reviewed-by: Christoph Hellwig > > > Tested-by: Stefano Stabellini > > > Tested-by: Will Deacon > > > Acked-by: Stefano Stabellini > > > > This patch as commit af452ec1b1a3 ("swiotlb: Use is_swiotlb_force_bounce > > for swiotlb data bouncing") causes my Ryzen 3 4300G system to fail to > > get to an X session consistently (although not every single time), > > presumably due to a crash in the AMDGPU driver that I see in dmesg. > > > > I have attached logs at af452ec1b1a3 and f127c9556a8e and I am happy > > to provide any further information, debug, or test patches as necessary. > > Are you using swiotlb=force? or the swiotlb_map is called because of > !dma_capable? > (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/dma/direct.h#n93) The command line is in the dmesg: | Kernel command line: initrd=\amd-ucode.img initrd=\initramfs-linux-next-llvm.img root=PARTUUID=8680aa0c-cf09-4a69-8cf3-970478040ee7 rw intel_pstate=no_hwp irqpoll but I worry that this looks _very_ similar to the issue reported by Qian Cai which we thought we had fixed. Nathan -- is the failure deterministic? > `BUG: unable to handle page fault for address: 003a8290` and > the fact it crashed at `_raw_spin_lock_irqsave` look like the memory > (maybe dev->dma_io_tlb_mem) was corrupted? > The dev->dma_io_tlb_mem should be set here > (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/probe.c#n2528) > through device_initialize. I'm less sure about this. 
'dma_io_tlb_mem' should be pointing at 'io_tlb_default_mem', which is a page-aligned allocation from memblock. The spinlock is at offset 0x24 in that structure, and looking at the register dump from the crash:

Jun 29 18:28:42 hp-4300G kernel: RSP: 0018:adb4013db9e8 EFLAGS: 00010006
Jun 29 18:28:42 hp-4300G kernel: RAX: 003a8290 RBX: RCX: 8900572ad580
Jun 29 18:28:42 hp-4300G kernel: RDX: 89005653f024 RSI: 000c RDI: 1d17
Jun 29 18:28:42 hp-4300G kernel: RBP: 0a20d000 R08: 000c R09:
Jun 29 18:28:42 hp-4300G kernel: R10: 0a20d000 R11: 89005653f000 R12: 0212
Jun 29 18:28:42 hp-4300G kernel: R13: 1000 R14: 0002 R15: 0020
Jun 29 18:28:42 hp-4300G kernel: FS: 7f1f8898ea40() GS:89005728() knlGS:
Jun 29 18:28:42 hp-4300G kernel: CS: 0010 DS: ES: CR0: 80050033
Jun 29 18:28:42 hp-4300G kernel: CR2: 003a8290 CR3: 0001020d CR4: 00350ee0
Jun 29 18:28:42 hp-4300G kernel: Call Trace:
Jun 29 18:28:42 hp-4300G kernel: _raw_spin_lock_irqsave+0x39/0x50
Jun 29 18:28:42 hp-4300G kernel: swiotlb_tbl_map_single+0x12b/0x4c0

Then that correlates with R11 holding the 'dma_io_tlb_mem' pointer and RDX pointing at the spinlock. Yet RAX is holding junk :/ I agree that enabling KASAN would be a good idea, but I also think we probably need to get some more information out of swiotlb_tbl_map_single() to see what exactly is going wrong in there. Will
[PATCH] crypto: DRBG - select SHA512
With the switch to HMAC(SHA-512) as the default DRBG type, the configuration must now also select SHA-512. Fixes: 9b7b94683a9b "crypto: DRBG - switch to HMAC SHA512 DRBG as default DRBG" Reported-by: Sachin Sant Signed-off-by: Stephan Mueller --- crypto/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index ca3b02dcbbfa..64b772c5d1c9 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1768,7 +1768,7 @@ config CRYPTO_DRBG_HMAC
 	bool
 	default y
 	select CRYPTO_HMAC
-	select CRYPTO_SHA256
+	select CRYPTO_SHA512
 
 config CRYPTO_DRBG_HASH
 	bool "Enable Hash DRBG"
-- 
2.31.1
Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
On Wed, Jun 30, 2021 at 9:43 AM Nathan Chancellor wrote: > > On Thu, Jun 24, 2021 at 11:55:20PM +0800, Claire Chang wrote: > > Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and > > use it to determine whether to bounce the data or not. This will be > > useful later to allow for different pools. > > > > Signed-off-by: Claire Chang > > Reviewed-by: Christoph Hellwig > > Tested-by: Stefano Stabellini > > Tested-by: Will Deacon > > Acked-by: Stefano Stabellini > > This patch as commit af452ec1b1a3 ("swiotlb: Use is_swiotlb_force_bounce > for swiotlb data bouncing") causes my Ryzen 3 4300G system to fail to > get to an X session consistently (although not every single time), > presumably due to a crash in the AMDGPU driver that I see in dmesg. > > I have attached logs at af452ec1b1a3 and f127c9556a8e and I am happy > to provide any further information, debug, or test patches as necessary. Are you using swiotlb=force? or the swiotlb_map is called because of !dma_capable? (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/dma/direct.h#n93) `BUG: unable to handle page fault for address: 003a8290` and the fact it crashed at `_raw_spin_lock_irqsave` look like the memory (maybe dev->dma_io_tlb_mem) was corrupted? The dev->dma_io_tlb_mem should be set here (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/probe.c#n2528) through device_initialize. I can't tell what happened from the logs, but maybe we could try KASAN to see if it provides more clue. Thanks, Claire > > Cheers, > Nathan
Re: [PATCH v3 3/9] powerpc/64e: remove implicit soft-masking and interrupt exit restart logic
Le 30/06/2021 à 09:46, Nicholas Piggin a écrit : The implicit soft-masking to speed up interrupt return was going to be used by 64e as well, but it has not been extensively tested on that platform and is not considered ready. It was intended to be disabled before merge. Disable it for now. Most of the restart code is common with 64s, so with more correctness and performance testing this could be re-enabled again by adding the extra soft-mask checks to interrupt handlers and flipping exit_must_hard_disable(). Fixes: 9d1988ca87dd ("powerpc/64: treat low kernel text as irqs soft-masked") Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/interrupt.h | 33 arch/powerpc/kernel/exceptions-64e.S | 12 +- arch/powerpc/kernel/interrupt.c | 2 +- arch/powerpc/kernel/interrupt_64.S | 16 -- 4 files changed, 40 insertions(+), 23 deletions(-) diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h index 8b4b1e84e110..f13c93b033c7 100644 --- a/arch/powerpc/include/asm/interrupt.h +++ b/arch/powerpc/include/asm/interrupt.h @@ -73,20 +73,34 @@ #include #include -#ifdef CONFIG_PPC64 +#ifdef CONFIG_PPC_BOOK3S_64 Can we avoid that ifdef and use IS_ENABLED(CONFIG_PPC_BOOK3S_64) below ? 
extern char __end_soft_masked[]; unsigned long search_kernel_restart_table(unsigned long addr); -#endif -#ifdef CONFIG_PPC_BOOK3S_64 DECLARE_STATIC_KEY_FALSE(interrupt_exit_not_reentrant); +static inline bool is_implicit_soft_masked(struct pt_regs *regs) +{ + if (regs->msr & MSR_PR) + return false; + + if (regs->nip >= (unsigned long)__end_soft_masked) + return false; + + return true; +} + static inline void srr_regs_clobbered(void) { local_paca->srr_valid = 0; local_paca->hsrr_valid = 0; } #else +static inline bool is_implicit_soft_masked(struct pt_regs *regs) +{ + return false; +} + static inline void srr_regs_clobbered(void) { } @@ -150,11 +164,13 @@ static inline void interrupt_enter_prepare(struct pt_regs *regs, struct interrup */ if (TRAP(regs) != INTERRUPT_PROGRAM) { CT_WARN_ON(ct_state() != CONTEXT_KERNEL); - BUG_ON(regs->nip < (unsigned long)__end_soft_masked); + BUG_ON(is_implicit_soft_masked(regs)); } +#ifdef CONFIG_PPC_BOOK3S Although we are already in a PPC64 section, wouldn't it be better to use CONFIG_PPC_BOOK3S_64 ? Can we use IS_ENABLED(CONFIG_PPC_BOOK3S_64) instead ? /* Move this under a debugging check */ if (arch_irq_disabled_regs(regs)) BUG_ON(search_kernel_restart_table(regs->nip)); +#endif } #endif @@ -244,10 +260,9 @@ static inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, struct inte local_paca->irq_soft_mask = IRQS_ALL_DISABLED; local_paca->irq_happened |= PACA_IRQ_HARD_DIS; - if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && !(regs->msr & MSR_PR) && - regs->nip < (unsigned long)__end_soft_masked) { - // Kernel code running below __end_soft_masked is - // implicitly soft-masked. + if (is_implicit_soft_masked(regs)) { + // Adjust regs->softe soft implicit soft-mask, so + // arch_irq_disabled_regs(regs) behaves as expected.
regs->softe = IRQS_ALL_DISABLED; } @@ -282,6 +297,7 @@ static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct inter */ #ifdef CONFIG_PPC64 +#ifdef CONFIG_PPC_BOOK3S IS_ENABLED(CONFIG_PPC_BOOK3S_64) instead ? if (arch_irq_disabled_regs(regs)) { unsigned long rst = search_kernel_restart_table(regs->nip); if (rst) @@ -289,7 +305,6 @@ static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct inter } #endif -#ifdef CONFIG_PPC64 if (nmi_disables_ftrace(regs)) this_cpu_set_ftrace_enabled(state->ftrace_enabled); diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S index d634bfceed2c..1401787b0b93 100644 --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S @@ -342,17 +342,7 @@ ret_from_mc_except: #define PROLOG_ADDITION_MASKABLE_GEN(n) \ lbz r10,PACAIRQSOFTMASK(r13); /* are irqs soft-masked? */ \ andi. r10,r10,IRQS_DISABLED; /* yes -> go out of line */ \ - bne masked_interrupt_book3e_##n;\ - /* Kernel code below __end_soft_masked is implicitly masked */ \ - andi. r10,r11,MSR_PR; \ - bne 1f; /* user -> not masked */ \ - std
[PATCH v3 9/9] powerpc/64s: move ret_from_fork etc above __end_soft_masked
Code which runs with interrupts enabled should be moved above __end_soft_masked where possible, because maskable interrupts that hit below that symbol will need to consult the soft mask table, which is an extra cost. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/interrupt_64.S | 52 +++--- 1 file changed, 26 insertions(+), 26 deletions(-) diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S index c4336e2e2ce8..4063e8a3f704 100644 --- a/arch/powerpc/kernel/interrupt_64.S +++ b/arch/powerpc/kernel/interrupt_64.S @@ -449,32 +449,6 @@ _ASM_NOKPROBE_SYMBOL(tabort_syscall) b . /* prevent speculative execution */ #endif -#ifdef CONFIG_PPC_BOOK3S -_GLOBAL(ret_from_fork_scv) - bl schedule_tail - REST_NVGPRS(r1) - li r3,0/* fork() return value */ - b .Lsyscall_vectored_common_exit -#endif - -_GLOBAL(ret_from_fork) - bl schedule_tail - REST_NVGPRS(r1) - li r3,0/* fork() return value */ - b .Lsyscall_exit - -_GLOBAL(ret_from_kernel_thread) - bl schedule_tail - REST_NVGPRS(r1) - mtctr r14 - mr r3,r15 -#ifdef PPC64_ELF_ABI_v2 - mr r12,r14 -#endif - bctrl - li r3,0 - b .Lsyscall_exit - /* * If MSR EE/RI was never enabled, IRQs not reconciled, NVGPRs not * touched, no exit work created, then this can be used. @@ -768,3 +742,29 @@ interrupt_return_macro hsrr __end_soft_masked: DEFINE_FIXED_SYMBOL(__end_soft_masked) #endif /* CONFIG_PPC_BOOK3S */ + +#ifdef CONFIG_PPC_BOOK3S +_GLOBAL(ret_from_fork_scv) + bl schedule_tail + REST_NVGPRS(r1) + li r3,0/* fork() return value */ + b .Lsyscall_vectored_common_exit +#endif + +_GLOBAL(ret_from_fork) + bl schedule_tail + REST_NVGPRS(r1) + li r3,0/* fork() return value */ + b .Lsyscall_exit + +_GLOBAL(ret_from_kernel_thread) + bl schedule_tail + REST_NVGPRS(r1) + mtctr r14 + mr r3,r15 +#ifdef PPC64_ELF_ABI_v2 + mr r12,r14 +#endif + bctrl + li r3,0 + b .Lsyscall_exit -- 2.23.0
[PATCH v3 8/9] powerpc/64s/interrupt: clean up interrupt return labels
Normal kernel-interrupt exits can get interrupt_return_srr_user_restart in their backtrace, which is an unusual and notable function, and it is part of the user-interrupt exit path, which is doubly confusing. Add non-local labels for both user and kernel interrupt exit cases to address this and make the user and kernel cases more symmetric. Also get rid of an unused label. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/interrupt_64.S | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S index 5c18362693fe..c4336e2e2ce8 100644 --- a/arch/powerpc/kernel/interrupt_64.S +++ b/arch/powerpc/kernel/interrupt_64.S @@ -509,7 +509,9 @@ interrupt_return_\srr\(): _ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\()) ld r4,_MSR(r1) andi. r0,r4,MSR_PR - beq .Lkernel_interrupt_return_\srr + beq interrupt_return_\srr\()_kernel +interrupt_return_\srr\()_user: /* make backtraces match the _kernel variant */ +_ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\()_user) addir3,r1,STACK_FRAME_OVERHEAD bl interrupt_exit_user_prepare cmpdi r3,0 @@ -623,8 +625,8 @@ RESTART_TABLE(.Linterrupt_return_\srr\()_user_rst_start, .Linterrupt_return_\srr #endif .balign IFETCH_ALIGN_BYTES -.Lkernel_interrupt_return_\srr\(): -.Linterrupt_return_\srr\()_kernel: +interrupt_return_\srr\()_kernel: +_ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\()_kernel) addir3,r1,STACK_FRAME_OVERHEAD bl interrupt_exit_kernel_prepare -- 2.23.0
[PATCH v3 7/9] powerpc/64/interrupt: add missing kprobe annotations on interrupt exit symbols
If one interrupt exit symbol must not be kprobed, none of them can be, without more justification for why it's safe. Disallow kprobing on any of the (non-local) labels in the exit paths. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/interrupt_64.S | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S index 2c92bbca02ca..5c18362693fe 100644 --- a/arch/powerpc/kernel/interrupt_64.S +++ b/arch/powerpc/kernel/interrupt_64.S @@ -197,6 +197,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) .Lsyscall_vectored_\name\()_rst_end: syscall_vectored_\name\()_restart: +_ASM_NOKPROBE_SYMBOL(syscall_vectored_\name\()_restart) GET_PACA(r13) ld r1,PACA_EXIT_SAVE_R1(r13) ld r2,PACATOC(r13) @@ -238,6 +239,7 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_emulate) .balign IFETCH_ALIGN_BYTES .globl system_call_common_real system_call_common_real: +_ASM_NOKPROBE_SYMBOL(system_call_common_real) ld r10,PACAKMSR(r13) /* get MSR value for kernel */ mtmsrd r10 @@ -402,6 +404,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) #ifdef CONFIG_PPC_BOOK3S syscall_restart: +_ASM_NOKPROBE_SYMBOL(syscall_restart) GET_PACA(r13) ld r1,PACA_EXIT_SAVE_R1(r13) ld r2,PACATOC(r13) @@ -420,6 +423,7 @@ RESTART_TABLE(.Lsyscall_rst_start, .Lsyscall_rst_end, syscall_restart) #ifdef CONFIG_PPC_TRANSACTIONAL_MEM tabort_syscall: +_ASM_NOKPROBE_SYMBOL(tabort_syscall) /* Firstly we need to enable TM in the kernel */ mfmsr r10 li r9, 1 @@ -602,6 +606,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS) #ifdef CONFIG_PPC_BOOK3S interrupt_return_\srr\()_user_restart: +_ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\()_user_restart) GET_PACA(r13) ld r1,PACA_EXIT_SAVE_R1(r13) ld r2,PACATOC(r13) @@ -735,6 +740,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS) #ifdef CONFIG_PPC_BOOK3S interrupt_return_\srr\()_kernel_restart: +_ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\()_kernel_restart) GET_PACA(r13) ld r1,PACA_EXIT_SAVE_R1(r13) ld r2,PACATOC(r13) -- 
2.23.0
[PATCH v3 6/9] powerpc/64: enable MSR[EE] in irq replay pt_regs
Similar to commit 2b48e96be2f9f ("powerpc/64: fix irq replay pt_regs->softe value"), enable MSR_EE in pt_regs->msr. This makes the regs look more normal. It also allows some extra debug checks to be added to interrupt handler entry. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/interrupt.h | 4 arch/powerpc/kernel/irq.c| 1 + 2 files changed, 5 insertions(+) diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h index 789311d1e283..d4bdf7d274ac 100644 --- a/arch/powerpc/include/asm/interrupt.h +++ b/arch/powerpc/include/asm/interrupt.h @@ -173,6 +173,8 @@ static inline void interrupt_enter_prepare(struct pt_regs *regs, struct interrup BUG_ON(search_kernel_restart_table(regs->nip)); #endif } + if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)) + BUG_ON(!arch_irq_disabled_regs(regs) && !(regs->msr & MSR_EE)); #endif booke_restore_dbcr0(); @@ -268,6 +270,8 @@ static inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, struct inte // arch_irq_disabled_regs(regs) behaves as expected. regs->softe = IRQS_ALL_DISABLED; } + if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)) + BUG_ON(!arch_irq_disabled_regs(regs) && !(regs->msr & MSR_EE)); /* Don't do any per-CPU operations until interrupt state is fixed */ diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c index 8428caf3194e..91e63eac4e8f 100644 --- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.c @@ -121,6 +121,7 @@ void replay_soft_interrupts(void) ppc_save_regs(); regs.softe = IRQS_ENABLED; + regs.msr |= MSR_EE; again: if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)) -- 2.23.0
[PATCH v3 5/9] powerpc/64s/interrupt: preserve regs->softe for NMI interrupts
If an NMI interrupt hits in an implicit soft-masked region, regs->softe is modified to reflect that. This may not be necessary for correctness at the moment, but it is surprising and unhelpful when debugging or adding checks. Make sure this is changed back to how it was found before returning. Fixes: 4ec5feec1ad0 ("powerpc/64s: Make NMI record implicitly soft-masked code as irqs disabled") Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/interrupt.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h index d7df247a149c..789311d1e283 100644 --- a/arch/powerpc/include/asm/interrupt.h +++ b/arch/powerpc/include/asm/interrupt.h @@ -227,6 +227,7 @@ struct interrupt_nmi_state { u8 irq_soft_mask; u8 irq_happened; u8 ftrace_enabled; + u64 softe; #endif }; @@ -252,6 +253,7 @@ static inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, struct inte #ifdef CONFIG_PPC64 state->irq_soft_mask = local_paca->irq_soft_mask; state->irq_happened = local_paca->irq_happened; + state->softe = regs->softe; /* * Set IRQS_ALL_DISABLED unconditionally so irqs_disabled() does @@ -311,6 +313,7 @@ static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct inter /* Check we didn't change the pending interrupt mask. */ WARN_ON_ONCE((state->irq_happened | PACA_IRQ_HARD_DIS) != local_paca->irq_happened); + regs->softe = state->softe; local_paca->irq_happened = state->irq_happened; local_paca->irq_soft_mask = state->irq_soft_mask; #endif -- 2.23.0
[PATCH v3 4/9] powerpc/64s: add a table of implicit soft-masked addresses
Commit 9d1988ca87dd ("powerpc/64: treat low kernel text as irqs soft-masked") ends up catching too much code, including ret_from_fork, and parts of interrupt and syscall return that do not expect interrupts to be soft-masked. If an interrupt gets marked pending, and then the code proceeds out of the implicit soft-masked region it will fail to deal with the pending interrupt. Fix this by adding a new table of addresses which explicitly marks the regions of code that are soft masked. This table is only checked for interrupts that hit below __end_soft_masked, so most kernel interrupts will not have the overhead of the table search. Fixes: 9d1988ca87dd ("powerpc/64: treat low kernel text as irqs soft-masked") Reported-by: Sachin Sant Tested-by: Sachin Sant Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/interrupt.h | 3 +- arch/powerpc/include/asm/ppc_asm.h | 7 +++ arch/powerpc/kernel/exceptions-64s.S | 64 +++- arch/powerpc/kernel/interrupt_64.S | 8 arch/powerpc/kernel/vmlinux.lds.S| 9 arch/powerpc/lib/restart_table.c | 26 +++ 6 files changed, 106 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h index f13c93b033c7..d7df247a149c 100644 --- a/arch/powerpc/include/asm/interrupt.h +++ b/arch/powerpc/include/asm/interrupt.h @@ -75,6 +75,7 @@ #ifdef CONFIG_PPC_BOOK3S_64 extern char __end_soft_masked[]; +bool search_kernel_soft_mask_table(unsigned long addr); unsigned long search_kernel_restart_table(unsigned long addr); DECLARE_STATIC_KEY_FALSE(interrupt_exit_not_reentrant); @@ -87,7 +88,7 @@ static inline bool is_implicit_soft_masked(struct pt_regs *regs) if (regs->nip >= (unsigned long)__end_soft_masked) return false; - return true; + return search_kernel_soft_mask_table(regs->nip); } static inline void srr_regs_clobbered(void) diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h index c9c2c36c1f8f..116c1519728a 100644 --- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h @@ -762,6 +762,13 @@ END_FTR_SECTION_NESTED(CPU_FTR_CELL_TB_BUG, CPU_FTR_CELL_TB_BUG, 96) stringify_in_c(.long (_target) - . ;) \ stringify_in_c(.previous) +#define SOFT_MASK_TABLE(_start, _end) \ + stringify_in_c(.section __soft_mask_table,"a";)\ + stringify_in_c(.balign 8;) \ + stringify_in_c(.llong (_start);)\ + stringify_in_c(.llong (_end);) \ + stringify_in_c(.previous) + #define RESTART_TABLE(_start, _end, _target) \ stringify_in_c(.section __restart_table,"a";)\ stringify_in_c(.balign 8;) \ diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index ecd07bf604c5..4aec59a77d4c 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -428,21 +428,31 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real) /* If coming from user, skip soft-mask tests. */ andi. r10,r12,MSR_PR - bne 2f + bne 3f /* -* Kernel code running below __end_soft_masked is implicitly -* soft-masked +* Kernel code running below __end_soft_masked may be +* implicitly soft-masked if it is within the regions +* in the soft mask table. */ LOAD_HANDLER(r10, __end_soft_masked) cmpld r11,r10 - + bge+1f + + /* SEARCH_SOFT_MASK_TABLE clobbers r9,r10,r12 */ + mtctr r12 + stw r9,PACA_EXGEN+EX_CCR(r13) + SEARCH_SOFT_MASK_TABLE + cmpdi r12,0 + mfctr r12 /* Restore r12 to SRR1 */ + lwz r9,PACA_EXGEN+EX_CCR(r13) + beq 1f /* Not in soft-mask table */ li r10,IMASK - blt-1f + b 2f /* In soft-mask table, always mask */ /* Test the soft mask state against our interrupt's bit */ - lbz r10,PACAIRQSOFTMASK(r13) -1: andi. r10,r10,IMASK +1: lbz r10,PACAIRQSOFTMASK(r13) +2: andi. r10,r10,IMASK /* Associate vector numbers with bits in paca->irq_happened */ .if IVEC == 0x500 || IVEC == 0xea0 li r10,PACA_IRQ_EE @@ -473,7 +483,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real) .if ISTACK andi. 
r10,r12,MSR_PR /* See if coming from user */ -2: mr r10,r1 /* Save r1 */ +3: mr r10,r1 /* Save r1 */ subir1,r1,INT_FRAME_SIZE/* alloc frame
[PATCH v3 3/9] powerpc/64e: remove implicit soft-masking and interrupt exit restart logic
The implicit soft-masking to speed up interrupt return was going to be used by 64e as well, but it has not been extensively tested on that platform and is not considered ready. It was intended to be disabled before merge. Disable it for now. Most of the restart code is common with 64s, so with more correctness and performance testing this could be re-enabled again by adding the extra soft-mask checks to interrupt handlers and flipping exit_must_hard_disable(). Fixes: 9d1988ca87dd ("powerpc/64: treat low kernel text as irqs soft-masked") Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/interrupt.h | 33 arch/powerpc/kernel/exceptions-64e.S | 12 +- arch/powerpc/kernel/interrupt.c | 2 +- arch/powerpc/kernel/interrupt_64.S | 16 -- 4 files changed, 40 insertions(+), 23 deletions(-) diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h index 8b4b1e84e110..f13c93b033c7 100644 --- a/arch/powerpc/include/asm/interrupt.h +++ b/arch/powerpc/include/asm/interrupt.h @@ -73,20 +73,34 @@ #include #include -#ifdef CONFIG_PPC64 +#ifdef CONFIG_PPC_BOOK3S_64 extern char __end_soft_masked[]; unsigned long search_kernel_restart_table(unsigned long addr); -#endif -#ifdef CONFIG_PPC_BOOK3S_64 DECLARE_STATIC_KEY_FALSE(interrupt_exit_not_reentrant); +static inline bool is_implicit_soft_masked(struct pt_regs *regs) +{ + if (regs->msr & MSR_PR) + return false; + + if (regs->nip >= (unsigned long)__end_soft_masked) + return false; + + return true; +} + static inline void srr_regs_clobbered(void) { local_paca->srr_valid = 0; local_paca->hsrr_valid = 0; } #else +static inline bool is_implicit_soft_masked(struct pt_regs *regs) +{ + return false; +} + static inline void srr_regs_clobbered(void) { } @@ -150,11 +164,13 @@ static inline void interrupt_enter_prepare(struct pt_regs *regs, struct interrup */ if (TRAP(regs) != INTERRUPT_PROGRAM) { CT_WARN_ON(ct_state() != CONTEXT_KERNEL); - BUG_ON(regs->nip < (unsigned long)__end_soft_masked); + 
BUG_ON(is_implicit_soft_masked(regs)); } +#ifdef CONFIG_PPC_BOOK3S /* Move this under a debugging check */ if (arch_irq_disabled_regs(regs)) BUG_ON(search_kernel_restart_table(regs->nip)); +#endif } #endif @@ -244,10 +260,9 @@ static inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, struct inte local_paca->irq_soft_mask = IRQS_ALL_DISABLED; local_paca->irq_happened |= PACA_IRQ_HARD_DIS; - if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && !(regs->msr & MSR_PR) && - regs->nip < (unsigned long)__end_soft_masked) { - // Kernel code running below __end_soft_masked is - // implicitly soft-masked. + if (is_implicit_soft_masked(regs)) { + // Adjust regs->softe soft implicit soft-mask, so + // arch_irq_disabled_regs(regs) behaves as expected. regs->softe = IRQS_ALL_DISABLED; } @@ -282,6 +297,7 @@ static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct inter */ #ifdef CONFIG_PPC64 +#ifdef CONFIG_PPC_BOOK3S if (arch_irq_disabled_regs(regs)) { unsigned long rst = search_kernel_restart_table(regs->nip); if (rst) @@ -289,7 +305,6 @@ static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct inter } #endif -#ifdef CONFIG_PPC64 if (nmi_disables_ftrace(regs)) this_cpu_set_ftrace_enabled(state->ftrace_enabled); diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S index d634bfceed2c..1401787b0b93 100644 --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S @@ -342,17 +342,7 @@ ret_from_mc_except: #define PROLOG_ADDITION_MASKABLE_GEN(n) \ lbz r10,PACAIRQSOFTMASK(r13); /* are irqs soft-masked? */ \ andi. r10,r10,IRQS_DISABLED; /* yes -> go out of line */ \ - bne masked_interrupt_book3e_##n;\ - /* Kernel code below __end_soft_masked is implicitly masked */ \ - andi. r10,r11,MSR_PR; \ - bne 1f; /* user -> not masked */\ - std r14,PACA_EXGEN+EX_R14(r13); \ - LOAD_REG_IMMEDIATE_SYM(r14, r10, __end_soft_masked);\ - mfspr r10,SPRN_SRR0; \ - cmpld r10,r14;\ - ld r14,PACA_EXGEN+EX_R14(r13);
[PATCH v3 0/9] powerpc: fast interrupt exit bug and misc fixes
This is a bunch of fixes for powerpc next, mostly a nasty hole in fast interrupt exit code found by Sachin and some other bits along the way while looking at it. Since v2: - Fixed 64e patch 3 to really set exit_must_hard_disable. - Reworded some changelogs. Since v1: - Fixed a bisection compile error due to a fix incorrectly going to a later patch. - Fixed the "add a table of implicit soft-masked addresses" patch to include the low scv vector range as a soft-masked table entry. scv was the original reason for implicit soft masking, and the previous version breaks it. My stress testing was using an image without scv glibc unfortunately. - Fixed a bug with the same patch that would restore r12 with SRR1 rather than HSRR1 in the case of masked hypervisor interrupts after searching the soft-mask table. Again unfortunately my stress testing was in a guest so no HV interrupts. Thanks to Michael for noticing these issues. - Pulled in the hash page fault interrupt handler fix into this series to make the dependencies clear (well it's not exactly dependent but assertions introduced later make the existing bug crash more often). 
Thanks, Nick Nicholas Piggin (9): powerpc/64s: fix hash page fault interrupt handler powerpc/64e: fix CONFIG_RELOCATABLE build warnings powerpc/64e: remove implicit soft-masking and interrupt exit restart logic powerpc/64s: add a table of implicit soft-masked addresses powerpc/64s/interrupt: preserve regs->softe for NMI interrupts powerpc/64: enable MSR[EE] in irq replay pt_regs powerpc/64/interrupt: add missing kprobe annotations on interrupt exit symbols powerpc/64s/interrupt: clean up interrupt return labels powerpc/64s: move ret_from_fork etc above __end_soft_masked arch/powerpc/include/asm/interrupt.h | 41 +--- arch/powerpc/include/asm/ppc_asm.h| 7 +++ arch/powerpc/kernel/exceptions-64e.S | 23 +++ arch/powerpc/kernel/exceptions-64s.S | 64 --- arch/powerpc/kernel/interrupt.c | 2 +- arch/powerpc/kernel/interrupt_64.S| 90 ++- arch/powerpc/kernel/irq.c | 1 + arch/powerpc/kernel/vmlinux.lds.S | 9 +++ arch/powerpc/lib/restart_table.c | 26 arch/powerpc/mm/book3s64/hash_utils.c | 24 --- 10 files changed, 212 insertions(+), 75 deletions(-) -- 2.23.0
[PATCH v3 2/9] powerpc/64e: fix CONFIG_RELOCATABLE build warnings
CONFIG_RELOCATABLE=y causes build warnings from unresolved relocations. Fix these by using TOC addressing for these cases. Commit 24d33ac5b8ff ("powerpc/64s: Make prom_init require RELOCATABLE") caused some 64e configs to select RELOCATABLE resulting in these warnings, but the underlying issue was already there. This passes basic qemu testing. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64e.S | 11 +++ 1 file changed, 11 insertions(+) diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S index 22fcd95dd8dc..d634bfceed2c 100644 --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S @@ -912,8 +912,14 @@ kernel_dbg_exc: b interrupt_return .macro SEARCH_RESTART_TABLE +#ifdef CONFIG_RELOCATABLE + ld r11,PACATOC(r13) + ld r14,__start___restart_table@got(r11) + ld r15,__stop___restart_table@got(r11) +#else LOAD_REG_IMMEDIATE_SYM(r14, r11, __start___restart_table) LOAD_REG_IMMEDIATE_SYM(r15, r11, __stop___restart_table) +#endif 300: cmpdr14,r15 beq 302f @@ -1329,7 +1335,12 @@ a2_tlbinit_code_start: a2_tlbinit_after_linear_map: /* Now we branch the new virtual address mapped by this entry */ +#ifdef CONFIG_RELOCATABLE + ld r5,PACATOC(r13) + ld r3,1f@got(r5) +#else LOAD_REG_IMMEDIATE_SYM(r3, r5, 1f) +#endif mtctr r3 bctr -- 2.23.0
[PATCH v3 1/9] powerpc/64s: fix hash page fault interrupt handler
The early bad fault or key fault test in do_hash_fault() ends up calling
into ___do_page_fault without having gone through an interrupt handler
wrapper (except the initial _RAW one). This can end up calling local irq
functions while the interrupt has not been reconciled, which will likely
cause crashes and it trips up on a later patch that adds more assertions.

pkey_exec_prot from selftests causes this path to be executed.

There is no real reason the in_nmi() test should be performed before the
key fault check. In fact if a perf interrupt in the hash fault code did a
stack walk that was made to take a key fault somehow then running
___do_page_fault could possibly cause another hash fault causing
problems.

Move the in_nmi() test first, and then do everything else inside the
regular interrupt handler function.

Fixes: 3a96570ffceb ("powerpc: convert interrupt handlers to use wrappers")
Reported-by: Sachin Sant
Tested-by: Sachin Sant
Signed-off-by: Nicholas Piggin
---
 arch/powerpc/mm/book3s64/hash_utils.c | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 96d9aa164007..ac5720371c0d 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -1522,8 +1522,8 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap,
 }
 EXPORT_SYMBOL_GPL(hash_page);
 
-DECLARE_INTERRUPT_HANDLER_RET(__do_hash_fault);
-DEFINE_INTERRUPT_HANDLER_RET(__do_hash_fault)
+DECLARE_INTERRUPT_HANDLER(__do_hash_fault);
+DEFINE_INTERRUPT_HANDLER(__do_hash_fault)
 {
 	unsigned long ea = regs->dar;
 	unsigned long dsisr = regs->dsisr;
@@ -1533,6 +1533,11 @@ DEFINE_INTERRUPT_HANDLER_RET(__do_hash_fault)
 	unsigned int region_id;
 	long err;
 
+	if (unlikely(dsisr & (DSISR_BAD_FAULT_64S | DSISR_KEYFAULT))) {
+		hash__do_page_fault(regs);
+		return;
+	}
+
 	region_id = get_region_id(ea);
 	if ((region_id == VMALLOC_REGION_ID) || (region_id == IO_REGION_ID))
 		mm = &init_mm;
@@ -1571,9 +1576,10 @@ DEFINE_INTERRUPT_HANDLER_RET(__do_hash_fault)
 			bad_page_fault(regs, SIGBUS);
 		}
 		err = 0;
-	}
 
-	return err;
+	} else if (err) {
+		hash__do_page_fault(regs);
+	}
 }
 
 /*
@@ -1582,13 +1588,6 @@ DEFINE_INTERRUPT_HANDLER_RET(__do_hash_fault)
  */
 DEFINE_INTERRUPT_HANDLER_RAW(do_hash_fault)
 {
-	unsigned long dsisr = regs->dsisr;
-
-	if (unlikely(dsisr & (DSISR_BAD_FAULT_64S | DSISR_KEYFAULT))) {
-		hash__do_page_fault(regs);
-		return 0;
-	}
-
 	/*
 	 * If we are in an "NMI" (e.g., an interrupt when soft-disabled), then
 	 * don't call hash_page, just fail the fault. This is required to
@@ -1607,8 +1606,7 @@ DEFINE_INTERRUPT_HANDLER_RAW(do_hash_fault)
 		return 0;
 	}
 
-	if (__do_hash_fault(regs))
-		hash__do_page_fault(regs);
+	__do_hash_fault(regs);
 
 	return 0;
 }
-- 
2.23.0
[powerpc][5.13.0-next20210629] Kernel WARN crypto/testmgr.c:5653 during boot
While booting 5.13.0-next20210629 on a Power server, the following
warning is seen:

[0.076955] DRBG: could not allocate digest TFM handle: hmac(sha512)
[0.076960] alg: drbg: Failed to reset rng
[0.076963] alg: drbg: Test 0 failed for drbg_nopr_hmac_sha512
[0.076967] [ cut here ]
[0.076970] alg: self-tests for drbg_nopr_hmac_sha512 (stdrng) failed (rc=-22)
[0.076977] WARNING: CPU: 10 PID: 153 at crypto/testmgr.c:5653 alg_test+0x484/0x860
[0.076989] Modules linked in:
[0.076993] CPU: 10 PID: 153 Comm: cryptomgr_test Not tainted 5.13.0-next-20210629 #1
[0.076998] NIP: c063ea44 LR: c063ea40 CTR: c0730b40
[0.077003] REGS: ce7ff960 TRAP: 0700 Not tainted (5.13.0-next-20210629)
[0.077007] MSR: 80029033 CR: 28008222 XER: 20040005
[0.077018] CFAR: c0150a00 IRQMASK: 0
[0.077018] GPR00: c063ea40 ce7ffc00 c29bc300 0042
[0.077018] GPR04: 7fff ce7ff8c0 ce7ff8b8
[0.077018] GPR08: 0009f98e c24f66f0 c24f66f0 c2876838
[0.077018] GPR12: 8000 c0001ec7a280 c018ce88 c3b300c0
[0.077018] GPR16:
[0.077018] GPR20: c2a9ab20 c0fb5278
[0.077018] GPR24: c00010a85200 c0d61fa8 c00010a85280 0400
[0.077018] GPR28: c00010a85200 000c c2ccf230 ffea
[0.077072] NIP [c063ea44] alg_test+0x484/0x860
[0.077077] LR [c063ea40] alg_test+0x480/0x860
[0.077082] Call Trace:
[0.077085] [ce7ffc00] [c063ea40] alg_test+0x480/0x860 (unreliable)
[0.077091] [ce7ffd70] [c063ca60] cryptomgr_test+0x40/0x70
[0.077097] [ce7ffda0] [c018d014] kthread+0x194/0x1a0
[0.077103] [ce7ffe10] [c000c750] ret_from_kernel_thread+0x5c/0x6c
[0.077110] Instruction dump:
[0.077113] 409e0298 3d220031 89292f56 2f89 409e0288 3c62fe63 7f45d378 7f84e378
[0.077121] 7fe6fb78 38633260 4bb11f5d 6000 <0fe0> e8010180 eb210138 7c0803a6
[0.077131] ---[ end trace a1cc3999f90f0962 ]---
[0.077585] iommu: Default domain type: Translated

This new self test was introduced with commit 8833272d876e ("crypto:
drbg - self test for HMAC(SHA-512)").

Thanks
-Sachin