[PATCH kernel v2] KVM: PPC: Fix TCE handling for VFIO
The LoPAPR spec defines a guest-visible IOMMU with a variable page size. Currently QEMU advertises 4K, 64K, 2M and 16MB pages; a Linux VM picks the biggest (16MB). In the case of a passed-through PCI device, there is a hardware IOMMU which does not support all page sizes from the above: P8 cannot do 2MB and P9 cannot do 16MB. So for each emulated 16MB IOMMU page we may create several smaller mappings ("TCEs") in the hardware IOMMU.

The code wrongly uses the emulated TCE index instead of the hardware TCE index in error handling. The problem is easier to see on POWER8 with multi-level TCE tables (when only the first level is preallocated) as hash mode uses real-mode TCE hypercall handlers. The kernel starts using indirect tables when VMs get bigger than 128GB (depends on the max page order). The very first real-mode hcall is going to fail with H_TOO_HARD, as in real mode we cannot allocate memory for TCEs (we can in virtual mode), but on the way out the code attempts to clear hardware TCEs using emulated TCE indexes, which corrupts random kernel memory because it_offset==1<<59 is subtracted from those indexes and the resulting index is out of the TCE table bounds.

This fixes kvmppc_clear_tce() to use the correct TCE indexes.

While at it, this fixes TCE cache invalidation, which used emulated TCE indexes instead of the hardware ones. This went unnoticed as 64bit DMA is used these days: VMs map all RAM in one go and only then do DMA, and that is when the TCE cache gets populated.

Potentially this could slow down mapping; however, normally 16MB emulated pages are backed by 64K hardware pages, so it is one write to the "TCE Kill" register per 256 updates, which is not that bad considering the size of the cache (1024 TCEs or so).
Fixes: ca1fc489cfa0 ("KVM: PPC: Book3S: Allow backing bigger guest IOMMU pages with smaller physical pages")
Reviewed-by: Frederic Barrat
Reviewed-by: David Gibson
Tested-by: David Gibson
Signed-off-by: Alexey Kardashevskiy
---
Changes:
v2:
* reworded the first paragraph of the commit log
---
 arch/powerpc/kvm/book3s_64_vio.c    | 45 +++--
 arch/powerpc/kvm/book3s_64_vio_hv.c | 44 ++--
 2 files changed, 45 insertions(+), 44 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index d42b4b6d4a79..85cfa6328222 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -420,13 +420,19 @@ static void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
 	tbl[idx % TCES_PER_PAGE] = tce;
 }
 
-static void kvmppc_clear_tce(struct mm_struct *mm, struct iommu_table *tbl,
-		unsigned long entry)
+static void kvmppc_clear_tce(struct mm_struct *mm, struct kvmppc_spapr_tce_table *stt,
+		struct iommu_table *tbl, unsigned long entry)
 {
-	unsigned long hpa = 0;
-	enum dma_data_direction dir = DMA_NONE;
+	unsigned long i;
+	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
+	unsigned long io_entry = entry << (stt->page_shift - tbl->it_page_shift);
 
-	iommu_tce_xchg_no_kill(mm, tbl, entry, &hpa, &dir);
+	for (i = 0; i < subpages; ++i) {
+		unsigned long hpa = 0;
+		enum dma_data_direction dir = DMA_NONE;
+
+		iommu_tce_xchg_no_kill(mm, tbl, io_entry + i, &hpa, &dir);
+	}
 }
 
 static long kvmppc_tce_iommu_mapped_dec(struct kvm *kvm,
@@ -485,6 +491,8 @@ static long kvmppc_tce_iommu_unmap(struct kvm *kvm,
 		break;
 	}
 
+	iommu_tce_kill(tbl, io_entry, subpages);
+
 	return ret;
 }
 
@@ -544,6 +552,8 @@ static long kvmppc_tce_iommu_map(struct kvm *kvm,
 		break;
 	}
 
+	iommu_tce_kill(tbl, io_entry, subpages);
+
 	return ret;
 }
 
@@ -590,10 +600,9 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 		ret = kvmppc_tce_iommu_map(vcpu->kvm, stt, stit->tbl,
 				entry, ua, dir);
 
-		iommu_tce_kill(stit->tbl, entry, 1);
-
 		if (ret != H_SUCCESS) {
-			kvmppc_clear_tce(vcpu->kvm->mm, stit->tbl, entry);
+			kvmppc_clear_tce(vcpu->kvm->mm, stt, stit->tbl, entry);
 			goto unlock_exit;
 		}
 	}
@@ -669,13 +678,13 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 		 */
 		if (get_user(tce, tces + i)) {
 			ret = H_TOO_HARD;
-			goto invalidate_exit;
+			goto unlock_exit;
 		}
 		tce = be64_to_cpu(tce);
 
 		if (kvmppc_tce_to_ua(vcpu->kvm, tce, &ua)) {
 			ret = H_PARAMETER;
-			goto invalidate_exit;
+			goto unlock_exit;
 		}
 
 		list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
@@ -684,19
Re: [PATCH 3/3] KVM: Add helpers to wrap vcpu->srcu_idx and yell if it's abused
On Tue, 2022-04-19 at 15:45 +, Sean Christopherson wrote:
> On Tue, Apr 19, 2022, Maxim Levitsky wrote:
> > On Fri, 2022-04-15 at 00:43 +, Sean Christopherson wrote:
> > > Add wrappers to acquire/release KVM's SRCU lock when stashing the index
> > > in vcpu->src_idx, along with rudimentary detection of illegal usage,
> > > e.g. re-acquiring SRCU and thus overwriting vcpu->src_idx. Because the
> > > SRCU index is (currently) either 0 or 1, illegal nesting bugs can go
> > > unnoticed for quite some time and only cause problems when the nested
> > > lock happens to get a different index.
> > >
> > > Wrap the WARNs in PROVE_RCU=y, and make them ONCE, otherwise KVM will
> > > likely yell so loudly that it will bring the kernel to its knees.
> > >
> > > Signed-off-by: Sean Christopherson
> > > ---
> >
> > ...
> >
> > Looks good to me overall.
> >
> > Note that there are still places that acquire the lock and store the idx into
> > a local variable, for example kvm_xen_vcpu_set_attr and such.
> > I didn't check yet if these should be converted as well.
>
> Using a local variable is ok, even desirable. Nested/multiple readers is not an
> issue, the bug fixed by patch 1 is purely that kvm_vcpu.srcu_idx gets corrupted.

Makes sense. I still recall *that* bug in AVIC inhibition where that srcu lock
was a major PITA, but now I remember that it was not due to nesting of the
lock, but rather the fact that we attempted to call synchronize_srcu or
something like that with it held.

> In an ideal world, KVM would _only_ track the SRCU index in local variables, but
> that would require plumbing the local variable down into vcpu_enter_guest() and
> kvm_vcpu_block() so that SRCU can be unlocked prior to entering the guest or
> scheduling out the vCPU.

It all makes sense now - thanks.

Best regards,
	Maxim Levitsky
[PATCH] ASoC: fsl_asrc: using pm_runtime_resume_and_get to simplify the code
From: Minghao Chi

Use pm_runtime_resume_and_get() to replace pm_runtime_get_sync() and
pm_runtime_put_noidle(). This change just simplifies the code; there are
no functional changes.

Reported-by: Zeal Robot
Signed-off-by: Minghao Chi
---
 sound/soc/fsl/fsl_asrc.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/sound/soc/fsl/fsl_asrc.c b/sound/soc/fsl/fsl_asrc.c
index d7d1536a4f37..31a7f9aac6e3 100644
--- a/sound/soc/fsl/fsl_asrc.c
+++ b/sound/soc/fsl/fsl_asrc.c
@@ -1211,11 +1211,9 @@ static int fsl_asrc_probe(struct platform_device *pdev)
 		goto err_pm_disable;
 	}
 
-	ret = pm_runtime_get_sync(&pdev->dev);
-	if (ret < 0) {
-		pm_runtime_put_noidle(&pdev->dev);
+	ret = pm_runtime_resume_and_get(&pdev->dev);
+	if (ret < 0)
 		goto err_pm_get_sync;
-	}
 
 	ret = fsl_asrc_init(asrc);
 	if (ret) {
-- 
2.25.1
Re: [PATCH v1 1/1] powerpc/83xx/mpc8349emitx: Get rid of of_node assignment
On Wed, Apr 6, 2022 at 3:02 PM Andy Shevchenko wrote:
> On Mon, Mar 28, 2022 at 03:16:08PM +0200, Linus Walleij wrote:
> > On Wed, Mar 23, 2022 at 6:43 PM Andy Shevchenko wrote:
> > >
> > > Let GPIO library to assign of_node from the parent device.
> > > This allows to move GPIO library and drivers to use fwnode
> > > APIs instead of being stuck with OF-only interfaces.
> > >
> > > Signed-off-by: Andy Shevchenko
> >
> > That's a nice patch.
> > Reviewed-by: Linus Walleij
>
> Thanks!
>
> Can we have this applied now?

I think Michael Ellerman could help with this? Michael?

Yours,
Linus Walleij
Apply d799769188529abc6cbf035a10087a51f7832b6b to 5.17 and 5.15?
Hi Greg, Sasha, and Michael, Commit d79976918852 ("powerpc/64: Add UADDR64 relocation support") fixes a boot failure with CONFIG_RELOCATABLE=y kernels linked with recent versions of ld.lld [1]. Additionally, it resolves a separate boot failure that Paul Menzel reported [2] with ld.lld 13.0.0. Is this a reasonable backport for 5.17 and 5.15? It applies cleanly, resolves both problems, and does not appear to cause any other issues in my testing for both trees but I was curious what Michael's opinion was, as I am far from a PowerPC expert. This change does apply cleanly to 5.10 (I did not try earlier branches) but there are other changes needed for ld.lld to link CONFIG_RELOCATABLE kernels in that branch so to avoid any regressions, I think it is safe to just focus on 5.15 and 5.17. Paul, it would not hurt to confirm the results of my testing with your setup, just to make sure I did not miss anything :) [1]: https://github.com/ClangBuiltLinux/linux/issues/1581 [2]: https://lore.kernel.org/Yg2h2Q2vXFkkLGTh@dev-arch.archlinux-ax161/ Cheers, Nathan
Re: [PATCH 3/3] KVM: Add helpers to wrap vcpu->srcu_idx and yell if it's abused
Sean Christopherson writes: > Add wrappers to acquire/release KVM's SRCU lock when stashing the index > in vcpu->src_idx, along with rudimentary detection of illegal usage, > e.g. re-acquiring SRCU and thus overwriting vcpu->src_idx. Because the > SRCU index is (currently) either 0 or 1, illegal nesting bugs can go > unnoticed for quite some time and only cause problems when the nested > lock happens to get a different index. > > Wrap the WARNs in PROVE_RCU=y, and make them ONCE, otherwise KVM will > likely yell so loudly that it will bring the kernel to its knees. > > Signed-off-by: Sean Christopherson For the powerpc part: Tested-by: Fabiano Rosas > --- > arch/powerpc/kvm/book3s_64_mmu_radix.c | 9 + > arch/powerpc/kvm/book3s_hv_nested.c| 16 +++ > arch/powerpc/kvm/book3s_rtas.c | 4 ++-- > arch/powerpc/kvm/powerpc.c | 4 ++-- > arch/riscv/kvm/vcpu.c | 16 +++ > arch/riscv/kvm/vcpu_exit.c | 4 ++-- > arch/s390/kvm/interrupt.c | 4 ++-- > arch/s390/kvm/kvm-s390.c | 8 > arch/s390/kvm/vsie.c | 4 ++-- > arch/x86/kvm/x86.c | 28 -- > include/linux/kvm_host.h | 24 +- > 11 files changed, 71 insertions(+), 50 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c > b/arch/powerpc/kvm/book3s_64_mmu_radix.c > index e4ce2a35483f..42851c32ff3b 100644 > --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c > +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c > @@ -168,9 +168,10 @@ int kvmppc_mmu_walk_radix_tree(struct kvm_vcpu *vcpu, > gva_t eaddr, > return -EINVAL; > /* Read the entry from guest memory */ > addr = base + (index * sizeof(rpte)); > - vcpu->srcu_idx = srcu_read_lock(>srcu); > + > + kvm_vcpu_srcu_read_lock(vcpu); > ret = kvm_read_guest(kvm, addr, , sizeof(rpte)); > - srcu_read_unlock(>srcu, vcpu->srcu_idx); > + kvm_vcpu_srcu_read_unlock(vcpu); > if (ret) { > if (pte_ret_p) > *pte_ret_p = addr; > @@ -246,9 +247,9 @@ int kvmppc_mmu_radix_translate_table(struct kvm_vcpu > *vcpu, gva_t eaddr, > > /* Read the table to find the root of the radix tree */ > ptbl = (table & 
PRTB_MASK) + (table_index * sizeof(entry)); > - vcpu->srcu_idx = srcu_read_lock(>srcu); > + kvm_vcpu_srcu_read_lock(vcpu); > ret = kvm_read_guest(kvm, ptbl, , sizeof(entry)); > - srcu_read_unlock(>srcu, vcpu->srcu_idx); > + kvm_vcpu_srcu_read_unlock(vcpu); > if (ret) > return ret; > > diff --git a/arch/powerpc/kvm/book3s_hv_nested.c > b/arch/powerpc/kvm/book3s_hv_nested.c > index 9d373f8963ee..c943a051c6e7 100644 > --- a/arch/powerpc/kvm/book3s_hv_nested.c > +++ b/arch/powerpc/kvm/book3s_hv_nested.c > @@ -306,10 +306,10 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu) > /* copy parameters in */ > hv_ptr = kvmppc_get_gpr(vcpu, 4); > regs_ptr = kvmppc_get_gpr(vcpu, 5); > - vcpu->srcu_idx = srcu_read_lock(>kvm->srcu); > + kvm_vcpu_srcu_read_lock(vcpu); > err = kvmhv_read_guest_state_and_regs(vcpu, _hv, _regs, > hv_ptr, regs_ptr); > - srcu_read_unlock(>kvm->srcu, vcpu->srcu_idx); > + kvm_vcpu_srcu_read_unlock(vcpu); > if (err) > return H_PARAMETER; > > @@ -410,10 +410,10 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu) > byteswap_hv_regs(_hv); > byteswap_pt_regs(_regs); > } > - vcpu->srcu_idx = srcu_read_lock(>kvm->srcu); > + kvm_vcpu_srcu_read_lock(vcpu); > err = kvmhv_write_guest_state_and_regs(vcpu, _hv, _regs, > hv_ptr, regs_ptr); > - srcu_read_unlock(>kvm->srcu, vcpu->srcu_idx); > + kvm_vcpu_srcu_read_unlock(vcpu); > if (err) > return H_AUTHORITY; > > @@ -600,16 +600,16 @@ long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu > *vcpu) > goto not_found; > > /* Write what was loaded into our buffer back to the L1 guest */ > - vcpu->srcu_idx = srcu_read_lock(>kvm->srcu); > + kvm_vcpu_srcu_read_lock(vcpu); > rc = kvm_vcpu_write_guest(vcpu, gp_to, buf, n); > - srcu_read_unlock(>kvm->srcu, vcpu->srcu_idx); > + kvm_vcpu_srcu_read_unlock(vcpu); > if (rc) > goto not_found; > } else { > /* Load the data to be stored from the L1 guest into our buf */ > - vcpu->srcu_idx = srcu_read_lock(>kvm->srcu); > + kvm_vcpu_srcu_read_lock(vcpu); > rc = 
kvm_vcpu_read_guest(vcpu, gp_from, buf, n); > - srcu_read_unlock(>kvm->srcu, vcpu->srcu_idx); > +
Re: [PATCH RFC 2/8] arm64: stacktrace: Add arch_within_stack_frames
Hi,

On Mon, Apr 18, 2022 at 09:22:11PM +0800, He Zhe wrote:
> This function checks if the given address range crosses frame boundary.

I don't think that's quite true, because arm64's procedure call standard
(AAPCS64) doesn't give us enough information to determine this without
additional metadata from the compiler, which we simply don't have today.

Since there's a lot of confusion in this area, I've made a bit of an info
dump below, before review on the patch itself, but TBH I'm struggling to
see that this is all that useful.

On arm64, we use a calling convention called AAPCS64, (in full: "Procedure
Call Standard for the Arm® 64-bit Architecture (AArch64)"). That's
maintained at:

  https://github.com/ARM-software/abi-aa

... with the latest release (as of today) at:

  https://github.com/ARM-software/abi-aa/blob/60a8eb8c55e999d74dac5e368fc9d7e36e38dda4/aapcs64/aapcs64.rst
  https://github.com/ARM-software/abi-aa/releases/download/2022Q1/aapcs64.pdf

In AAPCS64, there are two related but distinct things to be aware of:

* The "stack frame" of a function, which is the entire contiguous region
  of stack memory used by a function.

* The "frame record", which is the saved FP and LR placed *somewhere*
  within the function's stack frame. The FP points at the most recent
  frame record on the stack, and at function call boundaries points at
  the caller's frame record.

AAPCS64 doesn't say *where* a frame record is placed within a stack frame,
and there are reasons for compilers to place data above and below it. So
in general, a function's stack frame looks like:

	+===========+
	|   above   |
	|-----------|
	|  FP | LR  |
	|-----------|
	|   below   |
	+===========+

... where the "above" or "below" portions might be any size (even 0
bytes). Typical code generation today means for most functions that the
"below" portion is 0 bytes in size, but this is not guaranteed, and even
today there are cases where this is not true.
When one function calls another without a stack transition, that looks
like:

	    +===========+ ___
	    |   above   |    \
	    |-----------|    |
	,-->|  FP | LR  |    +-- Caller's stack frame
	|   |-----------|    |
	|   |   below   | ___/
	|   +===========+ ___
	|   |   above   |    \
	|   |-----------|    |
	'---|  FP | LR  |    +-- Callee's stack frame
	    |-----------|    |
	    |   below   | ___/
	    +===========+

Where there's a stack transition, and the new stack is at a *lower* VA
than the old stack, that looks like:

	    +===========+ ___
	    |   above   |    \
	    |-----------|    |
	,-->|  FP | LR  |    +-- Caller's stack frame
	|   |-----------|    |
	|   |   below   | ___/
	|   +===========+
	|
	|   ~~~~~~~~~~~~~
	|    Arbitrarily
	|    large gap,
	|    potentially
	|    including
	|    other data
	|   ~~~~~~~~~~~~~
	|
	|   +===========+ ___
	|   |   above   |    \
	|   |-----------|    |
	'---|  FP | LR  |    +-- Callee's stack frame
	    |-----------|    |
	    |   below   | ___/
	    +===========+

Where there's a stack transition, and the new stack is at a *higher* VA
than the old stack, that looks like:

	    +===========+ ___
	    |   above   |    \
	    |-----------|    |
	,---|  FP | LR  |    +-- Callee's stack frame
	|   |-----------|    |
	|   |   below   | ___/
	|   +===========+
	|
	|   ~~~~~~~~~~~~~
	|    Arbitrarily
	|    large gap,
	|    potentially
	|    including
	|    other data
	|   ~~~~~~~~~~~~~
	|
	|   +===========+ ___
	|   |   above   |    \
	|   |-----------|    |
	'-->|  FP | LR  |    +-- Caller's stack frame
	    |-----------|    |
	    |   below   | ___/
	    +===========+

In all of these cases, we *cannot* identify the boundary between the two
stack frames, we can *only* identify where something overlaps a frame
record. That might itself be a good thing, but it's not the same thing as
what you describe in the commit message.

> It is based on the existing x86 algorithm, but implemented via stacktrace.
> This can be tested by USERCOPY_STACK_FRAME_FROM and
> USERCOPY_STACK_FRAME_TO in lkdtm.

Can you please explain *why* we'd want this? Who do we expect to use
this? What's the overhead in practice? Has this passed a more realistic
stress test (e.g. running some userspace applications which make
intensive use of copies to/from the kernel)?
>
> Signed-off-by: He Zhe
> ---
>  arch/arm64/Kconfig                   |  1 +
>  arch/arm64/include/asm/thread_info.h | 12 +
>  arch/arm64/kernel/stacktrace.c       | 76 ++--
>  3 files changed, 85 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 57c4c995965f..0f52a83d7771 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -165,6 +165,7 @@ config ARM64
>  	select HAVE_ARCH_TRACEHOOK
>  	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
>  	select HAVE_ARCH_VMAP_STACK
> +	select
Re: [PATCH RFC 1/8] stacktrace: Change callback prototype to pass more information
On 4/19/22 21:09, Mark Rutland wrote:
> On Mon, Apr 18, 2022 at 09:22:10PM +0800, He Zhe wrote:
>> Currently stack_trace_consume_fn can only have pc of each frame of the
>> stack. Copying-beyond-the-frame-detection also needs fp of current and
>> previous frame. Other detection algorithm in the future may need more
>> information of the frame.
>>
>> We define a frame_info to include them all.
>>
>> Signed-off-by: He Zhe
>> ---
>>  include/linux/stacktrace.h |  9 ++++++++-
>>  kernel/stacktrace.c        | 10 +++++-----
>>  2 files changed, 13 insertions(+), 6 deletions(-)
>>
>> diff --git a/include/linux/stacktrace.h b/include/linux/stacktrace.h
>> index 97455880ac41..5a61bfafe6f0 100644
>> --- a/include/linux/stacktrace.h
>> +++ b/include/linux/stacktrace.h
>> @@ -10,15 +10,22 @@ struct pt_regs;
>>
>>  #ifdef CONFIG_ARCH_STACKWALK
>>
>> +struct frame_info {
>> +	unsigned long pc;
>> +	unsigned long fp;
>> +	unsigned long prev_fp;
>> +};
> I don't think this should be exposed through a generic interface; the `fp` and
> `prev_fp` values are only meaningful with arch-specific knowledge, and they're
> *very* easy to misuse (e.g. when transitioning from one stack to another).
> There's also a bunch of other information one may or may not want, depending on
> what you're trying to achieve.
>
> I am happy to have an arch-specific internal unwinder where we can access this
> information, and *maybe* it makes sense to have a generic API that passes some
> opaque token, but I don't think we should make the structure generic.

Thanks for the thoughts. I saw unwind_frame etc. were made private earlier
and took that as a hint that all further stack walk things should be based
on those functions, and came up with this. But OK, good to know that an
arch-specific unwinder would be fine; I'll redo this series in that way.

Thanks,
Zhe

> Thanks,
> Mark.
> >> + >> /** >> * stack_trace_consume_fn - Callback for arch_stack_walk() >> * @cookie: Caller supplied pointer handed back by arch_stack_walk() >> * @addr: The stack entry address to consume >> + * @fi: The frame information to consume >> * >> * Return: True, if the entry was consumed or skipped >> * False, if there is no space left to store >> */ >> -typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long addr); >> +typedef bool (*stack_trace_consume_fn)(void *cookie, struct frame_info *fi); >> /** >> * arch_stack_walk - Architecture specific function to walk the stack >> * @consume_entry: Callback which is invoked by the architecture code for >> diff --git a/kernel/stacktrace.c b/kernel/stacktrace.c >> index 9ed5ce989415..2d0a2812e92b 100644 >> --- a/kernel/stacktrace.c >> +++ b/kernel/stacktrace.c >> @@ -79,7 +79,7 @@ struct stacktrace_cookie { >> unsigned intlen; >> }; >> >> -static bool stack_trace_consume_entry(void *cookie, unsigned long addr) >> +static bool stack_trace_consume_entry(void *cookie, struct frame_info *fi) >> { >> struct stacktrace_cookie *c = cookie; >> >> @@ -90,15 +90,15 @@ static bool stack_trace_consume_entry(void *cookie, >> unsigned long addr) >> c->skip--; >> return true; >> } >> -c->store[c->len++] = addr; >> +c->store[c->len++] = fi->pc; >> return c->len < c->size; >> } >> >> -static bool stack_trace_consume_entry_nosched(void *cookie, unsigned long >> addr) >> +static bool stack_trace_consume_entry_nosched(void *cookie, struct >> frame_info *fi) >> { >> -if (in_sched_functions(addr)) >> +if (in_sched_functions(fi->pc)) >> return true; >> -return stack_trace_consume_entry(cookie, addr); >> +return stack_trace_consume_entry(cookie, fi); >> } >> >> /** >> -- >> 2.25.1 >>
Re: [PATCH RFC 2/8] arm64: stacktrace: Add arch_within_stack_frames
On 4/19/22 05:59, Kees Cook wrote:
> On Mon, Apr 18, 2022 at 09:22:11PM +0800, He Zhe wrote:
>> This function checks if the given address range crosses frame boundary.
>> It is based on the existing x86 algorithm, but implemented via stacktrace.
>> This can be tested by USERCOPY_STACK_FRAME_FROM and
>> USERCOPY_STACK_FRAME_TO in lkdtm.
> Hi,
>
> Thanks for doing this implementation! One reason usercopy hardening
> didn't pursue doing a "full" stacktrace was because it seemed relatively
> expensive. Did you do any usercopy-heavy workload testing to see if
> there was a noticeable performance impact?
>
> It would be nice to block the exposure of canaries and PAC bits, though,
> so I'm not opposed, but I'd like to get a better sense of how "heavy"
> this might be.

I just did some rough tests:

	hackbench -s 512 -l 200 -g 15 -f 25 -P

Such a run would hit arch_within_stack_frames at least 5000 times in my
environment.

With hardened_usercopy=on, the execution time would be around 2.121
seconds (average over 30 runs).
With hardened_usercopy=off, the execution time would be around 2.011
seconds (average over 30 runs).

I'll test the original x86 way for arm64 tomorrow. Is there any other
workload that should be run?

Thanks,
Zhe

> Thanks!
>
> -Kees
Re: [PATCH v2 1/2] of: Create platform devices for OF framebuffers
Hi Am 19.04.22 um 15:30 schrieb Rob Herring: ... -#ifndef CONFIG_PPC static const struct of_device_id reserved_mem_matches[] = { { .compatible = "qcom,rmtfs-mem" }, { .compatible = "qcom,cmd-db" }, @@ -520,33 +519,81 @@ static const struct of_device_id reserved_mem_matches[] = { static int __init of_platform_default_populate_init(void) { - struct device_node *node; - As both if/else clauses need 'node', I'd keep this declared here. Ok. device_links_supplier_sync_state_pause(); if (!of_have_populated_dt()) return -ENODEV; - /* -* Handle certain compatibles explicitly, since we don't want to create -* platform_devices for every node in /reserved-memory with a -* "compatible", -*/ - for_each_matching_node(node, reserved_mem_matches) - of_platform_device_create(node, NULL, NULL); + if (IS_ENABLED(CONFIG_PPC)) { + struct device_node *boot_display = NULL; + struct device_node *node; + struct platform_device *dev; + int ret; + + /* Check if we have a MacOS display without a node spec */ + if (of_get_property(of_chosen, "linux,bootx-noscreen", NULL)) { + /* +* The old code tried to work out which node was the MacOS +* display based on the address. I'm dropping that since the +* lack of a node spec only happens with old BootX versions +* (users can update) and with this code, they'll still get +* a display (just not the palette hacks). +*/ + dev = platform_device_alloc("bootx-noscreen", 0); + if (WARN_ON(!dev)) + return -ENOMEM; + ret = platform_device_add(dev); + if (WARN_ON(ret)) { + platform_device_put(dev); + return ret; + } + } - node = of_find_node_by_path("/firmware"); - if (node) { - of_platform_populate(node, NULL, NULL, NULL); - of_node_put(node); - } + /* +* For OF framebuffers, first create the device for the boot display, +* then for the other framebuffers. Only fail for the boot display; +* ignore errors for the rest. 
+*/ + for_each_node_by_type(node, "display") { + if (!of_get_property(node, "linux,opened", NULL) || + !of_get_property(node, "linux,boot-display", NULL)) + continue; + dev = of_platform_device_create(node, "of-display", NULL); + if (WARN_ON(!dev)) + return -ENOMEM; + boot_display = node; + break; + } + for_each_node_by_type(node, "display") { + if (!of_get_property(node, "linux,opened", NULL) || node == boot_display) + continue; + of_platform_device_create(node, "of-display", NULL); + } - node = of_get_compatible_child(of_chosen, "simple-framebuffer"); - of_platform_device_create(node, NULL, NULL); - of_node_put(node); + } else { + struct device_node *node; + + /* +* Handle certain compatibles explicitly, since we don't want to create +* platform_devices for every node in /reserved-memory with a +* "compatible", +*/ + for_each_matching_node(node, reserved_mem_matches) + of_platform_device_create(node, NULL, NULL); - /* Populate everything else. */ - of_platform_default_populate(NULL, NULL, NULL); + node = of_find_node_by_path("/firmware"); + if (node) { + of_platform_populate(node, NULL, NULL, NULL); + of_node_put(node); + } + + node = of_get_compatible_child(of_chosen, "simple-framebuffer"); + of_platform_device_create(node, NULL, NULL); + of_node_put(node); In v1, you supported "simple-framebuffer" on PPC. Don't we want to allow that? Maybe no one cares ATM, but that could change. Either way: Support for these framebuffers has always been mutually exclusive. The offb driver, which originally contained the code, depends on CONFIG_PPC. And PPC never supported simple-framebuffer anywhere. Reviewed-by: Rob Herring Thank you. Best regards Thomas + + /* Populate everything else. */ + of_platform_default_populate(NULL, NULL,
Re: [PATCH v2 1/2] of: Create platform devices for OF framebuffers
On Tue, Apr 19, 2022 at 12:04:04PM +0200, Thomas Zimmermann wrote: > Create a platform device for each OF-declared framebuffer and have > offb bind to these devices. Allows for real hot-unplugging and other > drivers besides offb. > > Originally, offb created framebuffer devices while initializing its > module by parsing the OF device tree. No actual Linux device was set > up. This tied OF framebuffers to offb and makes writing other drivers > for the OF framebuffers complicated. The absence of a Linux device > further prevented real hot-unplugging. Adding a distinct platform > device for each OF framebuffer solves both problems. Specifically, a > DRM driver can now provide graphics output for modern userspace. > > Some of the offb init code is now located in the OF initialization. > There's now also an implementation of of_platform_default_populate_init(), > which was missing before. The OF side creates different devices for > either OF display nodes or BootX displays as they require different > handling by the driver. The offb drivers picks up each type of device > and runs the appropriate fbdev initialization. > > Tested with OF display nodes on qemu's ppc64le target. 
> > v2: > * run PPC code as part of existing initialization (Rob) > * add a few more error warnings (Javier) > > Signed-off-by: Thomas Zimmermann > Reviewed-by: Javier Martinez Canillas > --- > drivers/of/platform.c | 88 ++ > drivers/video/fbdev/offb.c | 98 +- > 2 files changed, 132 insertions(+), 54 deletions(-) > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c > index a16b74f32aa9..738ba2e2838c 100644 > --- a/drivers/of/platform.c > +++ b/drivers/of/platform.c > @@ -507,7 +507,6 @@ int of_platform_default_populate(struct device_node *root, > } > EXPORT_SYMBOL_GPL(of_platform_default_populate); > > -#ifndef CONFIG_PPC > static const struct of_device_id reserved_mem_matches[] = { > { .compatible = "qcom,rmtfs-mem" }, > { .compatible = "qcom,cmd-db" }, > @@ -520,33 +519,81 @@ static const struct of_device_id reserved_mem_matches[] > = { > > static int __init of_platform_default_populate_init(void) > { > - struct device_node *node; > - As both if/else clauses need 'node', I'd keep this declared here. > device_links_supplier_sync_state_pause(); > > if (!of_have_populated_dt()) > return -ENODEV; > > - /* > - * Handle certain compatibles explicitly, since we don't want to create > - * platform_devices for every node in /reserved-memory with a > - * "compatible", > - */ > - for_each_matching_node(node, reserved_mem_matches) > - of_platform_device_create(node, NULL, NULL); > + if (IS_ENABLED(CONFIG_PPC)) { > + struct device_node *boot_display = NULL; > + struct device_node *node; > + struct platform_device *dev; > + int ret; > + > + /* Check if we have a MacOS display without a node spec */ > + if (of_get_property(of_chosen, "linux,bootx-noscreen", NULL)) { > + /* > + * The old code tried to work out which node was the > MacOS > + * display based on the address. 
I'm dropping that > since the > + * lack of a node spec only happens with old BootX > versions > + * (users can update) and with this code, they'll still > get > + * a display (just not the palette hacks). > + */ > + dev = platform_device_alloc("bootx-noscreen", 0); > + if (WARN_ON(!dev)) > + return -ENOMEM; > + ret = platform_device_add(dev); > + if (WARN_ON(ret)) { > + platform_device_put(dev); > + return ret; > + } > + } > > - node = of_find_node_by_path("/firmware"); > - if (node) { > - of_platform_populate(node, NULL, NULL, NULL); > - of_node_put(node); > - } > + /* > + * For OF framebuffers, first create the device for the boot > display, > + * then for the other framebuffers. Only fail for the boot > display; > + * ignore errors for the rest. > + */ > + for_each_node_by_type(node, "display") { > + if (!of_get_property(node, "linux,opened", NULL) || > + !of_get_property(node, "linux,boot-display", NULL)) > + continue; > + dev = of_platform_device_create(node, "of-display", > NULL); > + if (WARN_ON(!dev)) > + return -ENOMEM; > + boot_display = node; > + break; > +
Re: [PATCH RFC 1/8] stacktrace: Change callback prototype to pass more information
On Mon, Apr 18, 2022 at 09:22:10PM +0800, He Zhe wrote: > Currently stack_trace_consume_fn can only have pc of each frame of the > stack. Copying-beyond-the-frame-detection also needs fp of current and > previous frame. Other detection algorithm in the future may need more > information of the frame. > > We define a frame_info to include them all. > > > Signed-off-by: He Zhe > --- > include/linux/stacktrace.h | 9 - > kernel/stacktrace.c| 10 +- > 2 files changed, 13 insertions(+), 6 deletions(-) > > diff --git a/include/linux/stacktrace.h b/include/linux/stacktrace.h > index 97455880ac41..5a61bfafe6f0 100644 > --- a/include/linux/stacktrace.h > +++ b/include/linux/stacktrace.h > @@ -10,15 +10,22 @@ struct pt_regs; > > #ifdef CONFIG_ARCH_STACKWALK > > +struct frame_info { > + unsigned long pc; > + unsigned long fp; > + unsigned long prev_fp; > +}; I don't think this should be exposed through a generic interface; the `fp` and `prev_fp` values are only meaningful with arch-specific knowledge, and they're *very* easy to misuse (e.g. when transitioning from one stack to another). There's also a bunch of other information one may or may not want, depending on what you're trying to achieve. I am happy to have an arch-specific internal unwinder where we can access this information, and *maybe* it makes sense to have a generic API that passes some opaque token, but I don't think we should make the structure generic. Thanks, Mark. 
> + > /** > * stack_trace_consume_fn - Callback for arch_stack_walk() > * @cookie: Caller supplied pointer handed back by arch_stack_walk() > * @addr:The stack entry address to consume > + * @fi: The frame information to consume > * > * Return: True, if the entry was consumed or skipped > * False, if there is no space left to store > */ > -typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long addr); > +typedef bool (*stack_trace_consume_fn)(void *cookie, struct frame_info *fi); > /** > * arch_stack_walk - Architecture specific function to walk the stack > * @consume_entry: Callback which is invoked by the architecture code for > diff --git a/kernel/stacktrace.c b/kernel/stacktrace.c > index 9ed5ce989415..2d0a2812e92b 100644 > --- a/kernel/stacktrace.c > +++ b/kernel/stacktrace.c > @@ -79,7 +79,7 @@ struct stacktrace_cookie { > unsigned intlen; > }; > > -static bool stack_trace_consume_entry(void *cookie, unsigned long addr) > +static bool stack_trace_consume_entry(void *cookie, struct frame_info *fi) > { > struct stacktrace_cookie *c = cookie; > > @@ -90,15 +90,15 @@ static bool stack_trace_consume_entry(void *cookie, > unsigned long addr) > c->skip--; > return true; > } > - c->store[c->len++] = addr; > + c->store[c->len++] = fi->pc; > return c->len < c->size; > } > > -static bool stack_trace_consume_entry_nosched(void *cookie, unsigned long > addr) > +static bool stack_trace_consume_entry_nosched(void *cookie, struct > frame_info *fi) > { > - if (in_sched_functions(addr)) > + if (in_sched_functions(fi->pc)) > return true; > - return stack_trace_consume_entry(cookie, addr); > + return stack_trace_consume_entry(cookie, fi); > } > > /** > -- > 2.25.1 >
Re: [PATCH v2 3/8] x86/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
On 29.03.22 18:43, David Hildenbrand wrote: > Let's use bit 3 to remember PG_anon_exclusive in swap ptes. > > Signed-off-by: David Hildenbrand > --- Looks like I ignored that 32bit uses a different (undocumented) swap layout and bit 3 falls into the swp type. We'll restrict this to x86-64 for now, just like for the other architectures. The following seems to fly. @Andrew, let me know if you prefer a v3. >From bafb5ba914e89ad20c46f4e841a36909e610b81e Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Wed, 9 Mar 2022 09:47:29 +0100 Subject: [PATCH] x86/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on x86_64 Let's use bit 3 to remember PG_anon_exclusive in swap ptes. Signed-off-by: David Hildenbrand --- arch/x86/include/asm/pgtable.h | 17 + arch/x86/include/asm/pgtable_64.h | 4 +++- arch/x86/include/asm/pgtable_64_types.h | 5 + 3 files changed, 25 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 62ab07e24aef..a1c555abed26 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1291,6 +1291,23 @@ static inline void update_mmu_cache_pud(struct vm_area_struct *vma, unsigned long addr, pud_t *pud) { } +#ifdef _PAGE_SWP_EXCLUSIVE +#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE +static inline pte_t pte_swp_mkexclusive(pte_t pte) +{ + return pte_set_flags(pte, _PAGE_SWP_EXCLUSIVE); +} + +static inline int pte_swp_exclusive(pte_t pte) +{ + return pte_flags(pte) & _PAGE_SWP_EXCLUSIVE; +} + +static inline pte_t pte_swp_clear_exclusive(pte_t pte) +{ + return pte_clear_flags(pte, _PAGE_SWP_EXCLUSIVE); +} +#endif /* _PAGE_SWP_EXCLUSIVE */ #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY static inline pte_t pte_swp_mksoft_dirty(pte_t pte) diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h index 56d0399a0cd1..e479491da8d5 100644 --- a/arch/x86/include/asm/pgtable_64.h +++ b/arch/x86/include/asm/pgtable_64.h @@ -186,7 +186,7 @@ static inline void native_pgd_clear(pgd_t *pgd) * * 
| ...| 11| 10| 9|8|7|6|5| 4| 3|2| 1|0| <- bit number * | ...|SW3|SW2|SW1|G|L|D|A|CD|WT|U| W|P| <- bit names - * | TYPE (59-63) | ~OFFSET (9-58) |0|0|X|X| X| X|F|SD|0| <- swp entry + * | TYPE (59-63) | ~OFFSET (9-58) |0|0|X|X| X| E|F|SD|0| <- swp entry * * G (8) is aliased and used as a PROT_NONE indicator for * !present ptes. We need to start storing swap entries above @@ -203,6 +203,8 @@ static inline void native_pgd_clear(pgd_t *pgd) * F (2) in swp entry is used to record when a pagetable is * writeprotected by userfaultfd WP support. * + * E (3) in swp entry is used to remember PG_anon_exclusive. + * * Bit 7 in swp entry should be 0 because pmd_present checks not only P, * but also L and G. * diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h index 91ac10654570..70e360a2e5fb 100644 --- a/arch/x86/include/asm/pgtable_64_types.h +++ b/arch/x86/include/asm/pgtable_64_types.h @@ -163,4 +163,9 @@ extern unsigned int ptrs_per_p4d; #define PGD_KERNEL_START ((PAGE_SIZE / 2) / sizeof(pgd_t)) +/* + * We borrow bit 3 to remember PG_anon_exclusive. + */ +#define _PAGE_SWP_EXCLUSIVE _PAGE_PWT + #endif /* _ASM_X86_PGTABLE_64_DEFS_H */ -- 2.35.1 -- Thanks, David / dhildenb
[PATCH V3 2/2] powerpc/perf: Fix the power10 event alternatives array to have correct sort order
When scheduling a group of events, there are constraint checks done to make sure all events can go in a group. Example, one of the criteria is that events in a group cannot use same PMC. But platform specific PMU supports alternative event for some of the event codes. During perf_event_open, if any event group doesn't match constraint check criteria, further lookup is done to find alternative event. By current design, the array of alternatives events in PMU code is expected to be sorted by column 0. This is because in find_alternative() function, the return criteria is based on event code comparison. ie "event < ev_alt[i][0])". This optimisation is there since find_alternative() can get called multiple times. In power10 PMU code, the alternative event array is not sorted list and hence there is breakage in finding alternative event. To work with existing logic, fix the alternative event array to be sorted by column 0 for power10-pmu.c Results: In case where an alternative event is not chosen when we could, events will be multiplexed. ie, time sliced where it could actually run concurrently. Example, in power10 PM_INST_CMPL_ALT(0x2) has alternative event, PM_INST_CMPL(0x500fa). Without the fix, if a group of events with PMC1 to PMC4 is used along with PM_INST_CMPL_ALT, it will be time sliced since all programmable PMC's are consumed already. But with the fix, when it picks alternative event on PMC5, all events will run concurrently. << Before Patch >> # perf stat -e r2,r100fc,r200fa,r300fc,r400fc ^C Performance counter stats for 'system wide': 328668935 r2 (79.94%) 56501024 r100fc (79.95%) 49564238 r200fa (79.95%) 376 r300fc (80.19%) 660 r400fc (79.97%) 4.039150522 seconds time elapsed With the fix, since alternative event is chosen to run on PMC6, events will be run concurrently. 
<< After Patch >> # perf stat -e r2,r100fc,r200fa,r300fc,r400fc ^C Performance counter stats for 'system wide': 23596607 r2 4907738 r100fc 2283608 r200fa 135 r300fc 248 r400fc 1.664671390 seconds time elapsed Fixes: a64e697cef23 ("powerpc/perf: power10 Performance Monitoring support") Signed-off-by: Athira Rajeev Reviewed-by: Madhavan Srinivasan --- Changelog: v1 -> v2: Added Fixes tag and reworded commit message Added Reviewed-by from Maddy v2 -> v3: Added info about what is the breakage with current code. arch/powerpc/perf/power10-pmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/perf/power10-pmu.c b/arch/powerpc/perf/power10-pmu.c index d3398100a60f..c6d51e7093cf 100644 --- a/arch/powerpc/perf/power10-pmu.c +++ b/arch/powerpc/perf/power10-pmu.c @@ -91,8 +91,8 @@ extern u64 PERF_REG_EXTENDED_MASK; /* Table of alternatives, sorted by column 0 */ static const unsigned int power10_event_alternatives[][MAX_ALT] = { - { PM_CYC_ALT, PM_CYC }, { PM_INST_CMPL_ALT, PM_INST_CMPL }, + { PM_CYC_ALT, PM_CYC }, }; static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[]) -- 2.35.1
[PATCH V3 1/2] powerpc/perf: Fix the power9 event alternatives array to have correct sort order
When scheduling a group of events, there are constraint checks done to make sure all events can go in a group. Example, one of the criteria is that events in a group cannot use same PMC. But platform specific PMU supports alternative event for some of the event codes. During perf_event_open, if any event group doesn't match constraint check criteria, further lookup is done to find alternative event. By current design, the array of alternatives events in PMU code is expected to be sorted by column 0. This is because in find_alternative() function, the return criteria is based on event code comparison. ie "event < ev_alt[i][0])". This optimisation is there since find_alternative() can get called multiple times. In power9 PMU code, the alternative event array is not sorted list and hence there is breakage in finding alternative event. To work with existing logic, fix the alternative event array to be sorted by column 0 for power9-pmu.c Results: With alternative events, multiplexing can be avoided. That is, for example, in power9 PM_LD_MISS_L1 (0x3e054) has alternative event, PM_LD_MISS_L1_ALT (0x400f0). This is an identical event which can be programmed in a different PMC. << Before patch >> # perf stat -e r3e054,r300fc ^C Performance counter stats for 'system wide': 1057860 r3e054 (50.21%) 379 r300fc (49.79%) 0.944329741 seconds time elapsed Since both the events are using PMC3 in this case, they are multiplexed here. << After patch >> # perf stat -e r3e054,r300fc ^C Performance counter stats for 'system wide': 1006948 r3e054 182 r300fc Fixes: 91e0bd1e6251 ("powerpc/perf: Add PM_LD_MISS_L1 and PM_BR_2PATH to power9 event list") Signed-off-by: Athira Rajeev Reviewed-by: Madhavan Srinivasan --- Changelog: v1 -> v2: Added Fixes tag and reworded commit message. Added Reviewed-by from Maddy. v2 -> v3: Added info about what is the breakage with current code. 
arch/powerpc/perf/power9-pmu.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c index c9eb5232e68b..c393e837648e 100644 --- a/arch/powerpc/perf/power9-pmu.c +++ b/arch/powerpc/perf/power9-pmu.c @@ -133,11 +133,11 @@ int p9_dd22_bl_ev[] = { /* Table of alternatives, sorted by column 0 */ static const unsigned int power9_event_alternatives[][MAX_ALT] = { - { PM_INST_DISP, PM_INST_DISP_ALT }, - { PM_RUN_CYC_ALT, PM_RUN_CYC }, - { PM_RUN_INST_CMPL_ALT, PM_RUN_INST_CMPL }, - { PM_LD_MISS_L1,PM_LD_MISS_L1_ALT }, { PM_BR_2PATH, PM_BR_2PATH_ALT }, + { PM_INST_DISP, PM_INST_DISP_ALT }, + { PM_RUN_CYC_ALT, PM_RUN_CYC }, + { PM_LD_MISS_L1,PM_LD_MISS_L1_ALT }, + { PM_RUN_INST_CMPL_ALT, PM_RUN_INST_CMPL }, }; static int power9_get_alternatives(u64 event, unsigned int flags, u64 alt[]) -- 2.35.1
Re: [PATCH] misc: ocxl: fix possible double free in ocxl_file_register_afu
On 2022/4/19 17:09, Frederic Barrat wrote: On 18/04/2022 10:57, Hangyu Hua wrote: info_release() will be called in device_unregister() when info->dev's reference count is 0. So there is no need to call ocxl_afu_put() and kfree() again. Fix this by adding free_minor() and return to err_unregister error path. Fixes: 75ca758adbaf ("ocxl: Create a clear delineation between ocxl backend & frontend") Signed-off-by: Hangyu Hua --- Thanks for fixing that error path! I'm now thinking it would be cleaner to have the call to free_minor() in the info_release() callback but that would be another patch. In any case: Acked-by: Frederic Barrat Fred I think it is a good idea to use callbacks to handle all garbage collections. And free_minor is used only in ocxl_file_register_afu() and ocxl_file_unregister_afu(). So this should only require minor changes. Thanks. drivers/misc/ocxl/file.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c index d881f5e40ad9..6777c419a8da 100644 --- a/drivers/misc/ocxl/file.c +++ b/drivers/misc/ocxl/file.c @@ -556,7 +556,9 @@ int ocxl_file_register_afu(struct ocxl_afu *afu) err_unregister: ocxl_sysfs_unregister_afu(info); // safe to call even if register failed + free_minor(info); device_unregister(&info->dev); + return rc; err_put: ocxl_afu_put(afu); free_minor(info);
Re: [PATCH 3/3] KVM: Add helpers to wrap vcpu->srcu_idx and yell if it's abused
On Fri, 2022-04-15 at 00:43 +, Sean Christopherson wrote: > Add wrappers to acquire/release KVM's SRCU lock when stashing the index > in vcpu->src_idx, along with rudimentary detection of illegal usage, > e.g. re-acquiring SRCU and thus overwriting vcpu->src_idx. Because the > SRCU index is (currently) either 0 or 1, illegal nesting bugs can go > unnoticed for quite some time and only cause problems when the nested > lock happens to get a different index. > > Wrap the WARNs in PROVE_RCU=y, and make them ONCE, otherwise KVM will > likely yell so loudly that it will bring the kernel to its knees. > > Signed-off-by: Sean Christopherson > --- > arch/powerpc/kvm/book3s_64_mmu_radix.c | 9 + > arch/powerpc/kvm/book3s_hv_nested.c| 16 +++ > arch/powerpc/kvm/book3s_rtas.c | 4 ++-- > arch/powerpc/kvm/powerpc.c | 4 ++-- > arch/riscv/kvm/vcpu.c | 16 +++ > arch/riscv/kvm/vcpu_exit.c | 4 ++-- > arch/s390/kvm/interrupt.c | 4 ++-- > arch/s390/kvm/kvm-s390.c | 8 > arch/s390/kvm/vsie.c | 4 ++-- > arch/x86/kvm/x86.c | 28 -- > include/linux/kvm_host.h | 24 +- > 11 files changed, 71 insertions(+), 50 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c > b/arch/powerpc/kvm/book3s_64_mmu_radix.c > index e4ce2a35483f..42851c32ff3b 100644 > --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c > +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c > @@ -168,9 +168,10 @@ int kvmppc_mmu_walk_radix_tree(struct kvm_vcpu *vcpu, > gva_t eaddr, > return -EINVAL; > /* Read the entry from guest memory */ > addr = base + (index * sizeof(rpte)); > - vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); > + > + kvm_vcpu_srcu_read_lock(vcpu); > ret = kvm_read_guest(kvm, addr, &rpte, sizeof(rpte)); > - srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); > + kvm_vcpu_srcu_read_unlock(vcpu); > if (ret) { > if (pte_ret_p) > *pte_ret_p = addr; > @@ -246,9 +247,9 @@ int kvmppc_mmu_radix_translate_table(struct kvm_vcpu > *vcpu, gva_t eaddr, > > /* Read the table to find the root of the radix tree */ > ptbl = (table & PRTB_MASK) + 
(table_index * sizeof(entry)); > - vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); > + kvm_vcpu_srcu_read_lock(vcpu); > ret = kvm_read_guest(kvm, ptbl, &entry, sizeof(entry)); > - srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); > + kvm_vcpu_srcu_read_unlock(vcpu); > if (ret) > return ret; > > diff --git a/arch/powerpc/kvm/book3s_hv_nested.c > b/arch/powerpc/kvm/book3s_hv_nested.c > index 9d373f8963ee..c943a051c6e7 100644 > --- a/arch/powerpc/kvm/book3s_hv_nested.c > +++ b/arch/powerpc/kvm/book3s_hv_nested.c > @@ -306,10 +306,10 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu) > /* copy parameters in */ > hv_ptr = kvmppc_get_gpr(vcpu, 4); > regs_ptr = kvmppc_get_gpr(vcpu, 5); > - vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); > + kvm_vcpu_srcu_read_lock(vcpu); > err = kvmhv_read_guest_state_and_regs(vcpu, &l2_hv, &l2_regs, > hv_ptr, regs_ptr); > - srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx); > + kvm_vcpu_srcu_read_unlock(vcpu); > if (err) > return H_PARAMETER; > > @@ -410,10 +410,10 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu) > byteswap_hv_regs(&l2_hv); > byteswap_pt_regs(&l2_regs); > } > - vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); > + kvm_vcpu_srcu_read_lock(vcpu); > err = kvmhv_write_guest_state_and_regs(vcpu, &l2_hv, &l2_regs, > hv_ptr, regs_ptr); > - srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx); > + kvm_vcpu_srcu_read_unlock(vcpu); > if (err) > return H_AUTHORITY; > > @@ -600,16 +600,16 @@ long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu > *vcpu) > goto not_found; > > /* Write what was loaded into our buffer back to the L1 guest */ > - vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); > + kvm_vcpu_srcu_read_lock(vcpu); > rc = kvm_vcpu_write_guest(vcpu, gp_to, buf, n); > - srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx); > + kvm_vcpu_srcu_read_unlock(vcpu); > if (rc) > goto not_found; > } else { > /* Load the data to be stored from the L1 guest into our buf */ > - vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); > + kvm_vcpu_srcu_read_lock(vcpu); > rc = kvm_vcpu_read_guest(vcpu, gp_from, 
buf, n); > - srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx); > +
Re: [PATCH 1/3] KVM: x86: Don't re-acquire SRCU lock in complete_emulated_io()
On Fri, 2022-04-15 at 00:43 +, Sean Christopherson wrote: > Don't re-acquire SRCU in complete_emulated_io() now that KVM acquires the > lock in kvm_arch_vcpu_ioctl_run(). More importantly, don't overwrite > vcpu->srcu_idx. If the index acquired by complete_emulated_io() differs > from the one acquired by kvm_arch_vcpu_ioctl_run(), KVM will effectively > leak a lock and hang if/when synchronize_srcu() is invoked for the > relevant grace period. > > Fixes: 8d25b7beca7e ("KVM: x86: pull kvm->srcu read-side to > kvm_arch_vcpu_ioctl_run") > Cc: sta...@vger.kernel.org > Signed-off-by: Sean Christopherson > --- > arch/x86/kvm/x86.c | 7 +-- > 1 file changed, 1 insertion(+), 6 deletions(-) > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index ab336f7c82e4..f35fe09de59d 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -10450,12 +10450,7 @@ static int vcpu_run(struct kvm_vcpu *vcpu) > > static inline int complete_emulated_io(struct kvm_vcpu *vcpu) > { > - int r; > - > - vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); > - r = kvm_emulate_instruction(vcpu, EMULTYPE_NO_DECODE); > - srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx); > - return r; > + return kvm_emulate_instruction(vcpu, EMULTYPE_NO_DECODE); > } > > static int complete_emulated_pio(struct kvm_vcpu *vcpu) I wonder how this ever worked. Reviewed-by: Maxim Levitsky Best regards, Maxim Levitsky
Re: [PATCH net-next v4 14/18] sfc: Remove usage of list iterator for list_add() after the loop body
On Fri, Apr 15, 2022 at 02:29:43PM +0200, Jakob Koschel wrote: > In preparation to limit the scope of a list iterator to the list > traversal loop, use a dedicated pointer pointing to the location > where the element should be inserted [1]. > > Before, the code implicitly used the head when no element was found > when using >list. The new 'pos' variable is set to the list head > by default and overwritten if the list exits early, marking the > insertion point for list_add(). > > Link: > https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/ > [1] > Signed-off-by: Jakob Koschel > --- > drivers/net/ethernet/sfc/rx_common.c | 7 +-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/sfc/rx_common.c > b/drivers/net/ethernet/sfc/rx_common.c > index 1b22c7be0088..716847ba7038 100644 > --- a/drivers/net/ethernet/sfc/rx_common.c > +++ b/drivers/net/ethernet/sfc/rx_common.c > @@ -556,6 +556,7 @@ efx_rx_packet_gro(struct efx_channel *channel, struct > efx_rx_buffer *rx_buf, > struct efx_rss_context *efx_alloc_rss_context_entry(struct efx_nic *efx) > { > struct list_head *head = >rss_context.list; > + struct list_head *pos = head; This violates the reverse Xmas list policy. This definition should be 1 line further down. Martin > struct efx_rss_context *ctx, *new; > u32 id = 1; /* Don't use zero, that refers to the master RSS context */ > > @@ -563,8 +564,10 @@ struct efx_rss_context > *efx_alloc_rss_context_entry(struct efx_nic *efx) > > /* Search for first gap in the numbering */ > list_for_each_entry(ctx, head, list) { > - if (ctx->user_id != id) > + if (ctx->user_id != id) { > + pos = >list; > break; > + } > id++; > /* Check for wrap. If this happens, we have nearly 2^32 >* allocated RSS contexts, which seems unlikely. 
> @@ -582,7 +585,7 @@ struct efx_rss_context > *efx_alloc_rss_context_entry(struct efx_nic *efx) > > /* Insert the new entry into the gap */ > new->user_id = id; > - list_add_tail(&new->list, &ctx->list); > + list_add_tail(&new->list, pos); > return new; > } > > -- > 2.25.1
[PATCH v2 0/2] of: Register platform device for each framebuffer
Move the detection of OF framebuffers from fbdev into of platform code and register a Linux platform device for each framebuffer. Allows for DRM-based OF drivers and real hot-unplugging of the framebuffer. This patchset has been tested with qemu's ppc64le emulation, which provides a framebuffer via OF display node. If someone has an older 32-bit system with BootX available, please test. v2: * integrate PPC code into generic platform setup (Rob) * keep !device workaround with a warning (Javier, Daniel) Thomas Zimmermann (2): of: Create platform devices for OF framebuffers fbdev: Warn in hot-unplug workaround for framebuffers without device drivers/of/platform.c| 88 +--- drivers/video/fbdev/core/fbmem.c | 10 ++-- drivers/video/fbdev/offb.c | 98 +--- 3 files changed, 136 insertions(+), 60 deletions(-) base-commit: d97978df553d768e457cb68c637b2b0a6188b87c prerequisite-patch-id: c2b2f08f0eccc9f5df0c0da49fa1d36267deb11d prerequisite-patch-id: c67e5d886a47b7d0266d81100837557fda34cb24 prerequisite-patch-id: 6e1032c6302461624f33194c8b8f37103a3fa6ef prerequisite-patch-id: 3f204510fcbf9530d6540bd8e6128cce598988b6 prerequisite-patch-id: ab7611d28d07723ab1dd392dcf9a6345de3b1040 -- 2.35.1
[PATCH v2 2/2] fbdev: Warn in hot-unplug workaround for framebuffers without device
A workaround makes fbdev hot-unplugging work for framebuffers without device. The only user for this feature was offb. As each OF framebuffer now has an associated platform device, the workaround should no longer be triggered. Update it with a warning and rewrite the comment. Fbdev drivers that trigger the hot-unplug workaround really need to be fixed. Signed-off-by: Thomas Zimmermann Suggested-by: Javier Martinez Canillas --- drivers/video/fbdev/core/fbmem.c | 10 -- 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c index bc6ed750e915..84427470367b 100644 --- a/drivers/video/fbdev/core/fbmem.c +++ b/drivers/video/fbdev/core/fbmem.c @@ -1577,14 +1577,12 @@ static void do_remove_conflicting_framebuffers(struct apertures_struct *a, * allocate the memory range. * * If it's not a platform device, at least print a warning. A -* fix would add code to remove the device from the system. +* fix would add code to remove the device from the system. For +* framebuffers without any Linux device, print a warning as +* well. */ if (!device) { - /* TODO: Represent each OF framebuffer as its own -* device in the device hierarchy. For now, offb -* doesn't have such a device, so unregister the -* framebuffer as before without warning. -*/ + pr_warn("fb%d: no device set\n", i); do_unregister_framebuffer(registered_fb[i]); } else if (dev_is_platform(device)) { registered_fb[i]->forced_out = true; -- 2.35.1
[PATCH v2 1/2] of: Create platform devices for OF framebuffers
Create a platform device for each OF-declared framebuffer and have offb bind to these devices. Allows for real hot-unplugging and other drivers besides offb. Originally, offb created framebuffer devices while initializing its module by parsing the OF device tree. No actual Linux device was set up. This tied OF framebuffers to offb and makes writing other drivers for the OF framebuffers complicated. The absence of a Linux device further prevented real hot-unplugging. Adding a distinct platform device for each OF framebuffer solves both problems. Specifically, a DRM driver can now provide graphics output for modern userspace. Some of the offb init code is now located in the OF initialization. There's now also an implementation of of_platform_default_populate_init(), which was missing before. The OF side creates different devices for either OF display nodes or BootX displays as they require different handling by the driver. The offb drivers picks up each type of device and runs the appropriate fbdev initialization. Tested with OF display nodes on qemu's ppc64le target. 
v2: * run PPC code as part of existing initialization (Rob) * add a few more error warnings (Javier) Signed-off-by: Thomas Zimmermann Reviewed-by: Javier Martinez Canillas --- drivers/of/platform.c | 88 ++ drivers/video/fbdev/offb.c | 98 +- 2 files changed, 132 insertions(+), 54 deletions(-) diff --git a/drivers/of/platform.c b/drivers/of/platform.c index a16b74f32aa9..738ba2e2838c 100644 --- a/drivers/of/platform.c +++ b/drivers/of/platform.c @@ -507,7 +507,6 @@ int of_platform_default_populate(struct device_node *root, } EXPORT_SYMBOL_GPL(of_platform_default_populate); -#ifndef CONFIG_PPC static const struct of_device_id reserved_mem_matches[] = { { .compatible = "qcom,rmtfs-mem" }, { .compatible = "qcom,cmd-db" }, @@ -520,33 +519,81 @@ static const struct of_device_id reserved_mem_matches[] = { static int __init of_platform_default_populate_init(void) { - struct device_node *node; - device_links_supplier_sync_state_pause(); if (!of_have_populated_dt()) return -ENODEV; - /* -* Handle certain compatibles explicitly, since we don't want to create -* platform_devices for every node in /reserved-memory with a -* "compatible", -*/ - for_each_matching_node(node, reserved_mem_matches) - of_platform_device_create(node, NULL, NULL); + if (IS_ENABLED(CONFIG_PPC)) { + struct device_node *boot_display = NULL; + struct device_node *node; + struct platform_device *dev; + int ret; + + /* Check if we have a MacOS display without a node spec */ + if (of_get_property(of_chosen, "linux,bootx-noscreen", NULL)) { + /* +* The old code tried to work out which node was the MacOS +* display based on the address. I'm dropping that since the +* lack of a node spec only happens with old BootX versions +* (users can update) and with this code, they'll still get +* a display (just not the palette hacks). 
+*/ + dev = platform_device_alloc("bootx-noscreen", 0); + if (WARN_ON(!dev)) + return -ENOMEM; + ret = platform_device_add(dev); + if (WARN_ON(ret)) { + platform_device_put(dev); + return ret; + } + } - node = of_find_node_by_path("/firmware"); - if (node) { - of_platform_populate(node, NULL, NULL, NULL); - of_node_put(node); - } + /* +* For OF framebuffers, first create the device for the boot display, +* then for the other framebuffers. Only fail for the boot display; +* ignore errors for the rest. +*/ + for_each_node_by_type(node, "display") { + if (!of_get_property(node, "linux,opened", NULL) || + !of_get_property(node, "linux,boot-display", NULL)) + continue; + dev = of_platform_device_create(node, "of-display", NULL); + if (WARN_ON(!dev)) + return -ENOMEM; + boot_display = node; + break; + } + for_each_node_by_type(node, "display") { + if (!of_get_property(node, "linux,opened", NULL) || node == boot_display) + continue; +
Re: [PATCH] misc: ocxl: fix possible double free in ocxl_file_register_afu
On 18/04/2022 10:57, Hangyu Hua wrote: info_release() will be called in device_unregister() when info->dev's reference count is 0. So there is no need to call ocxl_afu_put() and kfree() again. Fix this by adding free_minor() and return to err_unregister error path. Fixes: 75ca758adbaf ("ocxl: Create a clear delineation between ocxl backend & frontend") Signed-off-by: Hangyu Hua --- Thanks for fixing that error path! I'm now thinking it would be cleaner to have the call to free_minor() in the info_release() callback but that would be another patch. In any case: Acked-by: Frederic Barrat Fred drivers/misc/ocxl/file.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c index d881f5e40ad9..6777c419a8da 100644 --- a/drivers/misc/ocxl/file.c +++ b/drivers/misc/ocxl/file.c @@ -556,7 +556,9 @@ int ocxl_file_register_afu(struct ocxl_afu *afu) err_unregister: ocxl_sysfs_unregister_afu(info); // safe to call even if register failed + free_minor(info); device_unregister(&info->dev); + return rc; err_put: ocxl_afu_put(afu); free_minor(info);
Re: [RFC v4 PATCH 4/5] powerpc/crash hp: add crash hotplug support for kexec_file_load
On 14/04/22 22:02, Laurent Dufour wrote: On 11/04/2022, 10:43:56, Sourabh Jain wrote: Two major changes are done to enable the crash CPU hotplug handler. Firstly, updated the kexec load path to prepare kimage for hotplug changes, and secondly, implemented the crash hotplug handler. On the kexec load path, the memsz allocation of the crash FDT segment is updated to ensure that it has sufficient buffer space to accommodate future hot add CPUs. Initialized the kimage members to track the kexec FDT segment. The crash hotplug handler updates the cpus node of crash FDT. While we update crash FDT the kexec_crash_image is marked invalid and restored after FDT update to avoid race. Since memory crash hotplug support is not there yet the crash hotplug the handler simply warns the user and returns. Signed-off-by: Sourabh Jain --- arch/powerpc/kexec/core_64.c | 46 ++ arch/powerpc/kexec/elf_64.c | 74 2 files changed, 120 insertions(+) diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c index 249d2632526d..62f77cc86407 100644 --- a/arch/powerpc/kexec/core_64.c +++ b/arch/powerpc/kexec/core_64.c @@ -466,6 +466,52 @@ int update_cpus_node(void *fdt) return ret; } +#ifdef CONFIG_CRASH_HOTPLUG +/** + * arch_crash_hotplug_handler() - Handle hotplug FDT changes + * @image: the active struct kimage + * @hp_action: the hot un/plug action being handled + * @a: first parameter dependent upon hp_action + * @b: first parameter dependent upon hp_action + * + * To accurately reflect CPU hot un/plug changes, the FDT + * must be updated with the new list of CPUs and memories. + */ +void arch_crash_hotplug_handler(struct kimage *image, unsigned int hp_action, + unsigned long a, unsigned long b) +{ + void *fdt; + + /* No action needed for CPU hot-unplug */ + if (hp_action == KEXEC_CRASH_HP_REMOVE_CPU) + return; I should have report since in the previous version too, why nothing is done when CPU are removed? 
Since CPU note addresses are already available for all possible CPUs (regardless of whether they are online or not) the PHDR is allocated for all possible CPUs during elfcorehdr creation. At least for the kexec_load system call. Now on the crash path, the crashing CPU initiates an IPI call to update the CPU data of all online CPUs and jumps to the purgatory. Hence no action is needed for the remove case. With the above logic, we shouldn't be taking any action for the CPU add case too, right? But on PowerPC early boot path there is validation that checks the boot CPU is part of the Flattened Device Tree (FDT) or not. If the boot CPU is not found in FDT the boot fails. Hence FDT needs to be updated for every new CPU added to the system but not needed when CPU is removed. Thanks Sourabh Jain + + /* crash update on memory hotplug is not supported yet */ + if (hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY || hp_action == KEXEC_CRASH_HP_ADD_MEMORY) { + pr_info_once("crash hp: crash update is not supported with memory hotplug\n"); + return; + } + + /* Must have valid FDT index */ + if (!image->arch.fdt_index_valid) { + pr_err("crash hp: unable to locate FDT segment"); + return; + } + + fdt = __va((void *)image->segment[image->arch.fdt_index].mem); + + /* Temporarily invalidate the crash image while it is replaced */ + xchg(&kexec_crash_image, NULL); + + /* update FDT to reflect changes to CPU resources */ + if (update_cpus_node(fdt)) + pr_err("crash hp: failed to update crash FDT"); + + /* The crash image is now valid once again */ + xchg(&kexec_crash_image, image); +} +#endif /* CONFIG_CRASH_HOTPLUG */ + #ifdef CONFIG_PPC_64S_HASH_MMU /* Values we need to export to the second kernel via the device tree. 
*/ static unsigned long htab_base; diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c index eeb258002d1e..9dc774548ce4 100644 --- a/arch/powerpc/kexec/elf_64.c +++ b/arch/powerpc/kexec/elf_64.c @@ -24,6 +24,67 @@ #include #include +#include +#include + +#ifdef CONFIG_CRASH_HOTPLUG + +/** + * get_cpu_node_sz() - Calculate the space needed to store a CPU device type node + * in FDT. The calculation is done based on the existing CPU + * node in unflatten device tree. Loop through all the + * properties of the very first CPU type device node found in + * unflatten device tree and returns the sum of the property + * length and property string size of all properties of a CPU + * node. + */ I don't think this is following the indent rules. +static int get_cpu_node_sz(void) { + struct device_node *dn = NULL; + struct property *pp; + int cpu_node_size = 0; + + dn
Re: [RFC v4 PATCH 3/5] powerpc/crash hp: update kimage_arch struct
On 14/04/22 22:05, Laurent Dufour wrote: On 11/04/2022, 10:43:55, Sourabh Jain wrote: Two new members fdt_index and fdt_index_valid are added in kimage_arch struct to track the FDT kexec segment. These new members of kimage_arch struct will help the crash hotplug handler to easily access the FDT segment from the kexec segment array. Otherwise, we have to loop through all kexec segments to find the FDT segments. Signed-off-by: Sourabh Jain --- arch/powerpc/include/asm/kexec.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index e1288826e22e..19c2cab6a880 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -104,6 +104,8 @@ extern const struct kexec_file_ops kexec_elf64_ops; struct kimage_arch { struct crash_mem *exclude_ranges; #ifdef CONFIG_CRASH_HOTPLUG ? + int fdt_index; + bool fdt_index_valid; #endif It seems that this is only used when CONFIG_CRASH_HOTPLUG is defined, isn't it? Yes it should be kept under CONFIG_CRASH_HOTPLUG config. Thanks, Sourabh Jain unsigned long backup_start; void *backup_buf; void *fdt;
Re: [RFC v4 PATCH 2/5] powerpc/crash hp: introduce a new config option CRASH_HOTPLUG
On 14/04/22 22:10, Laurent Dufour wrote:
> On 11/04/2022, 10:43:54, Sourabh Jain wrote:
>> The option CRASH_HOTPLUG enables in-kernel updates of kexec segments
>> on hotplug events. All the updates needed on the capture kernel load
>> path in the kernel, for both the kexec_load and kexec_file_load system
>> calls, will be kept under this config.
>>
>> Signed-off-by: Sourabh Jain
>> Reviewed-by: Eric DeVolder
>
> Reviewed-by: Laurent Dufour

Thanks for the review.

- Sourabh Jain

>> ---
>>  arch/powerpc/Kconfig | 11 +++
>>  1 file changed, 11 insertions(+)
>>
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index b779603978e1..777db33f75b5 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -623,6 +623,17 @@ config FA_DUMP
>>  	  If unsure, say "y". Only special kernels like petitboot may
>>  	  need to say "N" here.
>>
>> +config CRASH_HOTPLUG
>> +	bool "kernel updates of crash kexec segments"
>> +	depends on CRASH_DUMP && (HOTPLUG_CPU) && KEXEC_FILE
>> +	help
>> +	  An efficient way to keep the capture kernel up-to-date with CPU
>> +	  hotplug events. On a CPU hotplug event the kexec segments of the
>> +	  capture kernel become stale and need to be updated with the latest
>> +	  CPU data. In this method the kernel performs a minimal update of
>> +	  only the relevant kexec segments on a CPU hotplug event, instead
>> +	  of triggering a full capture kernel reload from userspace via a
>> +	  udev rule.
>> +
>>  config PRESERVE_FA_DUMP
>>  	bool "Preserve Firmware-assisted dump"
>>  	depends on PPC64 && PPC_POWERNV && !FA_DUMP
Re: [PATCH] macintosh: macio_asic: fix resource_size.cocci warnings
On Thu, Apr 14, 2022 at 07:02:42AM -0700, Yihao Han wrote:
> drivers/macintosh/macio_asic.c:219:26-29: WARNING:
> Suspicious code. resource_size is maybe missing with res
> drivers/macintosh/macio_asic.c:221:26-29: WARNING:
> Suspicious code. resource_size is maybe missing with res

For log messages it's OK to overstep the line length limitation for
commit logs. IMHO adding newlines is worse; not sure that there are no
other strong opinions, though.

> Use the resource_size function on the resource object instead of
> explicit computation.
>
> Generated by: scripts/coccinelle/api/resource_size.cocci
>
> Signed-off-by: Yihao Han
> ---
>  drivers/macintosh/macio_asic.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/macintosh/macio_asic.c b/drivers/macintosh/macio_asic.c
> index 1943a007e2d5..260fccb3863e 100644
> --- a/drivers/macintosh/macio_asic.c
> +++ b/drivers/macintosh/macio_asic.c
> @@ -216,9 +216,9 @@ static int macio_resource_quirks(struct device_node *np,
>  				 struct resource *res,
>  	/* Some older IDE resources have bogus sizes */
>  	if (of_node_name_eq(np, "IDE") || of_node_name_eq(np, "ATA") ||
>  	    of_node_is_type(np, "ide") || of_node_is_type(np, "ata")) {
> -		if (index == 0 && (res->end - res->start) > 0xfff)
> +		if (index == 0 && (resource_size(res)) > 0xfff)

You can drop the parentheses around resource_size(res) here.

Other than that looks fine,
Uwe

--
Pengutronix e.K.           | Uwe Kleine-König            |
Industrial Linux Solutions | https://www.pengutronix.de/ |
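As an editorial aside, it may be worth double-checking in review that the conversion is intentionally not behavior-preserving: for an inclusive range `[start, end]`, `resource_size()` is `end - start + 1`, so it is one larger than the raw `end - start` the old code compared. A resource of exactly 0x1000 bytes (`[0, 0xfff]`) fails the old `> 0xfff` check but passes the new one. A minimal standalone mock (not the kernel's `<linux/ioport.h>`) demonstrates the off-by-one:

```c
/* Minimal stand-in mirroring the semantics of the kernel's
 * resource_size(): a resource spanning [start, end] inclusive
 * covers end - start + 1 bytes. */
struct mock_resource { unsigned long start, end; };

static unsigned long mock_resource_size(const struct mock_resource *res)
{
	return res->end - res->start + 1;
}
```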
Re: [PATCH 2/2] fbdev: Remove hot-unplug workaround for framebuffers without device
Hello Thomas,

On 4/13/22 20:09, Thomas Zimmermann wrote:

[snip]

>>>> index bc6ed750e915..bdd00d381bbc 100644
>>>> --- a/drivers/video/fbdev/core/fbmem.c
>>>> +++ b/drivers/video/fbdev/core/fbmem.c
>>>> @@ -1579,14 +1579,7 @@ static void do_remove_conflicting_framebuffers(struct apertures_struct *a,
>>>>  	 * If it's not a platform device, at least print a warning. A
>>>>  	 * fix would add code to remove the device from the system.
>>>>  	 */
>>>> -	if (!device) {
>>>> -		/* TODO: Represent each OF framebuffer as its own
>>>> -		 * device in the device hierarchy. For now, offb
>>>> -		 * doesn't have such a device, so unregister the
>>>> -		 * framebuffer as before without warning.
>>>> -		 */
>>>> -		do_unregister_framebuffer(registered_fb[i]);
>>>
>>> Maybe we could still keep this for a couple of releases, but with a big
>>> warning that it's not supported, in case there are out-of-tree drivers
>>> out there that still do this?
>>>
>>> Or at least a warning if the do_unregister_framebuffer() call is removed.
>>
>> Yeah, dying while holding console_lock isn't fun, and not having a WARN_ON
>> + bail-out code pretty much forces bug reporters to do a bisect here to
>> give us something more than "machine dies at boot with no messages".
>>
>> I'd just outright keep the WARN_ON here for 1-2 years even, to really make
>> sure we got all the bug reports, since often these older machines only
>> update onto LTS releases.
>
> If that's what the consensus is, I'll go with it.
>
> I'm just not sure if we talk about the same problem. offb didn't have a
> platform device, so we recently added this workaround with 'if
> (!device)'. All the other fbdev drivers have a platform device; and
> anything else that could fail is out-of-tree. We don't really care about
> those AFAIK.

Yes, agreed on the offb change, but I'm not really sure that we don't care
about out-of-tree modules. I mean, you are right in theory, but I still feel
that we are changing a core behavior without giving people time to sort
things out if needed.
Since before commit 27599aacbaef ("fbdev: Hot-unplug firmware fb devices
on forced removal"), registered FBs didn't need to have a device, but now
a missing one will lead to a NULL pointer dereference in
dev_is_platform(device). And that change only landed in v5.18-rc1, so it
is fairly recent.

I know that we follow
https://www.kernel.org/doc/Documentation/process/stable-api-nonsense.rst,
but my opinion is still that warning for a couple of releases when
registered_fb[i]->device is NULL, instead of just crashing, would be a
better way to handle this.

> With offb converted, we could practically remove all of the checks here
> and call platform_device_unregister() unconditionally.

Yes for mainline, but as mentioned I was thinking mostly about
out-of-tree. If folks agree that we shouldn't care about those, I'm OK
with that as well.

--
Best regards,

Javier Martinez Canillas
Linux Engineering
Red Hat
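As an editorial aside, the "warn and bail out instead of crashing" compromise discussed in this thread can be sketched with a standalone mock. The names (`mock_fb_info`, `mock_try_unplug_fb`) are hypothetical, not the fbdev API; the point is only the guard shape: refuse the operation on a NULL device rather than dereferencing it:

```c
#include <stdio.h>

/* Hypothetical mock of a framebuffer record whose parent device may be
 * missing, as with pre-27599aacbaef registrations and out-of-tree drivers. */
struct mock_fb_info { void *device; };

/* Returns 1 if hot-unplug may proceed, 0 if we warned and bailed out.
 * This mirrors the proposed WARN_ON + bail-out, not a crash in
 * dev_is_platform(NULL). */
static int mock_try_unplug_fb(const struct mock_fb_info *info)
{
	if (!info->device) {
		fprintf(stderr,
			"fbdev: framebuffer has no parent device, refusing hot-unplug\n");
		return 0;
	}
	return 1;	/* safe to inspect the device further */
}
```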
Re: [PATCH net-next v2] net: ethernet: Prepare cleanup of powerpc's asm/prom.h
Hello,

Sorry for the late reply.

On Fri, 2022-04-15 at 10:39 +0200, Christophe Leroy wrote:
> powerpc's asm/prom.h brings some headers that it doesn't
> need itself.

It's probably my fault, but I really can't parse the above. Could you
please re-phrase?

> In order to clean it up in a further step, first clean all
> files that include asm/prom.h
>
> Some files don't need asm/prom.h at all. For those ones,
> just remove inclusion of asm/prom.h
>
> Some files don't need any of the items provided by asm/prom.h,
> but need some of the headers included by asm/prom.h. For those
> ones, add the needed headers that are brought by asm/prom.h at
> the moment, then remove asm/prom.h

Do you mean a follow-up patch is needed to drop the asm/prom.h include
from such files, even if that include could be dropped now without any
further change? If so, I suggest v3 should additionally drop the
asm/prom.h include where possible.

Thanks!

Paolo