[PATCH] powerpc/pseries: Fix regression while building external modules
With commit c9f3401313a5 ("powerpc: Always enable queued spinlocks for 64s,
disable for others"), CONFIG_PPC_QUEUED_SPINLOCKS is always enabled on
ppc64le, and external modules that use spinlock APIs fail to build:

  ERROR: modpost: GPL-incompatible module XXX.ko uses GPL-only symbol 'shared_processor'

Before the above commit, such modules built without any issue. The problem
is also not seen on other architectures. It can be worked around by enabling
CONFIG_UNINLINE_SPIN_UNLOCK in the config; however, CONFIG_UNINLINE_SPIN_UNLOCK
is not enabled by default and is only selected under certain conditions, e.g.
when CONFIG_DEBUG_SPINLOCK is set in the kernel config.

A minimal reproducer:

  #include <linux/module.h>
  #include <linux/spinlock.h>

  spinlock_t spLock;

  static int __init spinlock_test_init(void)
  {
          spin_lock_init(&spLock);
          spin_lock(&spLock);
          spin_unlock(&spLock);
          return 0;
  }

  static void __exit spinlock_test_exit(void)
  {
          printk("spinlock_test unloaded\n");
  }

  module_init(spinlock_test_init);
  module_exit(spinlock_test_exit);
  MODULE_DESCRIPTION("spinlock_test");
  MODULE_LICENSE("non-GPL");
  MODULE_AUTHOR("Srikar Dronamraju");

Given that spinlocks are one of the basic facilities for module code, this
effectively makes it impossible to build/load almost any non-GPL module on
ppc64le.

This was first reported at https://github.com/openzfs/zfs/issues/11172

Currently shared_processor is exported as a GPL-only symbol. Fix this for
parity with other architectures by exposing shared_processor to non-GPL
modules too.
Fixes: 14c73bd344da ("powerpc/vcpu: Assume dedicated processors as non-preempt")
Fixes: c9f3401313a5 ("powerpc: Always enable queued spinlocks for 64s, disable for others")
Reported-by: marc.c.dio...@gmail.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman
Cc: Nicholas Piggin
Cc: marc.c.dio...@gmail.com
Cc: jfor...@redhat.com
Cc: yaday...@in.ibm.com
Signed-off-by: Srikar Dronamraju
---
 arch/powerpc/platforms/pseries/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 754e493b7c05..0338f481c12b 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -77,7 +77,7 @@
 #include "../../../../drivers/pci/pci.h"

 DEFINE_STATIC_KEY_FALSE(shared_processor);
-EXPORT_SYMBOL_GPL(shared_processor);
+EXPORT_SYMBOL(shared_processor);

 int CMO_PrPSP = -1;
 int CMO_SecPSP = -1;

base-commit: adf3c31e18b765ea24eba7b0c1efc076b8ee3d55
--
2.18.2
RE: Possible regression by ab037dd87a2f (powerpc/vdso: Switch VDSO to generic C implementation.)
(My apologies for however IBM's email client munges this)

> I heard it is going to be in Go 1.16.7, but I do not know much about Go.
> Maybe the folks in Cc can chime in.

We have backports primed and ready for the next point release. They are
waiting on the release manager to cherry-pick them.

I think we were aware that our VDSO usage may have exploited some
peculiarities in how the ppc64 version was constructed (i.e. hand-written
assembly which just didn't happen to clobber R30). Go up to this point has
only used the vdso function __kernel_clock_gettime; it is the only entry
point which would need to explicitly avoid R30 for Go's sake.

Paul M.
Re: [powerpc][next-20210727] Boot failure - kernel BUG at arch/powerpc/kernel/interrupt.c:98!
> On 28-Jul-2021, at 11:05 PM, Nathan Chancellor wrote:
>
> On Wed, Jul 28, 2021 at 01:31:06PM +0530, Sachin Sant wrote:
>> linux-next fails to boot on Power server (POWER8/POWER9). Following traces
>> are seen during boot
>>
>> [0.010799] software IO TLB: tearing down default memory pool
>> [0.010805] [ cut here ]
>> [0.010808] kernel BUG at arch/powerpc/kernel/interrupt.c:98!
>> [0.010812] Oops: Exception in kernel mode, sig: 5 [#1]
…….
>
> I noticed this with OpenSUSE's ppc64le config [1] and my bisect landed on
> commit ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()"). That

Indeed. Thanks Nathan. Bisect points to this commit.
Reverting the commit allows the kernel to boot.

Thanks
-Sachin

> series just keeps on giving... Adding some people from that thread to
> this one. Original thread:
> https://lore.kernel.org/r/1905cd70-7656-42ae-99e2-a31fc3812...@linux.vnet.ibm.com/
>
> [1]: https://github.com/openSUSE/kernel-source/raw/master/config/ppc64le/default
>
> Cheers,
> Nathan
[PATCH v2 2/2] selftests: Skip TM tests on synthetic TM implementations
Transactional Memory was removed from the architecture in ISA v3.1. For
threads running in P8/P9 compatibility mode on P10 a synthetic TM
implementation is provided. In this implementation, tbegin. always sets
cr0 eq, meaning the abort handler is always called. This is not an issue,
as users of TM are expected to have a fallback non-transactional way to
make forward progress in the abort handler. The TEXASR indicates if a
transaction failure is due to a synthetic implementation.

Some of the TM self tests need a non-degenerate TM implementation for
their testing to be meaningful, so check for a synthetic implementation
and skip the test if so.

Signed-off-by: Jordan Niethe
---
v2: Added checking for synthetic implementation to more tests
---
 .../selftests/powerpc/ptrace/ptrace-tm-gpr.c  |  1 +
 .../powerpc/ptrace/ptrace-tm-spd-gpr.c        |  1 +
 .../powerpc/ptrace/ptrace-tm-spd-tar.c        |  1 +
 .../powerpc/ptrace/ptrace-tm-spd-vsx.c        |  1 +
 .../selftests/powerpc/ptrace/ptrace-tm-spr.c  |  1 +
 .../selftests/powerpc/ptrace/ptrace-tm-tar.c  |  1 +
 .../selftests/powerpc/ptrace/ptrace-tm-vsx.c  |  1 +
 .../selftests/powerpc/signal/signal_tm.c      |  1 +
 tools/testing/selftests/powerpc/tm/tm-exec.c  |  1 +
 tools/testing/selftests/powerpc/tm/tm-fork.c  |  1 +
 .../testing/selftests/powerpc/tm/tm-poison.c  |  1 +
 .../selftests/powerpc/tm/tm-resched-dscr.c    |  1 +
 .../powerpc/tm/tm-signal-context-chk-fpu.c    |  1 +
 .../powerpc/tm/tm-signal-context-chk-gpr.c    |  1 +
 .../powerpc/tm/tm-signal-context-chk-vmx.c    |  1 +
 .../powerpc/tm/tm-signal-context-chk-vsx.c    |  1 +
 .../powerpc/tm/tm-signal-pagefault.c          |  1 +
 .../powerpc/tm/tm-signal-sigreturn-nt.c       |  1 +
 .../selftests/powerpc/tm/tm-signal-stack.c    |  1 +
 .../selftests/powerpc/tm/tm-sigreturn.c       |  1 +
 .../testing/selftests/powerpc/tm/tm-syscall.c |  2 +-
 tools/testing/selftests/powerpc/tm/tm-tar.c   |  1 +
 tools/testing/selftests/powerpc/tm/tm-tmspr.c |  1 +
 tools/testing/selftests/powerpc/tm/tm-trap.c  |  1 +
 .../selftests/powerpc/tm/tm-unavailable.c     |  1 +
 .../selftests/powerpc/tm/tm-vmx-unavail.c     |  1 +
 .../testing/selftests/powerpc/tm/tm-vmxcopy.c |  1 +
 tools/testing/selftests/powerpc/tm/tm.h       | 36 +++
 28 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c
index 7df7100a29be..67ca297c5cca 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c
@@ -113,6 +113,7 @@ int ptrace_tm_gpr(void)
 	int ret, status;

 	SKIP_IF(!have_htm());
+	SKIP_IF(htm_is_synthetic());
 	shm_id = shmget(IPC_PRIVATE, sizeof(int) * 2, 0777|IPC_CREAT);
 	pid = fork();
 	if (pid < 0) {
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c
index 8706bea5d015..6f2bce1b6c5d 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c
@@ -119,6 +119,7 @@ int ptrace_tm_spd_gpr(void)
 	int ret, status;

 	SKIP_IF(!have_htm());
+	SKIP_IF(htm_is_synthetic());
 	shm_id = shmget(IPC_PRIVATE, sizeof(int) * 3, 0777|IPC_CREAT);
 	pid = fork();
 	if (pid < 0) {
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-tar.c b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-tar.c
index 2ecfa1158e2b..e112a34fbe59 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-tar.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-tar.c
@@ -129,6 +129,7 @@ int ptrace_tm_spd_tar(void)
 	int ret, status;

 	SKIP_IF(!have_htm());
+	SKIP_IF(htm_is_synthetic());
 	shm_id = shmget(IPC_PRIVATE, sizeof(int) * 3, 0777|IPC_CREAT);
 	pid = fork();
 	if (pid == 0)
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-vsx.c b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-vsx.c
index 6f7fb51f0809..40133d49fe39 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-vsx.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-vsx.c
@@ -129,6 +129,7 @@ int ptrace_tm_spd_vsx(void)
 	int ret, status, i;

 	SKIP_IF(!have_htm());
+	SKIP_IF(htm_is_synthetic());
 	shm_id = shmget(IPC_PRIVATE, sizeof(int) * 3, 0777|IPC_CREAT);

 	for (i = 0; i < 128; i++) {
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spr.c b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spr.c
index 068bfed2e606..880ba6a29a48 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spr.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spr.c
@@ -114,6 +114,7 @@ int ptrace_tm_spr(void)
 	int ret, status;

 	SKIP_IF(!have_htm());
+	SKIP_IF(htm_is_synthetic());
 	shm_id =
[PATCH v2 1/2] selftests/powerpc: Add missing clobbered register to ptrace TM tests
ISA v3.1 removes TM but includes a synthetic implementation for backwards
compatibility. With this implementation, the tests ptrace-tm-spd-gpr and
ptrace-tm-gpr should never be able to make any forward progress and should
eventually be killed by the timeout. Instead, on a P10 running in P9 mode,
ptrace_tm_gpr fails like so:

  test: ptrace_tm_gpr
  tags: git_version:unknown
  Starting the child
  ...
  ...
  GPR[27]: 1 Expected: 2
  GPR[28]: 1 Expected: 2
  GPR[29]: 1 Expected: 2
  GPR[30]: 1 Expected: 2
  GPR[31]: 1 Expected: 2
  [FAIL] Test FAILED on line 98
  failure: ptrace_tm_gpr
  selftests: ptrace-tm-gpr [FAIL]

The problem is in the inline assembly of the child. r0 is loaded with a
value in the child's transaction abort handler, but this register is not
included in the clobbers list. This means it is possible that this
statement:

  cptr[1] = 0;

which is meant to signal the parent to wait, may actually use the value
placed into r0 by the inline assembly and incorrectly signal the parent
to continue.

By inspection the same problem is present in ptrace-tm-spd-gpr.

Adding r0 to the clobbers list makes the test fail correctly, via a
timeout, on a P10 running in P8/P9 compatibility mode.
Suggested-by: Michael Neuling
Signed-off-by: Jordan Niethe
---
 tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c     | 2 +-
 tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c
index 82f7bdc2e5e6..7df7100a29be 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c
@@ -57,7 +57,7 @@ void tm_gpr(void)
 		: [gpr_1]"i"(GPR_1), [gpr_2]"i"(GPR_2),
 		[sprn_texasr] "i" (SPRN_TEXASR), [flt_1] "b" (&a),
 		[flt_2] "b" (&b), [cptr1] "b" (&cptr[1])
-		: "memory", "r7", "r8", "r9", "r10",
+		: "memory", "r0", "r7", "r8", "r9", "r10",
 		"r11", "r12", "r13", "r14", "r15", "r16",
 		"r17", "r18", "r19", "r20", "r21", "r22",
 		"r23", "r24", "r25", "r26", "r27", "r28",
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c
index ad65be6e8e85..8706bea5d015 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c
@@ -65,7 +65,7 @@ void tm_spd_gpr(void)
 		: [gpr_1]"i"(GPR_1), [gpr_2]"i"(GPR_2), [gpr_4]"i"(GPR_4),
 		[sprn_texasr] "i" (SPRN_TEXASR), [flt_1] "b" (&a),
 		[flt_4] "b" (&d)
-		: "memory", "r5", "r6", "r7",
+		: "memory", "r0", "r5", "r6", "r7",
 		"r8", "r9", "r10", "r11", "r12",
 		"r13", "r14", "r15", "r16", "r17",
 		"r18", "r19", "r20", "r21", "r22",
 		"r23", "r24", "r25", "r26", "r27",
 		"r28", "r29", "r30", "r31"
--
2.25.1
Re: [powerpc][next-20210727] Boot failure - kernel BUG at arch/powerpc/kernel/interrupt.c:98!
Excerpts from Nathan Chancellor's message of July 29, 2021 3:35 am:
> On Wed, Jul 28, 2021 at 01:31:06PM +0530, Sachin Sant wrote:
>> linux-next fails to boot on Power server (POWER8/POWER9). Following traces
>> are seen during boot
>>
>> [0.010799] software IO TLB: tearing down default memory pool
>> [0.010805] [ cut here ]
>> [0.010808] kernel BUG at arch/powerpc/kernel/interrupt.c:98!
>> [0.010812] Oops: Exception in kernel mode, sig: 5 [#1]
>> [0.010816] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>> [0.010820] Modules linked in:
>> [0.010824] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.14.0-rc3-next-20210727 #1
>> [0.010830] NIP: c0032cfc LR: c000c764 CTR: c000c670
>> [0.010834] REGS: c3603b10 TRAP: 0700 Not tainted (5.14.0-rc3-next-20210727)
>> [0.010838] MSR: 80029033 CR: 28000222 XER: 0002
>> [0.010848] CFAR: c000c760 IRQMASK: 3
>> [0.010848] GPR00: c000c764 c3603db0 c29bd000 0001
>> [0.010848] GPR04: 0a68 0400 c3603868
>> [0.010848] GPR08: 0003
>> [0.010848] GPR12: c0001ec9ee80 c0012a28
>> [0.010848] GPR16:
>> [0.010848] GPR20:
>> [0.010848] GPR24: f134 c3603868
>> [0.010848] GPR28: 0400 0a68 c202e9c0 c3603e80
>> [0.010896] NIP [c0032cfc] system_call_exception+0x8c/0x2e0
>> [0.010901] LR [c000c764] system_call_common+0xf4/0x258
>> [0.010907] Call Trace:
>> [0.010909] [c3603db0] [c016a6dc] calculate_sigpending+0x4c/0xe0 (unreliable)
>> [0.010915] [c3603e10] [c000c764] system_call_common+0xf4/0x258
>> [0.010921] --- interrupt: c00 at kvm_template_end+0x4/0x8
>> [0.010926] NIP: c0092dec LR: c0114fc8 CTR:
>> [0.010930] REGS: c3603e80 TRAP: 0c00 Not tainted (5.14.0-rc3-next-20210727)
>> [0.010934] MSR: 80009033 CR: 28000222 XER:
>> [0.010943] IRQMASK: 0
>> [0.010943] GPR00: c202e9c0 c3603b00 c29bd000 f134
>> [0.010943] GPR04: 0a68 0400 c3603868
>> [0.010943] GPR08:
>> [0.010943] GPR12: c0001ec9ee80 c0012a28
>> [0.010943] GPR16:
>> [0.010943] GPR20:
>> [0.010943] GPR24: c20033c4 c110afc0 c2081950 c3277d40
>> [0.010943] GPR28: ca68 0400 000d
>> [0.010989] NIP [c0092dec] kvm_template_end+0x4/0x8
>> [0.010993] LR [c0114fc8] set_memory_encrypted+0x38/0x60
>> [0.010999] --- interrupt: c00
>> [0.011001] [c3603b00] [c000c764] system_call_common+0xf4/0x258 (unreliable)
>> [0.011008] Instruction dump:
>> [0.011011] 694a0003 312a 7d495110 0b0a 6000 6000 e87f0108 68690002
>> [0.011019] 7929ffe2 0b09 68634000 786397e2 <0b03> e93f0138 792907e0 0b09
>> [0.011029] ---[ end trace a20ad55589efcb10 ]---
>> [0.012297]
>> [1.012304] Kernel panic - not syncing: Fatal exception
>>
>> next-20210723 was good. The boot failure seems to have been introduced with
>> next-20210726.
>>
>> I have attached the boot log.
>
> I noticed this with OpenSUSE's ppc64le config [1] and my bisect landed on
> commit ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()"). That
> series just keeps on giving... Adding some people from that thread to
> this one. Original thread:
> https://lore.kernel.org/r/1905cd70-7656-42ae-99e2-a31fc3812...@linux.vnet.ibm.com/

This is because powerpc's set_memory_encrypted makes an ultracall but it
does not exist on that processor.

x86's set_memory_encrypted/decrypted have:

	/* Nothing to do if memory encryption is not active */
	if (!mem_encrypt_active())
		return 0;

Probably powerpc should just do that too.

Thanks,
Nick
Re: [PATCH v2 5/7] kallsyms: Rename is_kernel() and is_kernel_text()
On Thu, 29 Jul 2021 10:00:51 +0800
Kefeng Wang wrote:

> On 2021/7/28 23:28, Steven Rostedt wrote:
> > On Wed, 28 Jul 2021 16:13:18 +0800
> > Kefeng Wang wrote:
> >
> >> The is_kernel[_text]() function check the address whether or not
> >> in kernel[_text] ranges, also they will check the address whether
> >> or not in gate area, so use better name.
> > Do you know what a gate area is?
> >
> > Because I believe gate area is kernel text, so the rename just makes it
> > redundant and more confusing.
>
> Yes, the gate area (eg, vectors part on ARM32, similar on x86/ia64) is
> kernel text.
>
> I want to keep the 'basic' section boundaries check, which only checks
> the start/end of sections, all in section.h; could we use 'generic' or
> 'basic' or 'core' in the naming?
>
> * is_kernel_generic_data() --- comes from core_kernel_data() in kernel.h
> * is_kernel_generic_text()
>
> The old helpers could remain unchanged, any suggestion, thanks.

Because it looks like the check of just being in the range of "_stext"
to "_end" is just an internal helper, why not do what we do all over the
kernel, and just prefix the function with a couple of underscores, which
denote that it is internal?

	__is_kernel_text()

Then you have:

	static inline int is_kernel_text(unsigned long addr)
	{
		if (__is_kernel_text(addr))
			return 1;
		return in_gate_area_no_mm(addr);
	}

-- Steve
Re: [PATCH v5 2/2] KVM: PPC: Book3S HV: Stop forwarding all HFUs to L1
Excerpts from Fabiano Rosas's message of July 28, 2021 12:36 am:
> Nicholas Piggin writes:
>
>> Excerpts from Fabiano Rosas's message of July 27, 2021 6:17 am:
>>> If the nested hypervisor has no access to a facility because it has
>>> been disabled by the host, it should also not be able to see the
>>> Hypervisor Facility Unavailable that arises from one of its guests
>>> trying to access the facility.
>>>
>>> This patch turns a HFU that happened in L2 into a Hypervisor Emulation
>>> Assistance interrupt and forwards it to L1 for handling. The ones that
>>> happened because L1 explicitly disabled the facility for L2 are still
>>> let through, along with the corresponding Cause bits in the HFSCR.
>>>
>>> Signed-off-by: Fabiano Rosas
>>> Reviewed-by: Nicholas Piggin
>>> ---
>>>  arch/powerpc/kvm/book3s_hv_nested.c | 32 +++-
>>>  1 file changed, 26 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
>>> index 8215dbd4be9a..d544b092b49a 100644
>>> --- a/arch/powerpc/kvm/book3s_hv_nested.c
>>> +++ b/arch/powerpc/kvm/book3s_hv_nested.c
>>> @@ -99,7 +99,7 @@ static void byteswap_hv_regs(struct hv_guest_state *hr)
>>>  	hr->dawrx1 = swab64(hr->dawrx1);
>>>  }
>>>
>>> -static void save_hv_return_state(struct kvm_vcpu *vcpu, int trap,
>>> +static void save_hv_return_state(struct kvm_vcpu *vcpu,
>>>  				 struct hv_guest_state *hr)
>>>  {
>>>  	struct kvmppc_vcore *vc = vcpu->arch.vcore;
>>> @@ -118,7 +118,7 @@ static void save_hv_return_state(struct kvm_vcpu *vcpu, int trap,
>>>  	hr->pidr = vcpu->arch.pid;
>>>  	hr->cfar = vcpu->arch.cfar;
>>>  	hr->ppr = vcpu->arch.ppr;
>>> -	switch (trap) {
>>> +	switch (vcpu->arch.trap) {
>>>  	case BOOK3S_INTERRUPT_H_DATA_STORAGE:
>>>  		hr->hdar = vcpu->arch.fault_dar;
>>>  		hr->hdsisr = vcpu->arch.fault_dsisr;
>>> @@ -128,9 +128,29 @@ static void save_hv_return_state(struct kvm_vcpu *vcpu, int trap,
>>>  		hr->asdr = vcpu->arch.fault_gpa;
>>>  		break;
>>>  	case BOOK3S_INTERRUPT_H_FAC_UNAVAIL:
>>> -		hr->hfscr = ((~HFSCR_INTR_CAUSE & hr->hfscr) |
>>> -			     (HFSCR_INTR_CAUSE & vcpu->arch.hfscr));
>>> -		break;
>>> +	{
>>> +		u8 cause = vcpu->arch.hfscr >> 56;
>>
>> Can this be u64 just to help gcc?
>>
>
> Yes.
>
>>> +
>>> +		WARN_ON_ONCE(cause >= BITS_PER_LONG);
>>> +
>>> +		if (!(hr->hfscr & (1UL << cause))) {
>>> +			hr->hfscr = ((~HFSCR_INTR_CAUSE & hr->hfscr) |
>>> +				     (HFSCR_INTR_CAUSE & vcpu->arch.hfscr));
>>> +			break;
>>> +		}
>>> +
>>> +		/*
>>> +		 * We have disabled this facility, so it does not
>>> +		 * exist from L1's perspective. Turn it into a HEAI.
>>> +		 */
>>> +		vcpu->arch.trap = BOOK3S_INTERRUPT_H_EMUL_ASSIST;
>>> +		kvmppc_load_last_inst(vcpu, INST_GENERIC,
>>> +				      &vcpu->arch.emul_inst);
>>
>> Hmm, this doesn't handle kvmppc_load_last_inst failure. Other code tends
>> to just resume guest and retry in this case. Can we do that here?
>>
>
> Not at this point. The other code does that inside
> kvmppc_handle_exit_hv, which is called from kvmhv_run_single_vcpu. And
> since we're changing the interrupt, I cannot load the last instruction
> at kvmppc_handle_nested_exit because at that point this is still an HFU.
>
> Unless I do it anyway at the HFU handler and put a comment explaining
> the situation.

Yeah I think it would be better to move this logic to the nested exit
handler.

Thanks,
Nick
Re: [PATCH] ibmvfc: fix command state accounting and stale response detection
On Fri, 16 Jul 2021 14:52:20 -0600, Tyrel Datwyler wrote:

> Prior to commit 1f4a4a19508d ("scsi: ibmvfc: Complete commands outside
> the host/queue lock") responses to commands were completed sequentially
> with the host lock held, such that a command had a basic binary state of
> active or free. It was therefore a simple affair of ensuring the
> associated ibmvfc_event to a VIOS response was valid by testing that it
> was not already free. The lock relaxation work to complete commands
> outside the lock inadvertently made it a trinary command state, such
> that a command is either in flight, received and being completed, or
> completed and now free. This breaks the stale command detection logic,
> as a command may still be marked active and have been placed on the
> delayed completion list when a second stale response for the same
> command arrives. This can lead to double completions and list
> corruption. This issue was exposed by a recent VIOS regression where a
> missing memory barrier could occasionally result in the ibmvfc client
> receiving a duplicate response for the same command.
>
> [...]

Applied to 5.14/scsi-fixes, thanks!

[1/1] ibmvfc: fix command state accounting and stale response detection
      https://git.kernel.org/mkp/scsi/c/73bfdf707d01

--
Martin K. Petersen
Oracle Linux Engineering
Re: [PATCH v2 2/7] kallsyms: Fix address-checks for kernel related range
On 2021/7/28 22:46, Steven Rostedt wrote:
> On Wed, 28 Jul 2021 16:13:15 +0800
> Kefeng Wang wrote:
>
>> The is_kernel_inittext/is_kernel_text/is_kernel function should not
>> include the end address (the labels _einittext, _etext and _end) when
>> checking the address range; the issue has existed since Linux v2.6.12.
>>
>> Cc: Arnd Bergmann
>> Cc: Sergey Senozhatsky
>> Cc: Petr Mladek
>> Acked-by: Sergey Senozhatsky
>> Reviewed-by: Petr Mladek
>> Signed-off-by: Kefeng Wang
>
> Reviewed-by: Steven Rostedt (VMware)
>
> -- Steve

Thanks.
Re: [PATCH v2 6/7] sections: Add new is_kernel() and is_kernel_text()
On 2021/7/28 23:32, Steven Rostedt wrote:
> On Wed, 28 Jul 2021 16:13:19 +0800
> Kefeng Wang wrote:
>
>> @@ -64,8 +64,7 @@ const struct exception_table_entry *search_exception_tables(unsigned long addr)
>>  int notrace core_kernel_text(unsigned long addr)
>>  {
>> -	if (addr >= (unsigned long)_stext &&
>> -	    addr < (unsigned long)_etext)
>> +	if (is_kernel_text(addr))
>
> Perhaps this was a bug, and these functions should be checking the gate
> area as well, as that is part of kernel text.

Ok, I would fix this if patch5 is reviewed well.

> -- Steve

>>  		return 1;
>>
>>  	if (system_state < SYSTEM_RUNNING &&
>> diff --git a/mm/kasan/report.c b/mm/kasan/report.c
>> index 884a950c7026..88f5b0c058b7 100644
>> --- a/mm/kasan/report.c
>> +++ b/mm/kasan/report.c
>> @@ -235,7 +235,7 @@ static void describe_object(struct kmem_cache *cache, void *object,
>>  static inline bool kernel_or_module_addr(const void *addr)
>>  {
>> -	if (addr >= (void *)_stext && addr < (void *)_end)
>> +	if (is_kernel((unsigned long)addr))
>>  		return true;
>>
>>  	if (is_module_address((unsigned long)addr))
>>  		return true;
Re: [PATCH v2 5/7] kallsyms: Rename is_kernel() and is_kernel_text()
On 2021/7/28 23:28, Steven Rostedt wrote:
> On Wed, 28 Jul 2021 16:13:18 +0800
> Kefeng Wang wrote:
>
>> The is_kernel[_text]() function check the address whether or not
>> in kernel[_text] ranges, also they will check the address whether
>> or not in gate area, so use better name.
>
> Do you know what a gate area is?
>
> Because I believe gate area is kernel text, so the rename just makes it
> redundant and more confusing.
>
> -- Steve

Yes, the gate area (eg, vectors part on ARM32, similar on x86/ia64) is
kernel text.

I want to keep the 'basic' section boundaries check, which only checks
the start/end of sections, all in section.h; could we use 'generic' or
'basic' or 'core' in the naming?

* is_kernel_generic_data() --- comes from core_kernel_data() in kernel.h
* is_kernel_generic_text()

The old helpers could remain unchanged, any suggestion, thanks.
Re: [PATCH] arch: Kconfig: clean up obsolete use of HAVE_IDE
On 7/28/21 12:21 PM, Lukas Bulwahn wrote:
> The arch-specific Kconfig files use HAVE_IDE to indicate if IDE is
> supported.
>
> As IDE support and the HAVE_IDE config vanishes with commit b7fb14d3ac63
> ("ide: remove the legacy ide driver"), there is no need to mention
> HAVE_IDE in all those arch-specific Kconfig files.
>
> The issue was identified with ./scripts/checkkconfigsymbols.py.

Thanks, let's queue this for 5.14 to avoid any future conflicts with it.

--
Jens Axboe
Re: [PATCH] arch: Kconfig: clean up obsolete use of HAVE_IDE
On 7/28/21 11:21 AM, Lukas Bulwahn wrote:
> The arch-specific Kconfig files use HAVE_IDE to indicate if IDE is
> supported.
>
> As IDE support and the HAVE_IDE config vanishes with commit b7fb14d3ac63
> ("ide: remove the legacy ide driver"), there is no need to mention
> HAVE_IDE in all those arch-specific Kconfig files.
>
> The issue was identified with ./scripts/checkkconfigsymbols.py.
>
> Fixes: b7fb14d3ac63 ("ide: remove the legacy ide driver")
> Suggested-by: Randy Dunlap
> Signed-off-by: Lukas Bulwahn

Acked-by: Randy Dunlap

Thanks.

> ---
>  arch/alpha/Kconfig            | 1 -
>  arch/arm/Kconfig              | 6 --
>  arch/arm/mach-davinci/Kconfig | 1 -
>  arch/h8300/Kconfig.cpu        | 1 -
>  arch/ia64/Kconfig             | 1 -
>  arch/m68k/Kconfig             | 1 -
>  arch/mips/Kconfig             | 1 -
>  arch/parisc/Kconfig           | 1 -
>  arch/powerpc/Kconfig          | 1 -
>  arch/sh/Kconfig               | 1 -
>  arch/sparc/Kconfig            | 1 -
>  arch/x86/Kconfig              | 1 -
>  arch/xtensa/Kconfig           | 1 -
>  13 files changed, 18 deletions(-)

--
~Randy
[PATCH] arch: Kconfig: clean up obsolete use of HAVE_IDE
The arch-specific Kconfig files use HAVE_IDE to indicate if IDE is
supported.

As IDE support and the HAVE_IDE config vanishes with commit b7fb14d3ac63
("ide: remove the legacy ide driver"), there is no need to mention
HAVE_IDE in all those arch-specific Kconfig files.

The issue was identified with ./scripts/checkkconfigsymbols.py.

Fixes: b7fb14d3ac63 ("ide: remove the legacy ide driver")
Suggested-by: Randy Dunlap
Signed-off-by: Lukas Bulwahn
---
 arch/alpha/Kconfig            | 1 -
 arch/arm/Kconfig              | 6 --
 arch/arm/mach-davinci/Kconfig | 1 -
 arch/h8300/Kconfig.cpu        | 1 -
 arch/ia64/Kconfig             | 1 -
 arch/m68k/Kconfig             | 1 -
 arch/mips/Kconfig             | 1 -
 arch/parisc/Kconfig           | 1 -
 arch/powerpc/Kconfig          | 1 -
 arch/sh/Kconfig               | 1 -
 arch/sparc/Kconfig            | 1 -
 arch/x86/Kconfig              | 1 -
 arch/xtensa/Kconfig           | 1 -
 13 files changed, 18 deletions(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 77d3280dc678..a6d4c2f744e3 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -14,7 +14,6 @@ config ALPHA
 	select PCI_SYSCALL if PCI
 	select HAVE_AOUT
 	select HAVE_ASM_MODVERSIONS
-	select HAVE_IDE
 	select HAVE_PCSPKR_PLATFORM
 	select HAVE_PERF_EVENTS
 	select NEED_DMA_MAP_STATE
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 82f908fa5676..2fb7012c3246 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -95,7 +95,6 @@ config ARM
 	select HAVE_FUNCTION_TRACER if !XIP_KERNEL
 	select HAVE_GCC_PLUGINS
 	select HAVE_HW_BREAKPOINT if PERF_EVENTS && (CPU_V6 || CPU_V6K || CPU_V7)
-	select HAVE_IDE if PCI || ISA || PCMCIA
 	select HAVE_IRQ_TIME_ACCOUNTING
 	select HAVE_KERNEL_GZIP
 	select HAVE_KERNEL_LZ4
@@ -361,7 +360,6 @@ config ARCH_FOOTBRIDGE
 	bool "FootBridge"
 	select CPU_SA110
 	select FOOTBRIDGE
-	select HAVE_IDE
 	select NEED_MACH_IO_H if !MMU
 	select NEED_MACH_MEMORY_H
 	help
@@ -430,7 +428,6 @@ config ARCH_PXA
 	select GENERIC_IRQ_MULTI_HANDLER
 	select GPIO_PXA
 	select GPIOLIB
-	select HAVE_IDE
 	select IRQ_DOMAIN
 	select PLAT_PXA
 	select SPARSE_IRQ
@@ -446,7 +443,6 @@ config ARCH_RPC
 	select ARM_HAS_SG_CHAIN
 	select CPU_SA110
 	select FIQ
-	select HAVE_IDE
 	select HAVE_PATA_PLATFORM
 	select ISA_DMA_API
 	select LEGACY_TIMER_TICK
@@ -469,7 +465,6 @@ config ARCH_SA1100
 	select CPU_SA1100
 	select GENERIC_IRQ_MULTI_HANDLER
 	select GPIOLIB
-	select HAVE_IDE
 	select IRQ_DOMAIN
 	select ISA
 	select NEED_MACH_MEMORY_H
@@ -505,7 +500,6 @@ config ARCH_OMAP1
 	select GENERIC_IRQ_CHIP
 	select GENERIC_IRQ_MULTI_HANDLER
 	select GPIOLIB
-	select HAVE_IDE
 	select HAVE_LEGACY_CLK
 	select IRQ_DOMAIN
 	select NEED_MACH_IO_H if PCCARD
diff --git a/arch/arm/mach-davinci/Kconfig b/arch/arm/mach-davinci/Kconfig
index de11030748d0..1d3aef84287d 100644
--- a/arch/arm/mach-davinci/Kconfig
+++ b/arch/arm/mach-davinci/Kconfig
@@ -9,7 +9,6 @@ menuconfig ARCH_DAVINCI
 	select PM_GENERIC_DOMAINS_OF if PM && OF
 	select REGMAP_MMIO
 	select RESET_CONTROLLER
-	select HAVE_IDE
 	select PINCTRL_SINGLE

 if ARCH_DAVINCI
diff --git a/arch/h8300/Kconfig.cpu b/arch/h8300/Kconfig.cpu
index 2b9cbaf41cd0..e4467d40107d 100644
--- a/arch/h8300/Kconfig.cpu
+++ b/arch/h8300/Kconfig.cpu
@@ -44,7 +44,6 @@ config H8300_H8MAX
 	bool "H8MAX"
 	select H83069
 	select RAMKERNEL
-	select HAVE_IDE
 	help
 	  H8MAX Evaluation Board Support
 	  More Information. (Japanese Only)
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index cf425c2c63af..4993c7ac7ff6 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -25,7 +25,6 @@ config IA64
 	select HAVE_ASM_MODVERSIONS
 	select HAVE_UNSTABLE_SCHED_CLOCK
 	select HAVE_EXIT_THREAD
-	select HAVE_IDE
 	select HAVE_KPROBES
 	select HAVE_KRETPROBES
 	select HAVE_FTRACE_MCOUNT_RECORD
diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
index 96989ad46f66..d632a1d576f9 100644
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -23,7 +23,6 @@ config M68K
 	select HAVE_DEBUG_BUGVERBOSE
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS if !CPU_HAS_NO_UNALIGNED
 	select HAVE_FUTEX_CMPXCHG if MMU && FUTEX
-	select HAVE_IDE
 	select HAVE_MOD_ARCH_SPECIFIC
 	select HAVE_UID16
 	select MMU_GATHER_NO_RANGE if MMU
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index cee6087cd686..6dfb27d531dd 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -71,7 +71,6 @@ config MIPS
 	select HAVE_FUNCTION_TRACER
 	select HAVE_GCC_PLUGINS
 	select HAVE_GENERIC_VDSO
-	select HAVE_IDE
 	select HAVE_IOREMAP
[PATCHv2 3/3] powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings
On POWER10 systems, the "ibm,thread-groups" property "2" indicates the cpus in thread-group share both L2 and L3 caches. Hence, use cache_property = 2 itself to find both the L2 and L3 cache siblings. Hence, create a new thread_group_l3_cache_map to keep list of L3 siblings, but fill the mask using same property "2" array. Signed-off-by: Parth Shah --- arch/powerpc/include/asm/smp.h | 3 ++ arch/powerpc/kernel/cacheinfo.c | 3 ++ arch/powerpc/kernel/smp.c | 66 ++--- 3 files changed, 51 insertions(+), 21 deletions(-) diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h index 1259040cc3a4..7ef1cd8168a0 100644 --- a/arch/powerpc/include/asm/smp.h +++ b/arch/powerpc/include/asm/smp.h @@ -35,6 +35,7 @@ extern int *chip_id_lookup_table; DECLARE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map); DECLARE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map); +DECLARE_PER_CPU(cpumask_var_t, thread_group_l3_cache_map); #ifdef CONFIG_SMP @@ -144,6 +145,7 @@ extern int cpu_to_core_id(int cpu); extern bool has_big_cores; extern bool thread_group_shares_l2; +extern bool thread_group_shares_l3; #define cpu_smt_mask cpu_smt_mask #ifdef CONFIG_SCHED_SMT @@ -198,6 +200,7 @@ extern void __cpu_die(unsigned int cpu); #define hard_smp_processor_id()get_hard_smp_processor_id(0) #define smp_setup_cpu_maps() #define thread_group_shares_l2 0 +#define thread_group_shares_l3 0 static inline void inhibit_secondary_onlining(void) {} static inline void uninhibit_secondary_onlining(void) {} static inline const struct cpumask *cpu_sibling_mask(int cpu) diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c index 20d91693eac1..cf1be75b7833 100644 --- a/arch/powerpc/kernel/cacheinfo.c +++ b/arch/powerpc/kernel/cacheinfo.c @@ -469,6 +469,9 @@ static int get_group_id(unsigned int cpu_id, int level) else if (thread_group_shares_l2 && level == 2) return cpumask_first(per_cpu(thread_group_l2_cache_map, cpu_id)); + else if (thread_group_shares_l3 && level == 3) + 
		return cpumask_first(per_cpu(thread_group_l3_cache_map,
+					     cpu_id));
 	return -1;
 }
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index a7fcac44a8e2..f2abd88e0c25 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -78,6 +78,7 @@ struct task_struct *secondary_current;
 bool has_big_cores;
 bool coregroup_enabled;
 bool thread_group_shares_l2;
+bool thread_group_shares_l3;
 DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
 DEFINE_PER_CPU(cpumask_var_t, cpu_smallcore_map);
@@ -101,7 +102,7 @@ enum {
 #define MAX_THREAD_LIST_SIZE 8
 #define THREAD_GROUP_SHARE_L1 1
-#define THREAD_GROUP_SHARE_L2 2
+#define THREAD_GROUP_SHARE_L2_L3 2
 struct thread_groups {
 	unsigned int property;
 	unsigned int nr_groups;
@@ -131,6 +132,12 @@ DEFINE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map);
  */
 DEFINE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map);
+/*
+ * On P10, thread_group_l3_cache_map for each CPU is equal to the
+ * thread_group_l2_cache_map
+ */
+DEFINE_PER_CPU(cpumask_var_t, thread_group_l3_cache_map);
+
 /* SMP operations for this machine */
 struct smp_ops_t *smp_ops;
@@ -889,19 +896,41 @@ static struct thread_groups *__init get_thread_groups(int cpu,
 	return tg;
 }
+static int update_mask_from_threadgroup(cpumask_var_t *mask, struct thread_groups *tg, int cpu, int cpu_group_start)
+{
+	int first_thread = cpu_first_thread_sibling(cpu);
+	int i;
+
+	zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu));
+
+	for (i = first_thread; i < first_thread + threads_per_core; i++) {
+		int i_group_start = get_cpu_thread_group_start(i, tg);
+
+		if (unlikely(i_group_start == -1)) {
+			WARN_ON_ONCE(1);
+			return -ENODATA;
+		}
+
+		if (i_group_start == cpu_group_start)
+			cpumask_set_cpu(i, *mask);
+	}
+
+	return 0;
+}
+
 static int __init init_thread_group_cache_map(int cpu, int cache_property)
 {
-	int first_thread = cpu_first_thread_sibling(cpu);
-	int i, cpu_group_start = -1, err = 0;
+	int cpu_group_start = -1, err = 0;
 	struct thread_groups *tg = NULL;
	cpumask_var_t *mask = NULL;

 	if (cache_property != THREAD_GROUP_SHARE_L1 &&
-	    cache_property != THREAD_GROUP_SHARE_L2)
+	    cache_property != THREAD_GROUP_SHARE_L2_L3)
 		return -EINVAL;

 	tg = get_thread_groups(cpu, cache_property, &err);
+
 	if (!tg)
 		return err;
@@ -912,25 +941,18 @@ static int __init init_thread_group_cache_map(int cpu, int cache_property)
 		return -ENODATA;
 	}
-	if (cache_property == THREAD_GROUP
[PATCHv2 2/3] powerpc/cacheinfo: Remove the redundant get_shared_cpu_map()
From: "Gautham R. Shenoy"

The helper function get_shared_cpu_map() was added in 'commit 500fe5f550ec ("powerpc/cacheinfo: Report the correct shared_cpu_map on big-cores")' and subsequently expanded upon in 'commit 0be47634db0b ("powerpc/cacheinfo: Print correct cache-sibling map/list for L2 cache")' in order to help report the correct groups of threads sharing these caches on big-core systems, where groups of threads within a core can share different sets of caches.

Now that powerpc/cacheinfo is aware of the "ibm,thread-groups" property, cache->shared_cpu_map contains the correct set of thread-siblings sharing the cache. Hence we no longer need the function get_shared_cpu_map(); this patch removes it. We also remove the helper function index_dir_to_cpu(), which was only called by get_shared_cpu_map().

With these functions removed, we still see the correct cache-sibling map/list for L1 and L2 caches on systems with L1 and L2 caches distributed among groups of threads in a core.

With this patch, on an SMT8 POWER10 system where the L1 and L2 caches are split between the two groups of threads in a core, for CPUs 8,9 the L1-Data, L1-Instruction, L2 and L3 cache CPU sibling lists are as follows:

$ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-15
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-15

$ ppc64_cpu --smt=4
$ grep .
/sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-11
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-11

$ ppc64_cpu --smt=2
$ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-9
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-9

$ ppc64_cpu --smt=1
$ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8

Signed-off-by: Gautham R.
Shenoy
---
 arch/powerpc/kernel/cacheinfo.c | 41 +
 1 file changed, 1 insertion(+), 40 deletions(-)

diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
index 5a6925d87424..20d91693eac1 100644
--- a/arch/powerpc/kernel/cacheinfo.c
+++ b/arch/powerpc/kernel/cacheinfo.c
@@ -675,45 +675,6 @@ static ssize_t level_show(struct kobject *k, struct kobj_attribute *attr, char *
 static struct kobj_attribute cache_level_attr =
 	__ATTR(level, 0444, level_show, NULL);

-static unsigned int index_dir_to_cpu(struct cache_index_dir *index)
-{
-	struct kobject *index_dir_kobj = &index->kobj;
-	struct kobject *cache_dir_kobj = index_dir_kobj->parent;
-	struct kobject *cpu_dev_kobj = cache_dir_kobj->parent;
-	struct device *dev = kobj_to_dev(cpu_dev_kobj);
-
-	return dev->id;
-}
-
-/*
- * On big-core systems, each core has two groups of CPUs each of which
- * has its own L1-cache. The thread-siblings which share l1-cache with
- * @cpu can be obtained via cpu_smallcore_mask().
- *
- * On some big-core systems, the L2 cache is shared only between some
- * groups of siblings. This is already parsed and encoded in
- * cpu_l2_cache_mask().
- *
- * TODO: cache_lookup_or_instantiate() needs to be made aware of the
- *	 "ibm,thread-groups" property so that cache->shared_cpu_map
- *	 reflects the correct siblings on platforms that have this
- *	 device-tree property. This helper function is only a stop-gap
- *	 solution so that we report the correct siblings to the
- *	 userspace via sysfs.
- */
[PATCHv2 1/3] powerpc/cacheinfo: Lookup cache by dt node and thread-group id
From: "Gautham R. Shenoy"

Currently the cacheinfo code on powerpc indexes the "cache" objects (modelling the L1/L2/L3 caches) where the key is the device-tree node corresponding to that cache. On some of the POWER server platforms, thread-groups within the core share different sets of caches (e.g. on SMT8 POWER9 systems, threads 0,2,4,6 of a core share one L1 cache and threads 1,3,5,7 of the same core share another L1 cache). On such platforms, there is a single device-tree node corresponding to that cache, and the cache-configuration within the threads of the core is indicated via the "ibm,thread-groups" device-tree property.

Since the current code is not aware of the "ibm,thread-groups" property, on the aforementioned systems the cacheinfo code still treats all the threads in the core as sharing the cache because of the single device-tree node (in the earlier example, the cacheinfo code would say that CPUs 0-7 share the L1 cache).

In this patch, we make the powerpc cacheinfo code aware of the "ibm,thread-groups" property. We index the "cache" objects by the key-pair (device-tree node, thread-group id). For any CPU X, for a given level of cache, the thread-group id is defined to be the first CPU in the "ibm,thread-groups" cache-group containing CPU X. For levels of cache which are not represented in the "ibm,thread-groups" property, the thread-group id is -1.

Signed-off-by: Gautham R. Shenoy
[parth: Remove "static" keyword for the definition of "thread_group_l1_cache_map" and "thread_group_l2_cache_map" to get rid of the compile error.]
Signed-off-by: Parth Shah
---
 arch/powerpc/include/asm/smp.h  |  3 ++
 arch/powerpc/kernel/cacheinfo.c | 80 -
 arch/powerpc/kernel/smp.c       |  4 +-
 3 files changed, 63 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 03b3d010cbab..1259040cc3a4 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -33,6 +33,9 @@ extern bool coregroup_enabled;
 extern int cpu_to_chip_id(int cpu);
 extern int *chip_id_lookup_table;

+DECLARE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map);
+DECLARE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map);
+
 #ifdef CONFIG_SMP
 struct smp_ops_t {
diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
index 6f903e9aa20b..5a6925d87424 100644
--- a/arch/powerpc/kernel/cacheinfo.c
+++ b/arch/powerpc/kernel/cacheinfo.c
@@ -120,6 +120,7 @@ struct cache {
 	struct cpumask shared_cpu_map; /* online CPUs using this cache */
 	int type;                      /* split cache disambiguation */
 	int level;                     /* level not explicit in device tree */
+	int group_id;                  /* id of the group of threads that share this cache */
 	struct list_head list;         /* global list of cache objects */
 	struct cache *next_local;      /* next cache of >= level */
 };
@@ -142,22 +143,24 @@ static const char *cache_type_string(const struct cache *cache)
 }

 static void cache_init(struct cache *cache, int type, int level,
-		       struct device_node *ofnode)
+		       struct device_node *ofnode, int group_id)
 {
 	cache->type = type;
 	cache->level = level;
 	cache->ofnode = of_node_get(ofnode);
+	cache->group_id = group_id;
 	INIT_LIST_HEAD(&cache->list);
 	list_add(&cache->list, &cache_list);
 }

-static struct cache *new_cache(int type, int level, struct device_node *ofnode)
+static struct cache *new_cache(int type, int level,
+			       struct device_node *ofnode, int group_id)
 {
 	struct cache *cache;

 	cache = kzalloc(sizeof(*cache), GFP_KERNEL);
 	if (cache)
-		cache_init(cache, type, level, ofnode);
+		cache_init(cache, type, level, ofnode, group_id);
	return cache;
 }
@@ -309,20 +312,24 @@ static struct cache *cache_find_first_sibling(struct cache *cache)
 		return cache;

 	list_for_each_entry(iter, &cache_list, list)
-		if (iter->ofnode == cache->ofnode && iter->next_local == cache)
+		if (iter->ofnode == cache->ofnode &&
+		    iter->group_id == cache->group_id &&
+		    iter->next_local == cache)
 			return iter;

 	return cache;
 }

-/* return the first cache on a local list matching node */
-static struct cache *cache_lookup_by_node(const struct device_node *node)
+/* return the first cache on a local list matching node and thread-group id */
+static struct cache *cache_lookup_by_node_group(const struct device_node *node,
+						int group_id)
 {
 	struct cache *cache = NULL;
 	struct cache *iter;

 	list_for_each_entry(iter, &cache_list, list) {
-		if (iter->ofnode != node)
+		if (iter->ofnode != node ||
+		    iter->group_i
[PATCHv2 0/3] Make cache-object aware of L3 siblings by parsing "ibm,thread-groups" property
Changes from v1 -> v2:
- Based on Gautham's comments, use a separate thread_group_l3_cache_map and modify the parsing code to build the cache_map for L3. This keeps the cache_map building code isolated from the parsing code.

v1 can be found at:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-June/230680.html

On a POWER10 big-core system, the L3 cache reflected by sysfs contains all the CPUs in the big-core.

grep . /sys/devices/system/cpu/cpu0/cache/index*/shared_cpu_list
/sys/devices/system/cpu/cpu0/cache/index0/shared_cpu_list:0,2,4,6
/sys/devices/system/cpu/cpu0/cache/index1/shared_cpu_list:0,2,4,6
/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list:0,2,4,6
/sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list:0-7

In the above example, CPU 0 observes CPUs 0-7 in the L3 (index3) cache, which is not correct, as only the CPUs in the small core share the L3 cache. The "ibm,thread-groups" property contains the value "2" to indicate that the CPUs share both the L2 and L3 caches. This patch-set uses this property to reflect the correct L3 topology to a cache-object.

After applying this patch-set, the topology looks like:

$> ppc64_cpu --smt=8
$> grep . /sys/devices/system/cpu/cpu[89]/cache/*/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:9,11,13,15

$> ppc64_cpu --smt=4
$> grep .
/sys/devices/system/cpu/cpu[89]/cache/*/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:9,11

$> ppc64_cpu --smt=2
$> grep . /sys/devices/system/cpu/cpu[89]/cache/*/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:9

$> ppc64_cpu --smt=1
$> grep . /sys/devices/system/cpu/cpu[89]/cache/*/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8

Patches Organization:
=====================
This patch-set series is based on top of v5.14-rc2.

- Patch 1-2: Add functionality to introduce awareness for "ibm,thread-groups". The original (not merged) posted version can be found at:
  https://lore.kernel.org/linuxppc-dev/1611041780-8640-1-git-send-email-...@linux.vnet.ibm.co
- Patch 3: Use the existing L2 cache_map to detect L3 cache siblings.

Gautham R.
Shenoy (2):
  powerpc/cacheinfo: Lookup cache by dt node and thread-group id
  powerpc/cacheinfo: Remove the redundant get_shared_cpu_map()

Parth Shah (1):
  powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings

 arch/powerpc/include/asm/smp.h  |   6 ++
 arch/powerpc/kernel/cacheinfo.c | 124
 arch/powerpc/kernel/smp.c       |  70 --
 3 files changed, 115 insertions(+), 85 deletions(-)

-- 
2.26.3
Re: [powerpc][next-20210727] Boot failure - kernel BUG at arch/powerpc/kernel/interrupt.c:98!
On Wed, Jul 28, 2021 at 01:31:06PM +0530, Sachin Sant wrote: > linux-next fails to boot on Power server (POWER8/POWER9). Following traces > are seen during boot > > [0.010799] software IO TLB: tearing down default memory pool > [0.010805] [ cut here ] > [0.010808] kernel BUG at arch/powerpc/kernel/interrupt.c:98! > [0.010812] Oops: Exception in kernel mode, sig: 5 [#1] > [0.010816] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries > [0.010820] Modules linked in: > [0.010824] CPU: 1 PID: 1 Comm: swapper/0 Not tainted > 5.14.0-rc3-next-20210727 #1 > [0.010830] NIP: c0032cfc LR: c000c764 CTR: > c000c670 > [0.010834] REGS: c3603b10 TRAP: 0700 Not tainted > (5.14.0-rc3-next-20210727) > [0.010838] MSR: 80029033 CR: 28000222 > XER: 0002 > [0.010848] CFAR: c000c760 IRQMASK: 3 > [0.010848] GPR00: c000c764 c3603db0 c29bd000 > 0001 > [0.010848] GPR04: 0a68 0400 c3603868 > > [0.010848] GPR08: > 0003 > [0.010848] GPR12: c0001ec9ee80 c0012a28 > > [0.010848] GPR16: > > [0.010848] GPR20: > > [0.010848] GPR24: f134 > c3603868 > [0.010848] GPR28: 0400 0a68 c202e9c0 > c3603e80 > [0.010896] NIP [c0032cfc] system_call_exception+0x8c/0x2e0 > [0.010901] LR [c000c764] system_call_common+0xf4/0x258 > [0.010907] Call Trace: > [0.010909] [c3603db0] [c016a6dc] > calculate_sigpending+0x4c/0xe0 (unreliable) > [0.010915] [c3603e10] [c000c764] > system_call_common+0xf4/0x258 > [0.010921] --- interrupt: c00 at kvm_template_end+0x4/0x8 > [0.010926] NIP: c0092dec LR: c0114fc8 CTR: > > [0.010930] REGS: c3603e80 TRAP: 0c00 Not tainted > (5.14.0-rc3-next-20210727) > [0.010934] MSR: 80009033 CR: 28000222 > XER: > [0.010943] IRQMASK: 0 > [0.010943] GPR00: c202e9c0 c3603b00 c29bd000 > f134 > [0.010943] GPR04: 0a68 0400 c3603868 > > [0.010943] GPR08: > > [0.010943] GPR12: c0001ec9ee80 c0012a28 > > [0.010943] GPR16: > > [0.010943] GPR20: > > [0.010943] GPR24: c20033c4 c110afc0 c2081950 > c3277d40 > [0.010943] GPR28: ca68 0400 > 000d > [0.010989] NIP [c0092dec] kvm_template_end+0x4/0x8 > [0.010993] LR 
[c0114fc8] set_memory_encrypted+0x38/0x60 > [0.010999] --- interrupt: c00 > [0.011001] [c3603b00] [c000c764] > system_call_common+0xf4/0x258 (unreliable) > [0.011008] Instruction dump: > [0.011011] 694a0003 312a 7d495110 0b0a 6000 6000 e87f0108 > 68690002 > [0.011019] 7929ffe2 0b09 68634000 786397e2 <0b03> e93f0138 > 792907e0 0b09 > [0.011029] ---[ end trace a20ad55589efcb10 ]--- > [0.012297] > [1.012304] Kernel panic - not syncing: Fatal exception > > next-20210723 was good. The boot failure seems to have been introduced with > next-20210726. > > I have attached the boot log. I noticed this with OpenSUSE's ppc64le config [1] and my bisect landed on commit ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()"). That series just keeps on giving... Adding some people from that thread to this one. Original thread: https://lore.kernel.org/r/1905cd70-7656-42ae-99e2-a31fc3812...@linux.vnet.ibm.com/ [1]: https://github.com/openSUSE/kernel-source/raw/master/config/ppc64le/default Cheers, Nathan
Re: [PATCH v5 1/6] kexec: move locking into do_kexec_load
Arnd Bergmann writes: > From: Arnd Bergmann > > The locking is the same between the native and compat version of > sys_kexec_load(), so it can be done in the common implementation > to reduce duplication. Acked-by: "Eric W. Biederman" > > Co-developed-by: Eric Biederman > Co-developed-by: Christoph Hellwig > Signed-off-by: Arnd Bergmann > --- > kernel/kexec.c | 44 > 1 file changed, 16 insertions(+), 28 deletions(-) > > diff --git a/kernel/kexec.c b/kernel/kexec.c > index c82c6c06f051..9c7aef8f4bb6 100644 > --- a/kernel/kexec.c > +++ b/kernel/kexec.c > @@ -110,6 +110,17 @@ static int do_kexec_load(unsigned long entry, unsigned > long nr_segments, > unsigned long i; > int ret; > > + /* > + * Because we write directly to the reserved memory region when loading > + * crash kernels we need a mutex here to prevent multiple crash kernels > + * from attempting to load simultaneously, and to prevent a crash kernel > + * from loading over the top of a in use crash kernel. > + * > + * KISS: always take the mutex. 
> + */ > + if (!mutex_trylock(&kexec_mutex)) > + return -EBUSY; > + > if (flags & KEXEC_ON_CRASH) { > dest_image = &kexec_crash_image; > if (kexec_crash_image) > @@ -121,7 +132,8 @@ static int do_kexec_load(unsigned long entry, unsigned > long nr_segments, > if (nr_segments == 0) { > /* Uninstall image */ > kimage_free(xchg(dest_image, NULL)); > - return 0; > + ret = 0; > + goto out_unlock; > } > if (flags & KEXEC_ON_CRASH) { > /* > @@ -134,7 +146,7 @@ static int do_kexec_load(unsigned long entry, unsigned > long nr_segments, > > ret = kimage_alloc_init(&image, entry, nr_segments, segments, flags); > if (ret) > - return ret; > + goto out_unlock; > > if (flags & KEXEC_PRESERVE_CONTEXT) > image->preserve_context = 1; > @@ -171,6 +183,8 @@ static int do_kexec_load(unsigned long entry, unsigned > long nr_segments, > arch_kexec_protect_crashkres(); > > kimage_free(image); > +out_unlock: > + mutex_unlock(&kexec_mutex); > return ret; > } > > @@ -247,21 +261,8 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, > unsigned long, nr_segments, > ((flags & KEXEC_ARCH_MASK) != KEXEC_ARCH_DEFAULT)) > return -EINVAL; > > - /* Because we write directly to the reserved memory > - * region when loading crash kernels we need a mutex here to > - * prevent multiple crash kernels from attempting to load > - * simultaneously, and to prevent a crash kernel from loading > - * over the top of a in use crash kernel. > - * > - * KISS: always take the mutex. 
> - */ > - if (!mutex_trylock(&kexec_mutex)) > - return -EBUSY; > - > result = do_kexec_load(entry, nr_segments, segments, flags); > > - mutex_unlock(&kexec_mutex); > - > return result; > } > > @@ -301,21 +302,8 @@ COMPAT_SYSCALL_DEFINE4(kexec_load, compat_ulong_t, entry, > return -EFAULT; > } > > - /* Because we write directly to the reserved memory > - * region when loading crash kernels we need a mutex here to > - * prevent multiple crash kernels from attempting to load > - * simultaneously, and to prevent a crash kernel from loading > - * over the top of a in use crash kernel. > - * > - * KISS: always take the mutex. > - */ > - if (!mutex_trylock(&kexec_mutex)) > - return -EBUSY; > - > result = do_kexec_load(entry, nr_segments, ksegments, flags); > > - mutex_unlock(&kexec_mutex); > - > return result; > } > #endif
Re: [PATCH 01/11] mm: Introduce a function to check for virtualization protection features
On Wed, Jul 28, 2021 at 02:17:27PM +0100, Christoph Hellwig wrote:
> So common checks obviously make sense, but I really hate the stupid
> multiplexer. Having one well-documented helper per feature is much
> easier to follow.

We had that in x86 - it was called cpu_has_xxx, where xxx is the feature bit. It didn't scale with the sheer amount of feature bits that kept getting added, so we do cpu_feature_enabled(X86_FEATURE_XXX) now.

The idea behind this is very similar - those protected guest flags will only grow in the couple-of-tens range - at least - so having a multiplexer is a lot simpler, I'd say, than having a couple of tens of helpers. And those PATTR flags should have good, readable names, btw.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v5 2/6] kexec: avoid compat_alloc_user_space
Arnd Bergmann writes: > From: Arnd Bergmann > > kimage_alloc_init() expects a __user pointer, so compat_sys_kexec_load() > uses compat_alloc_user_space() to convert the layout and put it back > onto the user space caller stack. > > Moving the user space access into the syscall handler directly actually > makes the code simpler, as the conversion for compat mode can now be > done on kernel memory. Acked-by: "Eric W. Biederman" > > Co-developed-by: Eric Biederman > Co-developed-by: Christoph Hellwig > Link: https://lore.kernel.org/lkml/ypbtsu4gx6pl7%2...@infradead.org/ > Link: https://lore.kernel.org/lkml/m1y2cbzmnw@fess.ebiederm.org/ > Signed-off-by: Arnd Bergmann > --- > kernel/kexec.c | 61 +- > 1 file changed, 25 insertions(+), 36 deletions(-) > > diff --git a/kernel/kexec.c b/kernel/kexec.c > index 9c7aef8f4bb6..b5e40f069768 100644 > --- a/kernel/kexec.c > +++ b/kernel/kexec.c > @@ -19,26 +19,9 @@ > > #include "kexec_internal.h" > > -static int copy_user_segment_list(struct kimage *image, > - unsigned long nr_segments, > - struct kexec_segment __user *segments) > -{ > - int ret; > - size_t segment_bytes; > - > - /* Read in the segments */ > - image->nr_segments = nr_segments; > - segment_bytes = nr_segments * sizeof(*segments); > - ret = copy_from_user(image->segment, segments, segment_bytes); > - if (ret) > - ret = -EFAULT; > - > - return ret; > -} > - > static int kimage_alloc_init(struct kimage **rimage, unsigned long entry, >unsigned long nr_segments, > - struct kexec_segment __user *segments, > + struct kexec_segment *segments, >unsigned long flags) > { > int ret; > @@ -58,10 +41,8 @@ static int kimage_alloc_init(struct kimage **rimage, > unsigned long entry, > return -ENOMEM; > > image->start = entry; > - > - ret = copy_user_segment_list(image, nr_segments, segments); > - if (ret) > - goto out_free_image; > + image->nr_segments = nr_segments; > + memcpy(image->segment, segments, nr_segments * sizeof(*segments)); > > if (kexec_on_panic) { > /* Enable special 
crash kernel control page alloc policy. */ > @@ -104,7 +85,7 @@ static int kimage_alloc_init(struct kimage **rimage, > unsigned long entry, > } > > static int do_kexec_load(unsigned long entry, unsigned long nr_segments, > - struct kexec_segment __user *segments, unsigned long flags) > + struct kexec_segment *segments, unsigned long flags) > { > struct kimage **dest_image, *image; > unsigned long i; > @@ -250,7 +231,8 @@ static inline int kexec_load_check(unsigned long > nr_segments, > SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments, > struct kexec_segment __user *, segments, unsigned long, flags) > { > - int result; > + struct kexec_segment *ksegments; > + unsigned long result; > > result = kexec_load_check(nr_segments, flags); > if (result) > @@ -261,7 +243,12 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, > unsigned long, nr_segments, > ((flags & KEXEC_ARCH_MASK) != KEXEC_ARCH_DEFAULT)) > return -EINVAL; > > - result = do_kexec_load(entry, nr_segments, segments, flags); > + ksegments = memdup_user(segments, nr_segments * sizeof(ksegments[0])); > + if (IS_ERR(ksegments)) > + return PTR_ERR(ksegments); > + > + result = do_kexec_load(entry, nr_segments, ksegments, flags); > + kfree(ksegments); > > return result; > } > @@ -273,7 +260,7 @@ COMPAT_SYSCALL_DEFINE4(kexec_load, compat_ulong_t, entry, > compat_ulong_t, flags) > { > struct compat_kexec_segment in; > - struct kexec_segment out, __user *ksegments; > + struct kexec_segment *ksegments; > unsigned long i, result; > > result = kexec_load_check(nr_segments, flags); > @@ -286,24 +273,26 @@ COMPAT_SYSCALL_DEFINE4(kexec_load, compat_ulong_t, > entry, > if ((flags & KEXEC_ARCH_MASK) == KEXEC_ARCH_DEFAULT) > return -EINVAL; > > - ksegments = compat_alloc_user_space(nr_segments * sizeof(out)); > + ksegments = kmalloc_array(nr_segments, sizeof(ksegments[0]), > + GFP_KERNEL); > + if (!ksegments) > + return -ENOMEM; > + > for (i = 0; i < nr_segments; i++) { > result = copy_from_user(&in, 
&segments[i], sizeof(in)); > if (result) > - return -EFAULT; > + goto fail; > > - out.buf = compat_ptr(in.buf); > - out.bufsz = in.bufsz; > - out.mem = in.mem; > - out.memsz = in.memsz; > - > - result = copy_to_user(&ksegments
Re: [PATCH v2 6/7] sections: Add new is_kernel() and is_kernel_text()
On Wed, 28 Jul 2021 16:13:19 +0800 Kefeng Wang wrote:

> @@ -64,8 +64,7 @@ const struct exception_table_entry *search_exception_tables(unsigned long addr)
>
>  int notrace core_kernel_text(unsigned long addr)
>  {
> -	if (addr >= (unsigned long)_stext &&
> -	    addr < (unsigned long)_etext)
> +	if (is_kernel_text(addr))

Perhaps this was a bug, and these functions should be checking the gate area as well, as that is part of kernel text.

-- Steve

>  		return 1;
>
>  	if (system_state < SYSTEM_RUNNING &&
> diff --git a/mm/kasan/report.c b/mm/kasan/report.c
> index 884a950c7026..88f5b0c058b7 100644
> --- a/mm/kasan/report.c
> +++ b/mm/kasan/report.c
> @@ -235,7 +235,7 @@ static void describe_object(struct kmem_cache *cache, void *object,
>
>  static inline bool kernel_or_module_addr(const void *addr)
>  {
> -	if (addr >= (void *)_stext && addr < (void *)_end)
> +	if (is_kernel((unsigned long)addr))
>  		return true;
>  	if (is_module_address((unsigned long)addr))
>  		return true;
> --
Re: [PATCH v2 5/7] kallsyms: Rename is_kernel() and is_kernel_text()
On Wed, 28 Jul 2021 16:13:18 +0800 Kefeng Wang wrote:

> The is_kernel[_text]() function check the address whether or not
> in kernel[_text] ranges, also they will check the address whether
> or not in gate area, so use better name.

Do you know what a gate area is? Because I believe gate area is kernel text, so the rename just makes it redundant and more confusing.

-- Steve
Re: [PATCH v2 2/7] kallsyms: Fix address-checks for kernel related range
On Wed, 28 Jul 2021 16:13:15 +0800 Kefeng Wang wrote:

> The is_kernel_inittext/is_kernel_text/is_kernel function should not
> include the end address (the labels _einittext, _etext and _end) when
> checking the address range; the issue exists since Linux v2.6.12.
>
> Cc: Arnd Bergmann
> Cc: Sergey Senozhatsky
> Cc: Petr Mladek
> Acked-by: Sergey Senozhatsky
> Reviewed-by: Petr Mladek
> Signed-off-by: Kefeng Wang

Reviewed-by: Steven Rostedt (VMware)

-- Steve

> ---
>  include/linux/kallsyms.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
> index 2a241e3f063f..b016c62f30a6 100644
> --- a/include/linux/kallsyms.h
> +++ b/include/linux/kallsyms.h
> @@ -27,21 +27,21 @@ struct module;
>  static inline int is_kernel_inittext(unsigned long addr)
>  {
>  	if (addr >= (unsigned long)_sinittext
> -	    && addr <= (unsigned long)_einittext)
> +	    && addr < (unsigned long)_einittext)
>  		return 1;
>  	return 0;
>  }
>
>  static inline int is_kernel_text(unsigned long addr)
>  {
> -	if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext))
> +	if ((addr >= (unsigned long)_stext && addr < (unsigned long)_etext))
>  		return 1;
>  	return in_gate_area_no_mm(addr);
>  }
>
>  static inline int is_kernel(unsigned long addr)
>  {
> -	if (addr >= (unsigned long)_stext && addr <= (unsigned long)_end)
> +	if (addr >= (unsigned long)_stext && addr < (unsigned long)_end)
>  		return 1;
>  	return in_gate_area_no_mm(addr);
>  }
Re: [PATCH 02/11] x86/sev: Add an x86 version of prot_guest_has()
On Tue, Jul 27, 2021 at 05:26:05PM -0500, Tom Lendacky via iommu wrote:
> Introduce an x86 version of the prot_guest_has() function. This will be
> used in the more generic x86 code to replace vendor specific calls like
> sev_active(), etc.
>
> While the name suggests this is intended mainly for guests, it will
> also be used for host memory encryption checks in place of sme_active().
>
> The amd_prot_guest_has() function does not use EXPORT_SYMBOL_GPL for the
> same reasons previously stated when changing sme_active(), sev_active and

None of that applies here as none of the callers get pulled into random macros. The only case of that is sme_me_mask through sme_mask, but that's not something this series replaces as far as I can tell.
Re: [PATCH 01/11] mm: Introduce a function to check for virtualization protection features
On Tue, Jul 27, 2021 at 05:26:04PM -0500, Tom Lendacky via iommu wrote:
> In prep for other protected virtualization technologies, introduce a
> generic helper function, prot_guest_has(), that can be used to check
> for specific protection attributes, like memory encryption. This is
> intended to eliminate having to add multiple technology-specific checks
> to the code (e.g. if (sev_active() || tdx_active())).

So common checks obviously make sense, but I really hate the stupid multiplexer. Having one well-documented helper per feature is much easier to follow.

> +#define PATTR_MEM_ENCRYPT		0	/* Encrypted memory */
> +#define PATTR_HOST_MEM_ENCRYPT		1	/* Host encrypted memory */
> +#define PATTR_GUEST_MEM_ENCRYPT	2	/* Guest encrypted memory */
> +#define PATTR_GUEST_PROT_STATE		3	/* Guest encrypted state */

The kerneldoc comments on these individual helpers will give you plenty of space to properly document what they indicate and what a (potential) caller should do based on them. Something the above comments completely fail to.
Re: Possible regression by ab037dd87a2f (powerpc/vdso: Switch VDSO to generic C implementation.)
Dear Michael,

On 28.07.21 at 14:43, Michael Ellerman wrote:
> Paul Menzel writes:
>> Am 28.07.21 um 01:14 schrieb Benjamin Herrenschmidt:
>>> On Tue, 2021-07-27 at 10:45 +0200, Paul Menzel wrote:
>>>> On ppc64le Go 1.16.2 from Ubuntu 21.04 terminates with a segmentation
>>>> fault [1], and it might be related to *[release-branch.go1.16] runtime:
>>>> fix crash during VDSO calls on PowerPC* [2], conjecturing that commit
>>>> ab037dd87a2f (powerpc/vdso: Switch VDSO to generic C implementation.)
>>>> added in Linux 5.11 causes this.
>>>>
>>>> If this is indeed the case, this would be a regression in userspace. Is
>>>> there a generic fix or should the change be reverted?
>>>
>>> From the look at the links you posted, this appears to be completely
>>> broken assumptions by Go that some registers don't change while calling
>>> what essentially are external library functions *while inside those
>>> functions* (ie in this case from a signal handler).
>>>
>>> I suppose it would be possible to build the VDSO with gcc arguments to
>>> make it not use r30, but that's just gross...
>>
>> Thank you for looking into this. No idea, if it falls under Linux’ no
>> regression policy or not.
>
> Reluctantly yes, I think it does. Though it would have been good if it
> had been reported to us sooner.
>
> It looks like that Go fix is only committed to master, and neither of
> the latest Go 1.16 or 1.15 releases contain the fix? ie. there's no way
> for a user to get a working version of Go other than building master?

I heard it is going to be in Go 1.16.7, but I do not know much about Go. Maybe the folks in Cc can chime in.

> I'll see if we can work around it in the kernel. Are you able to test a
> kernel patch if I send you one?

Yes, I could test a Linux kernel patch on ppc64le (POWER 8) running Ubuntu 21.04.

Kind regards,

Paul
Re: Possible regression by ab037dd87a2f (powerpc/vdso: Switch VDSO to generic C implementation.)
Paul Menzel writes: > Am 28.07.21 um 01:14 schrieb Benjamin Herrenschmidt: >> On Tue, 2021-07-27 at 10:45 +0200, Paul Menzel wrote: > >>> On ppc64le Go 1.16.2 from Ubuntu 21.04 terminates with a segmentation >>> fault [1], and it might be related to *[release-branch.go1.16] runtime: >>> fix crash during VDSO calls on PowerPC* [2], conjecturing that commit >>> ab037dd87a2f (powerpc/vdso: Switch VDSO to generic C implementation.) >>> added in Linux 5.11 causes this. >>> >>> If this is indeed the case, this would be a regression in userspace. Is >>> there a generic fix or should the change be reverted? >> >> From the look at the links you posted, this appears to be completely >> broken assumptions by Go that some registers don't change while calling >> what essentially are external library functions *while inside those >> functions* (ie in this case from a signal handler). >> >> I suppose it would be possible to build the VDSO with gcc arguments to >> make it not use r30, but that's just gross... > > Thank you for looking into this. No idea, if it falls under Linux’ no > regression policy or not. Reluctantly yes, I think it does. Though it would have been good if it had been reported to us sooner. It looks like that Go fix is only committed to master, and neither of the latest Go 1.16 or 1.15 releases contain the fix? ie. there's no way for a user to get a working version of Go other than building master? I'll see if we can work around it in the kernel. Are you able to test a kernel patch if I send you one? cheers
[PATCH v2 0/1] cpufreq:powernv: Fix init_chip_info initialization in numa=off
v1: https://lkml.org/lkml/2021/7/26/1509 Changelog v1-->v2: Based on comments from Gautham, 1. Included a #define for MAX_NR_CHIPS instead of hardcoding the allocation. Pratik R. Sampat (1): cpufreq:powernv: Fix init_chip_info initialization in numa=off drivers/cpufreq/powernv-cpufreq.c | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) -- 2.31.1
[PATCH v2 1/1] cpufreq:powernv: Fix init_chip_info initialization in numa=off
In the numa=off kernel command-line configuration, init_chip_info() loops over the number of chips and attempts to copy the cpumask of that node, which is NULL for all iterations after the first chip.

Hence, store the cpumask for each chip while populating the "chips" struct array, instead of deriving it from the node, and copy that into chips[i].mask.

Cc: sta...@vger.kernel.org
Fixes: 053819e0bf84 ("cpufreq: powernv: Handle throttling due to Pmax capping at chip level")
Signed-off-by: Pratik R. Sampat
Reported-by: Shirisha Ganta
Reviewed-by: Gautham R. Shenoy
---
 drivers/cpufreq/powernv-cpufreq.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 005600cef273..5f0e7c315e49 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -36,6 +36,7 @@
 #define MAX_PSTATE_SHIFT	32
 #define LPSTATE_SHIFT	48
 #define GPSTATE_SHIFT	56
+#define MAX_NR_CHIPS	32

 #define MAX_RAMP_DOWN_TIME	5120
 /*
@@ -1046,12 +1047,20 @@ static int init_chip_info(void)
 	unsigned int *chip;
 	unsigned int cpu, i;
 	unsigned int prev_chip_id = UINT_MAX;
+	cpumask_t *chip_cpu_mask;
 	int ret = 0;

 	chip = kcalloc(num_possible_cpus(), sizeof(*chip), GFP_KERNEL);
 	if (!chip)
 		return -ENOMEM;

+	/* Allocate a chip cpu mask large enough to fit mask for all chips */
+	chip_cpu_mask = kcalloc(MAX_NR_CHIPS, sizeof(cpumask_t), GFP_KERNEL);
+	if (!chip_cpu_mask) {
+		ret = -ENOMEM;
+		goto free_and_return;
+	}
+
 	for_each_possible_cpu(cpu) {
 		unsigned int id = cpu_to_chip_id(cpu);

@@ -1059,22 +1068,25 @@ static int init_chip_info(void)
 			prev_chip_id = id;
 			chip[nr_chips++] = id;
 		}
+		cpumask_set_cpu(cpu, &chip_cpu_mask[nr_chips-1]);
 	}

 	chips = kcalloc(nr_chips, sizeof(struct chip), GFP_KERNEL);
 	if (!chips) {
 		ret = -ENOMEM;
-		goto free_and_return;
+		goto out_chip_cpu_mask;
 	}

 	for (i = 0; i < nr_chips; i++) {
 		chips[i].id = chip[i];
-		cpumask_copy(&chips[i].mask, cpumask_of_node(chip[i]));
+		cpumask_copy(&chips[i].mask, &chip_cpu_mask[i]);
 		INIT_WORK(&chips[i].throttle, powernv_cpufreq_work_fn);
 		for_each_cpu(cpu, &chips[i].mask)
 			per_cpu(chip_info, cpu) = &chips[i];
 	}

+out_chip_cpu_mask:
+	kfree(chip_cpu_mask);
 free_and_return:
 	kfree(chip);
 	return ret;
--
2.31.1
Re: [PATCH 00/11] Implement generic prot_guest_has() helper function
Am 28.07.21 um 00:26 schrieb Tom Lendacky: This patch series provides a generic helper function, prot_guest_has(), to replace the sme_active(), sev_active(), sev_es_active() and mem_encrypt_active() functions. It is expected that as new protected virtualization technologies are added to the kernel, they can all be covered by a single function call instead of a collection of specific function calls all called from the same locations. The powerpc and s390 patches have been compile tested only. Can the folks copied on this series verify that nothing breaks for them. As GPU driver dev I'm only one end user of this, but at least from the high level point of view that makes totally sense to me. Feel free to add an Acked-by: Christian König . We could run that through the AMD GPU unit tests, but I fear we actually don't test on a system with SEV/SME active. Going to raise that on our team call today. Regards, Christian. Cc: Andi Kleen Cc: Andy Lutomirski Cc: Ard Biesheuvel Cc: Baoquan He Cc: Benjamin Herrenschmidt Cc: Borislav Petkov Cc: Christian Borntraeger Cc: Daniel Vetter Cc: Dave Hansen Cc: Dave Young Cc: David Airlie Cc: Heiko Carstens Cc: Ingo Molnar Cc: Joerg Roedel Cc: Maarten Lankhorst Cc: Maxime Ripard Cc: Michael Ellerman Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Thomas Zimmermann Cc: Vasily Gorbik Cc: VMware Graphics Cc: Will Deacon --- Patches based on: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master commit 79e920060fa7 ("Merge branch 'WIP/fixes'") Tom Lendacky (11): mm: Introduce a function to check for virtualization protection features x86/sev: Add an x86 version of prot_guest_has() powerpc/pseries/svm: Add a powerpc version of prot_guest_has() x86/sme: Replace occurrences of sme_active() with prot_guest_has() x86/sev: Replace occurrences of sev_active() with prot_guest_has() x86/sev: Replace occurrences of sev_es_active() with prot_guest_has() treewide: Replace the use of mem_encrypt_active() with prot_guest_has() 
mm: Remove the now unused mem_encrypt_active() function x86/sev: Remove the now unused mem_encrypt_active() function powerpc/pseries/svm: Remove the now unused mem_encrypt_active() function s390/mm: Remove the now unused mem_encrypt_active() function arch/Kconfig | 3 ++ arch/powerpc/include/asm/mem_encrypt.h | 5 -- arch/powerpc/include/asm/protected_guest.h | 30 +++ arch/powerpc/platforms/pseries/Kconfig | 1 + arch/s390/include/asm/mem_encrypt.h| 2 - arch/x86/Kconfig | 1 + arch/x86/include/asm/kexec.h | 2 +- arch/x86/include/asm/mem_encrypt.h | 13 + arch/x86/include/asm/protected_guest.h | 27 ++ arch/x86/kernel/crash_dump_64.c| 4 +- arch/x86/kernel/head64.c | 4 +- arch/x86/kernel/kvm.c | 3 +- arch/x86/kernel/kvmclock.c | 4 +- arch/x86/kernel/machine_kexec_64.c | 19 +++ arch/x86/kernel/pci-swiotlb.c | 9 ++-- arch/x86/kernel/relocate_kernel_64.S | 2 +- arch/x86/kernel/sev.c | 6 +-- arch/x86/kvm/svm/svm.c | 3 +- arch/x86/mm/ioremap.c | 16 +++--- arch/x86/mm/mem_encrypt.c | 60 +++--- arch/x86/mm/mem_encrypt_identity.c | 3 +- arch/x86/mm/pat/set_memory.c | 3 +- arch/x86/platform/efi/efi_64.c | 9 ++-- arch/x86/realmode/init.c | 8 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 4 +- drivers/gpu/drm/drm_cache.c| 4 +- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c| 4 +- drivers/gpu/drm/vmwgfx/vmwgfx_msg.c| 6 +-- drivers/iommu/amd/init.c | 7 +-- drivers/iommu/amd/iommu.c | 3 +- drivers/iommu/amd/iommu_v2.c | 3 +- drivers/iommu/iommu.c | 3 +- fs/proc/vmcore.c | 6 +-- include/linux/mem_encrypt.h| 4 -- include/linux/protected_guest.h| 37 + kernel/dma/swiotlb.c | 4 +- 36 files changed, 218 insertions(+), 104 deletions(-) create mode 100644 arch/powerpc/include/asm/protected_guest.h create mode 100644 arch/x86/include/asm/protected_guest.h create mode 100644 include/linux/protected_guest.h
Re: [PATCH] virtio-console: avoid DMA from vmalloc area
On 2021/7/28 5:01 PM, Arnd Bergmann wrote:
> On Wed, Jul 28, 2021 at 10:28 AM Xianting Tian wrote:
>> 在 2021/7/28 下午3:25, Arnd Bergmann 写道:
>> I checked several hvc backends, like drivers/tty/hvc/hvc_riscv_sbi.c,
>> drivers/tty/hvc/hvc_iucv.c, drivers/tty/hvc/hvc_rtas.c, they don't use dma.
>>
>> I not finished all hvc backends check yet. But I think even if all hvc
>> backends don't use dma currently, it is still possible that the hvc
>> backend using dma will be added in the furture.
>>
>> So I agree with you it should better be fixed in the hvc framework,
>> solve the issue in the first place.
>
> Ok, sounds good to me, no need to check more backends then.
>
> I see the hvc-console driver is listed as 'Odd Fixes' in the maintainer
> list, with nobody assigned other than the ppc kernel list (added to Cc).
>
> Once you come up with a fix in hvc_console.c, please send that to the
> tty maintainers, the ppc list and me, and I'll review it.

OK, thanks, I will submit the patch ASAP :)

>        Arnd
Re: [PATCH] virtio-console: avoid DMA from vmalloc area
On Wed, Jul 28, 2021 at 10:28 AM Xianting Tian wrote: > 在 2021/7/28 下午3:25, Arnd Bergmann 写道: > > I checked several hvc backends, like drivers/tty/hvc/hvc_riscv_sbi.c, > drivers/tty/hvc/hvc_iucv.c, drivers/tty/hvc/hvc_rtas.c, they don't use dma. > > I not finished all hvc backends check yet. But I think even if all hvc > backends don't use dma currently, it is still possible that the hvc > backend using dma will be added in the furture. > > So I agree with you it should better be fixed in the hvc framework, > solve the issue in the first place. Ok, sounds good to me, no need to check more backends then. I see the hvc-console driver is listed as 'Odd Fixes' in the maintainer list, with nobody assigned other than the ppc kernel list (added to Cc). Once you come up with a fix in hvc_console.c, please send that to the tty maintainers, the ppc list and me, and I'll review it. Arnd
Re: Possible regression by ab037dd87a2f (powerpc/vdso: Switch VDSO to generic C implementation.)
Dear Benjamin,

On 28.07.21 at 01:14, Benjamin Herrenschmidt wrote:
> On Tue, 2021-07-27 at 10:45 +0200, Paul Menzel wrote:
>> On ppc64le Go 1.16.2 from Ubuntu 21.04 terminates with a segmentation
>> fault [1], and it might be related to *[release-branch.go1.16] runtime:
>> fix crash during VDSO calls on PowerPC* [2], conjecturing that commit
>> ab037dd87a2f (powerpc/vdso: Switch VDSO to generic C implementation.)
>> added in Linux 5.11 causes this.
>>
>> If this is indeed the case, this would be a regression in userspace. Is
>> there a generic fix or should the change be reverted?
>
> From the look at the links you posted, this appears to be completely
> broken assumptions by Go that some registers don't change while calling
> what essentially are external library functions *while inside those
> functions* (ie in this case from a signal handler).
>
> I suppose it would be possible to build the VDSO with gcc arguments to
> make it not use r30, but that's just gross...

Thank you for looking into this. No idea if it falls under Linux’ no regression policy or not.

Kind regards,

Paul
[PATCH] mm/pkeys: Remove unused parameter in arch_set_user_pkey_access
The arch_set_user_pkey_access function never uses its first parameter (struct task_struct *tsk). It is only able to set the pkey permissions for the current task as implemented, and existing kernel code only passes "current" to arch_set_user_pkey_access. So remove the ambiguous parameter to make the code clean. Signed-off-by: Jiashuo Liang --- arch/powerpc/include/asm/pkeys.h | 8 +++- arch/powerpc/mm/book3s64/pkeys.c | 3 +-- arch/x86/include/asm/pkeys.h | 12 arch/x86/kernel/fpu/xstate.c | 3 +-- arch/x86/mm/pkeys.c | 3 +-- include/linux/pkeys.h| 3 +-- mm/mprotect.c| 2 +- 7 files changed, 12 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h index 59a2c7dbc78f..e905b2ab31e2 100644 --- a/arch/powerpc/include/asm/pkeys.h +++ b/arch/powerpc/include/asm/pkeys.h @@ -143,10 +143,8 @@ static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma, return __arch_override_mprotect_pkey(vma, prot, pkey); } -extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey, - unsigned long init_val); -static inline int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, - unsigned long init_val) +extern int __arch_set_user_pkey_access(int pkey, unsigned long init_val); +static inline int arch_set_user_pkey_access(int pkey, unsigned long init_val) { if (!mmu_has_feature(MMU_FTR_PKEY)) return -EINVAL; @@ -160,7 +158,7 @@ static inline int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, if (pkey == 0) return init_val ? 
-EINVAL : 0; - return __arch_set_user_pkey_access(tsk, pkey, init_val); + return __arch_set_user_pkey_access(pkey, init_val); } static inline bool arch_pkeys_enabled(void) diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c index a2d9ad138709..dc77c0a27291 100644 --- a/arch/powerpc/mm/book3s64/pkeys.c +++ b/arch/powerpc/mm/book3s64/pkeys.c @@ -333,8 +333,7 @@ static inline void init_iamr(int pkey, u8 init_bits) * Set the access rights in AMR IAMR and UAMOR registers for @pkey to that * specified in @init_val. */ -int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey, - unsigned long init_val) +int __arch_set_user_pkey_access(int pkey, unsigned long init_val) { u64 new_amr_bits = 0x0ul; u64 new_iamr_bits = 0x0ul; diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h index 5c7bcaa79623..26d872bdee49 100644 --- a/arch/x86/include/asm/pkeys.h +++ b/arch/x86/include/asm/pkeys.h @@ -11,8 +11,7 @@ */ #define arch_max_pkey() (cpu_feature_enabled(X86_FEATURE_OSPKE) ? 
16 : 1) -extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, - unsigned long init_val); +extern int arch_set_user_pkey_access(int pkey, unsigned long init_val); static inline bool arch_pkeys_enabled(void) { @@ -43,8 +42,7 @@ static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma, return __arch_override_mprotect_pkey(vma, prot, pkey); } -extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey, - unsigned long init_val); +extern int __arch_set_user_pkey_access(int pkey, unsigned long init_val); #define ARCH_VM_PKEY_FLAGS (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2 | VM_PKEY_BIT3) @@ -120,10 +118,8 @@ int mm_pkey_free(struct mm_struct *mm, int pkey) return 0; } -extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, - unsigned long init_val); -extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey, - unsigned long init_val); +extern int arch_set_user_pkey_access(int pkey, unsigned long init_val); +extern int __arch_set_user_pkey_access(int pkey, unsigned long init_val); static inline int vma_pkey(struct vm_area_struct *vma) { diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index c8def1b7f8fb..565de4a49c0a 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -912,8 +912,7 @@ EXPORT_SYMBOL_GPL(get_xsave_addr); * This will go out and modify PKRU register to set the access * rights for @pkey to @init_val. */ -int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, - unsigned long init_val) +int arch_set_user_pkey_access(int pkey, unsigned long init_val) { u32 old_pkru, new_pkru_bits = 0; int pkey_shift; diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index e44e938885b7..fafc10ea7cf1 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -42,8 +42,7 @@ int __execute_only_pkey(struct mm_struct *mm) * Set up PKRU so that it denies access for everything * other than execution.
[PATCH v2 6/7] sections: Add new is_kernel() and is_kernel_text()
The new is_kernel() check the kernel address ranges, and the new is_kernel_text() check the kernel text section ranges. Then use them to make some code clear. Cc: Arnd Bergmann Cc: Andrey Ryabinin Signed-off-by: Kefeng Wang --- include/asm-generic/sections.h | 27 +++ include/linux/kallsyms.h | 4 ++-- kernel/extable.c | 3 +-- mm/kasan/report.c | 2 +- 4 files changed, 31 insertions(+), 5 deletions(-) diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h index 4f2f32aa2b7a..6b143637ab88 100644 --- a/include/asm-generic/sections.h +++ b/include/asm-generic/sections.h @@ -170,6 +170,20 @@ static inline bool is_kernel_rodata(unsigned long addr) addr < (unsigned long)__end_rodata; } +/** + * is_kernel_text - checks if the pointer address is located in the + * .text section + * + * @addr: address to check + * + * Returns: true if the address is located in .text, false otherwise. + */ +static inline bool is_kernel_text(unsigned long addr) +{ + return addr >= (unsigned long)_stext && + addr < (unsigned long)_etext; +} + /** * is_kernel_inittext - checks if the pointer address is located in the *.init.text section @@ -184,4 +198,17 @@ static inline bool is_kernel_inittext(unsigned long addr) addr < (unsigned long)_einittext; } +/** + * is_kernel - checks if the pointer address is located in the kernel range + * + * @addr: address to check + * + * Returns: true if the address is located in kernel range, false otherwise. 
+ */ +static inline bool is_kernel(unsigned long addr) +{ + return addr >= (unsigned long)_stext && + addr < (unsigned long)_end; +} + #endif /* _ASM_GENERIC_SECTIONS_H_ */ diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h index 4f501ac9c2c2..897d5720884f 100644 --- a/include/linux/kallsyms.h +++ b/include/linux/kallsyms.h @@ -26,14 +26,14 @@ struct module; static inline int is_kernel_text_or_gate_area(unsigned long addr) { - if ((addr >= (unsigned long)_stext && addr < (unsigned long)_etext)) + if (is_kernel_text(addr)) return 1; return in_gate_area_no_mm(addr); } static inline int is_kernel_or_gate_area(unsigned long addr) { - if (addr >= (unsigned long)_stext && addr < (unsigned long)_end) + if (is_kernel(addr)) return 1; return in_gate_area_no_mm(addr); } diff --git a/kernel/extable.c b/kernel/extable.c index 98ca627ac5ef..0ba383d850ff 100644 --- a/kernel/extable.c +++ b/kernel/extable.c @@ -64,8 +64,7 @@ const struct exception_table_entry *search_exception_tables(unsigned long addr) int notrace core_kernel_text(unsigned long addr) { - if (addr >= (unsigned long)_stext && - addr < (unsigned long)_etext) + if (is_kernel_text(addr)) return 1; if (system_state < SYSTEM_RUNNING && diff --git a/mm/kasan/report.c b/mm/kasan/report.c index 884a950c7026..88f5b0c058b7 100644 --- a/mm/kasan/report.c +++ b/mm/kasan/report.c @@ -235,7 +235,7 @@ static void describe_object(struct kmem_cache *cache, void *object, static inline bool kernel_or_module_addr(const void *addr) { - if (addr >= (void *)_stext && addr < (void *)_end) + if (is_kernel((unsigned long)addr)) return true; if (is_module_address((unsigned long)addr)) return true; -- 2.26.2
[PATCH v2 3/7] sections: Move and rename core_kernel_data() to is_kernel_core_data()
Move core_kernel_data() into sections.h and rename it to is_kernel_core_data(), also make it return bool value, then update all the callers. Cc: Arnd Bergmann Cc: Steven Rostedt Cc: Ingo Molnar Cc: "David S. Miller" Signed-off-by: Kefeng Wang --- include/asm-generic/sections.h | 14 ++ include/linux/kernel.h | 1 - kernel/extable.c | 18 -- kernel/trace/ftrace.c | 2 +- net/sysctl_net.c | 2 +- 5 files changed, 16 insertions(+), 21 deletions(-) diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h index 817309e289db..26ed9fc9b4e3 100644 --- a/include/asm-generic/sections.h +++ b/include/asm-generic/sections.h @@ -142,6 +142,20 @@ static inline bool init_section_intersects(void *virt, size_t size) return memory_intersects(__init_begin, __init_end, virt, size); } +/** + * is_kernel_core_data - checks if the pointer address is located in the + * .data section + * + * @addr: address to check + * + * Returns: true if the address is located in .data, false otherwise. + */ +static inline bool is_kernel_core_data(unsigned long addr) +{ + return addr >= (unsigned long)_sdata && + addr < (unsigned long)_edata; +} + /** * is_kernel_rodata - checks if the pointer address is located in the *.rodata section diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 1b2f0a7e00d6..0622418bafbc 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -230,7 +230,6 @@ extern char *next_arg(char *args, char **param, char **val); extern int core_kernel_text(unsigned long addr); extern int init_kernel_text(unsigned long addr); -extern int core_kernel_data(unsigned long addr); extern int __kernel_text_address(unsigned long addr); extern int kernel_text_address(unsigned long addr); extern int func_ptr_is_kernel_text(void *ptr); diff --git a/kernel/extable.c b/kernel/extable.c index b0ea5eb0c3b4..da26203841d4 100644 --- a/kernel/extable.c +++ b/kernel/extable.c @@ -82,24 +82,6 @@ int notrace core_kernel_text(unsigned long addr) return 0; } -/** - * 
core_kernel_data - tell if addr points to kernel data - * @addr: address to test - * - * Returns true if @addr passed in is from the core kernel data - * section. - * - * Note: On some archs it may return true for core RODATA, and false - * for others. But will always be true for core RW data. - */ -int core_kernel_data(unsigned long addr) -{ - if (addr >= (unsigned long)_sdata && - addr < (unsigned long)_edata) - return 1; - return 0; -} - int __kernel_text_address(unsigned long addr) { if (kernel_text_address(addr)) diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index e6fb3e6e1ffc..d01ca1cb2d5f 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -323,7 +323,7 @@ int __register_ftrace_function(struct ftrace_ops *ops) if (!ftrace_enabled && (ops->flags & FTRACE_OPS_FL_PERMANENT)) return -EBUSY; - if (!core_kernel_data((unsigned long)ops)) + if (!is_kernel_core_data((unsigned long)ops)) ops->flags |= FTRACE_OPS_FL_DYNAMIC; add_ftrace_ops(&ftrace_ops_list, ops); diff --git a/net/sysctl_net.c b/net/sysctl_net.c index f6cb0d4d114c..4b45ed631eb8 100644 --- a/net/sysctl_net.c +++ b/net/sysctl_net.c @@ -144,7 +144,7 @@ static void ensure_safe_net_sysctl(struct net *net, const char *path, addr = (unsigned long)ent->data; if (is_module_address(addr)) where = "module"; - else if (core_kernel_data(addr)) + else if (is_kernel_core_data(addr)) where = "kernel"; else continue; -- 2.26.2
[PATCH v2 7/7] powerpc/mm: Use is_kernel_text() and is_kernel_inittext() helper
Use the is_kernel_text() and is_kernel_inittext() helpers to simplify the code, and drop the etext, _stext, _sinittext and _einittext declarations, which are already declared in sections.h.

Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Kefeng Wang
---
 arch/powerpc/mm/pgtable_32.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index dcf5ecca19d9..13c798308c2e 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -33,8 +33,6 @@

 #include

-extern char etext[], _stext[], _sinittext[], _einittext[];
-
 static u8 early_fixmap_pagetable[FIXMAP_PTE_SIZE] __page_aligned_data;

 notrace void __init early_ioremap_init(void)
@@ -104,14 +102,13 @@ static void __init __mapin_ram_chunk(unsigned long offset, unsigned long top)
 {
 	unsigned long v, s;
 	phys_addr_t p;
-	int ktext;
+	bool ktext;

 	s = offset;
 	v = PAGE_OFFSET + s;
 	p = memstart_addr + s;
 	for (; s < top; s += PAGE_SIZE) {
-		ktext = ((char *)v >= _stext && (char *)v < etext) ||
-			((char *)v >= _sinittext && (char *)v < _einittext);
+		ktext = (is_kernel_text(v) || is_kernel_inittext(v));
 		map_kernel_page(v, p, ktext ? PAGE_KERNEL_TEXT : PAGE_KERNEL);
 		v += PAGE_SIZE;
 		p += PAGE_SIZE;
--
2.26.2
[PATCH v2 4/7] sections: Move is_kernel_inittext() into sections.h
The is_kernel_inittext() and init_kernel_text() are with same functionality, let's just keep is_kernel_inittext() and move it into sections.h, then update all the callers. Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Arnd Bergmann Cc: x...@kernel.org Signed-off-by: Kefeng Wang --- arch/x86/kernel/unwind_orc.c | 2 +- include/asm-generic/sections.h | 14 ++ include/linux/kallsyms.h | 8 include/linux/kernel.h | 1 - kernel/extable.c | 12 ++-- 5 files changed, 17 insertions(+), 20 deletions(-) diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c index a1202536fc57..d92ec2ced059 100644 --- a/arch/x86/kernel/unwind_orc.c +++ b/arch/x86/kernel/unwind_orc.c @@ -175,7 +175,7 @@ static struct orc_entry *orc_find(unsigned long ip) } /* vmlinux .init slow lookup: */ - if (init_kernel_text(ip)) + if (is_kernel_inittext(ip)) return __orc_find(__start_orc_unwind_ip, __start_orc_unwind, __stop_orc_unwind_ip - __start_orc_unwind_ip, ip); diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h index 26ed9fc9b4e3..4f2f32aa2b7a 100644 --- a/include/asm-generic/sections.h +++ b/include/asm-generic/sections.h @@ -170,4 +170,18 @@ static inline bool is_kernel_rodata(unsigned long addr) addr < (unsigned long)__end_rodata; } +/** + * is_kernel_inittext - checks if the pointer address is located in the + *.init.text section + * + * @addr: address to check + * + * Returns: true if the address is located in .init.text, false otherwise. 
+ */ +static inline bool is_kernel_inittext(unsigned long addr) +{ + return addr >= (unsigned long)_sinittext && + addr < (unsigned long)_einittext; +} + #endif /* _ASM_GENERIC_SECTIONS_H_ */ diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h index b016c62f30a6..8a9d329c927c 100644 --- a/include/linux/kallsyms.h +++ b/include/linux/kallsyms.h @@ -24,14 +24,6 @@ struct cred; struct module; -static inline int is_kernel_inittext(unsigned long addr) -{ - if (addr >= (unsigned long)_sinittext - && addr < (unsigned long)_einittext) - return 1; - return 0; -} - static inline int is_kernel_text(unsigned long addr) { if ((addr >= (unsigned long)_stext && addr < (unsigned long)_etext)) diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 0622418bafbc..d4ba46cf4737 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -229,7 +229,6 @@ extern bool parse_option_str(const char *str, const char *option); extern char *next_arg(char *args, char **param, char **val); extern int core_kernel_text(unsigned long addr); -extern int init_kernel_text(unsigned long addr); extern int __kernel_text_address(unsigned long addr); extern int kernel_text_address(unsigned long addr); extern int func_ptr_is_kernel_text(void *ptr); diff --git a/kernel/extable.c b/kernel/extable.c index da26203841d4..98ca627ac5ef 100644 --- a/kernel/extable.c +++ b/kernel/extable.c @@ -62,14 +62,6 @@ const struct exception_table_entry *search_exception_tables(unsigned long addr) return e; } -int init_kernel_text(unsigned long addr) -{ - if (addr >= (unsigned long)_sinittext && - addr < (unsigned long)_einittext) - return 1; - return 0; -} - int notrace core_kernel_text(unsigned long addr) { if (addr >= (unsigned long)_stext && @@ -77,7 +69,7 @@ int notrace core_kernel_text(unsigned long addr) return 1; if (system_state < SYSTEM_RUNNING && - init_kernel_text(addr)) + is_kernel_inittext(addr)) return 1; return 0; } @@ -94,7 +86,7 @@ int __kernel_text_address(unsigned long addr) 
* Since we are after the module-symbols check, there's * no danger of address overlap: */ - if (init_kernel_text(addr)) + if (is_kernel_inittext(addr)) return 1; return 0; } -- 2.26.2
[PATCH v2 5/7] kallsyms: Rename is_kernel() and is_kernel_text()
The is_kernel[_text]() function check the address whether or not in kernel[_text] ranges, also they will check the address whether or not in gate area, so use better name. Cc: Alexei Starovoitov Cc: Daniel Borkmann Cc: Sami Tolvanen Cc: Nathan Chancellor Cc: Arnd Bergmann Cc: b...@vger.kernel.org Signed-off-by: Kefeng Wang --- arch/x86/net/bpf_jit_comp.c | 2 +- include/linux/kallsyms.h| 8 kernel/cfi.c| 2 +- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 333650b9372a..c87d0dd4370d 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -372,7 +372,7 @@ static int __bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t, int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t, void *old_addr, void *new_addr) { - if (!is_kernel_text((long)ip) && + if (!is_kernel_text_or_gate_area((long)ip) && !is_bpf_text_address((long)ip)) /* BPF poking in modules is not supported */ return -EINVAL; diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h index 8a9d329c927c..4f501ac9c2c2 100644 --- a/include/linux/kallsyms.h +++ b/include/linux/kallsyms.h @@ -24,14 +24,14 @@ struct cred; struct module; -static inline int is_kernel_text(unsigned long addr) +static inline int is_kernel_text_or_gate_area(unsigned long addr) { if ((addr >= (unsigned long)_stext && addr < (unsigned long)_etext)) return 1; return in_gate_area_no_mm(addr); } -static inline int is_kernel(unsigned long addr) +static inline int is_kernel_or_gate_area(unsigned long addr) { if (addr >= (unsigned long)_stext && addr < (unsigned long)_end) return 1; @@ -41,9 +41,9 @@ static inline int is_kernel(unsigned long addr) static inline int is_ksym_addr(unsigned long addr) { if (IS_ENABLED(CONFIG_KALLSYMS_ALL)) - return is_kernel(addr); + return is_kernel_or_gate_area(addr); - return is_kernel_text(addr) || is_kernel_inittext(addr); + return is_kernel_text_or_gate_area(addr) || is_kernel_inittext(addr); } static 
inline void *dereference_symbol_descriptor(void *ptr) diff --git a/kernel/cfi.c b/kernel/cfi.c index e17a56639766..e7d90eff4382 100644 --- a/kernel/cfi.c +++ b/kernel/cfi.c @@ -282,7 +282,7 @@ static inline cfi_check_fn find_check_fn(unsigned long ptr) { cfi_check_fn fn = NULL; - if (is_kernel_text(ptr)) + if (is_kernel_text_or_gate_area(ptr)) return __cfi_check; /* -- 2.26.2
[PATCH v2 2/7] kallsyms: Fix address-checks for kernel related range
The is_kernel_inittext()/is_kernel_text()/is_kernel() functions should not include the end addresses (the labels _einittext, _etext and _end) when checking the address range; the issue has existed since Linux v2.6.12.

Cc: Arnd Bergmann
Cc: Sergey Senozhatsky
Cc: Petr Mladek
Acked-by: Sergey Senozhatsky
Reviewed-by: Petr Mladek
Signed-off-by: Kefeng Wang
---
 include/linux/kallsyms.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index 2a241e3f063f..b016c62f30a6 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -27,21 +27,21 @@ struct module;
 static inline int is_kernel_inittext(unsigned long addr)
 {
 	if (addr >= (unsigned long)_sinittext
-	    && addr <= (unsigned long)_einittext)
+	    && addr < (unsigned long)_einittext)
 		return 1;
 	return 0;
 }
 
 static inline int is_kernel_text(unsigned long addr)
 {
-	if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext))
+	if ((addr >= (unsigned long)_stext && addr < (unsigned long)_etext))
 		return 1;
 	return in_gate_area_no_mm(addr);
 }
 
 static inline int is_kernel(unsigned long addr)
 {
-	if (addr >= (unsigned long)_stext && addr <= (unsigned long)_end)
+	if (addr >= (unsigned long)_stext && addr < (unsigned long)_end)
 		return 1;
 	return in_gate_area_no_mm(addr);
 }
-- 
2.26.2
[PATCH v2 1/7] kallsyms: Remove arch specific text and data check
After commit 4ba66a976072 ("arch: remove blackfin port"), the arch-specific text/data checks are no longer needed.

Cc: Arnd Bergmann
Signed-off-by: Kefeng Wang
---
 include/asm-generic/sections.h | 16 ----------------
 include/linux/kallsyms.h       |  3 +--
 kernel/locking/lockdep.c       |  3 ---
 3 files changed, 1 insertion(+), 21 deletions(-)

diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index d16302d3eb59..817309e289db 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -64,22 +64,6 @@ extern __visible const void __nosave_begin, __nosave_end;
 #define dereference_kernel_function_descriptor(p) ((void *)(p))
 #endif
 
-/* random extra sections (if any).  Override
- * in asm/sections.h */
-#ifndef arch_is_kernel_text
-static inline int arch_is_kernel_text(unsigned long addr)
-{
-	return 0;
-}
-#endif
-
-#ifndef arch_is_kernel_data
-static inline int arch_is_kernel_data(unsigned long addr)
-{
-	return 0;
-}
-#endif
-
 /*
  * Check if an address is part of freed initmem. This is needed on architectures
  * with virt == phys kernel mapping, for code that wants to check if an address
diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index 6851c2313cad..2a241e3f063f 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -34,8 +34,7 @@ static inline int is_kernel_inittext(unsigned long addr)
 
 static inline int is_kernel_text(unsigned long addr)
 {
-	if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext) ||
-	    arch_is_kernel_text(addr))
+	if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext))
 		return 1;
 	return in_gate_area_no_mm(addr);
 }
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index bf1c00c881e4..64b17e995108 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -803,9 +803,6 @@ static int static_obj(const void *obj)
 	if ((addr >= start) && (addr < end))
 		return 1;
 
-	if (arch_is_kernel_data(addr))
-		return 1;
-
 	/*
 	 * in-kernel percpu var?
 	 */
-- 
2.26.2
[PATCH v2 0/7] sections: Unify kernel sections range check and use
There are three header files (kallsyms.h, kernel.h and sections.h) which contain kernel sections range checks; let's do some cleanup and unify them.

1. clean up the arch-specific text/data checks and fix the address boundary checks in kallsyms.h
2. move all the basic/core kernel range check functions into sections.h
3. update all the callers, and use the helpers in sections.h to simplify the code

After this series, we have 5 APIs for kernel sections range checking in sections.h:

* is_kernel_core_data()	--- comes from core_kernel_data() in kernel.h
* is_kernel_rodata()	--- already in sections.h
* is_kernel_text()	--- comes from kallsyms.h
* is_kernel_inittext()	--- comes from kernel.h and kallsyms.h
* is_kernel()		--- comes from kallsyms.h

Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: io...@lists.linux-foundation.org
Cc: b...@vger.kernel.org

v2:
- add Acked-by/Reviewed-by tags to patch 2, and drop the inappropriate Fixes tag
- keep 'core' when checking kernel data, as suggested by Steven Rostedt; rename is_kernel_data() to is_kernel_core_data()
- drop patch 8, which is merged
- drop patch 9, which is resent independently

v1: https://lore.kernel.org/linux-arch/20210626073439.150586-1-wangkefeng.w...@huawei.com

Kefeng Wang (7):
  kallsyms: Remove arch specific text and data check
  kallsyms: Fix address-checks for kernel related range
  sections: Move and rename core_kernel_data() to is_kernel_core_data()
  sections: Move is_kernel_inittext() into sections.h
  kallsyms: Rename is_kernel() and is_kernel_text()
  sections: Add new is_kernel() and is_kernel_text()
  powerpc/mm: Use is_kernel_text() and is_kernel_inittext() helper

 arch/powerpc/mm/pgtable_32.c   |  7 +---
 arch/x86/kernel/unwind_orc.c   |  2 +-
 arch/x86/net/bpf_jit_comp.c    |  2 +-
 include/asm-generic/sections.h | 71 ++
 include/linux/kallsyms.h       | 21 +++---
 include/linux/kernel.h         |  2 -
 kernel/cfi.c                   |  2 +-
 kernel/extable.c               | 33 ++--
 kernel/locking/lockdep.c       |  3 --
 kernel/trace/ftrace.c          |  2 +-
 mm/kasan/report.c              |  2 +-
 net/sysctl_net.c               |  2 +-
 12 files changed, 72 insertions(+), 77 deletions(-)

-- 
2.26.2
[powerpc][next-20210727] Boot failure - kernel BUG at arch/powerpc/kernel/interrupt.c:98!
linux-next fails to boot on a Power server (POWER8/POWER9). The following traces are seen during boot:

[0.010799] software IO TLB: tearing down default memory pool
[0.010805] [ cut here ]
[0.010808] kernel BUG at arch/powerpc/kernel/interrupt.c:98!
[0.010812] Oops: Exception in kernel mode, sig: 5 [#1]
[0.010816] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[0.010820] Modules linked in:
[0.010824] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.14.0-rc3-next-20210727 #1
[0.010830] NIP: c0032cfc LR: c000c764 CTR: c000c670
[0.010834] REGS: c3603b10 TRAP: 0700 Not tainted (5.14.0-rc3-next-20210727)
[0.010838] MSR: 80029033 CR: 28000222 XER: 0002
[0.010848] CFAR: c000c760 IRQMASK: 3
[0.010848] GPR00: c000c764 c3603db0 c29bd000 0001
[0.010848] GPR04: 0a68 0400 c3603868
[0.010848] GPR08: 0003
[0.010848] GPR12: c0001ec9ee80 c0012a28
[0.010848] GPR16:
[0.010848] GPR20:
[0.010848] GPR24: f134 c3603868
[0.010848] GPR28: 0400 0a68 c202e9c0 c3603e80
[0.010896] NIP [c0032cfc] system_call_exception+0x8c/0x2e0
[0.010901] LR [c000c764] system_call_common+0xf4/0x258
[0.010907] Call Trace:
[0.010909] [c3603db0] [c016a6dc] calculate_sigpending+0x4c/0xe0 (unreliable)
[0.010915] [c3603e10] [c000c764] system_call_common+0xf4/0x258
[0.010921] --- interrupt: c00 at kvm_template_end+0x4/0x8
[0.010926] NIP: c0092dec LR: c0114fc8 CTR:
[0.010930] REGS: c3603e80 TRAP: 0c00 Not tainted (5.14.0-rc3-next-20210727)
[0.010934] MSR: 80009033 CR: 28000222 XER:
[0.010943] IRQMASK: 0
[0.010943] GPR00: c202e9c0 c3603b00 c29bd000 f134
[0.010943] GPR04: 0a68 0400 c3603868
[0.010943] GPR08:
[0.010943] GPR12: c0001ec9ee80 c0012a28
[0.010943] GPR16:
[0.010943] GPR20:
[0.010943] GPR24: c20033c4 c110afc0 c2081950 c3277d40
[0.010943] GPR28: ca68 0400 000d
[0.010989] NIP [c0092dec] kvm_template_end+0x4/0x8
[0.010993] LR [c0114fc8] set_memory_encrypted+0x38/0x60
[0.010999] --- interrupt: c00
[0.011001] [c3603b00] [c000c764] system_call_common+0xf4/0x258 (unreliable)
[0.011008] Instruction dump:
[0.011011] 694a0003 312a 7d495110 0b0a 6000 6000 e87f0108 68690002
[0.011019] 7929ffe2 0b09 68634000 786397e2 <0b03> e93f0138 792907e0 0b09
[0.011029] ---[ end trace a20ad55589efcb10 ]---
[0.012297]
[1.012304] Kernel panic - not syncing: Fatal exception

next-20210723 was good. The boot failure seems to have been introduced with next-20210726. I have attached the boot log.

Thanks
-Sachin

[0.00] hash-mmu: Page sizes from device-tree:
[0.00] hash-mmu: base_shift=12: shift=12, sllp=0x, avpnm=0x, tlbiel=1, penc=0
[0.00] hash-mmu: base_shift=12: shift=16, sllp=0x, avpnm=0x, tlbiel=1, penc=7
[0.00] hash-mmu: base_shift=12: shift=24, sllp=0x, avpnm=0x, tlbiel=1, penc=56
[0.00] hash-mmu: base_shift=16: shift=16, sllp=0x0110, avpnm=0x, tlbiel=1, penc=1
[0.00] hash-mmu: base_shift=16: shift=24, sllp=0x0110, avpnm=0x, tlbiel=1, penc=8
[0.00] hash-mmu: base_shift=24: shift=24, sllp=0x0100, avpnm=0x0001, tlbiel=0, penc=0
[0.00] hash-mmu: base_shift=34: shift=34, sllp=0x0120, avpnm=0x07ff, tlbiel=0, penc=3
[0.00] Enabling pkeys with max key count 31
[0.00] Activating Kernel Userspace Execution Prevention
[0.00] Activating Kernel Userspace Access Prevention
[0.00] Using 1TB segments
[0.00] hash-mmu: Initializing hash mmu with SLB
[0.00] Linux version 5.14.0-rc3-next-20210727 (r...@ltczz304-lp7.aus.stglabs.ibm.com) (gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), GNU ld version 2.30-93.el8) #1 SMP Wed Jul 28 01:12:04 EDT 2021
[0.00] Found initrd at 0xc558:0xca67e
[PATCH V2] powerpc/fadump: register for fadump as early as possible
Crash recovery (fadump) is set up in userspace by some service. This service rebuilds the initrd with dump capture capability, if it is not already dump capture capable, before proceeding to register for firmware-assisted dump (echo 1 > /sys/kernel/fadump/registered). But arming the kernel with crash recovery support does not have to wait for userspace configuration. So, register for fadump during setup itself. This can at worst lead to a scenario where /proc/vmcore is ready after a crash but the initrd does not know how/where to offload it, which is still better than not having a /proc/vmcore at all due to incomplete configuration in userspace at the time of crash.

Commit 0823c68b054b ("powerpc/fadump: re-register firmware-assisted dump if already registered") ensures this change does not break userspace.

Signed-off-by: Hari Bathini
---
Changes in V2:
* Updated the changelog with a bit more explanation about the userspace issue with/without this change.
* Added a comment in the code for why the setup_fadump() initcall is changed from subsys_initcall() to subsys_initcall_sync().

 arch/powerpc/kernel/fadump.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index b990075285f5..2911aefdf594 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1637,13 +1637,20 @@ int __init setup_fadump(void)
 		if (fw_dump.ops->fadump_process(&fw_dump) < 0)
 			fadump_invalidate_release_mem();
 	}
-	/* Initialize the kernel dump memory structure for FAD registration. */
-	else if (fw_dump.reserve_dump_area_size)
+	/* Initialize the kernel dump memory structure and register with f/w */
+	else if (fw_dump.reserve_dump_area_size) {
 		fw_dump.ops->fadump_init_mem_struct(&fw_dump);
+		register_fadump();
+	}
 
 	return 1;
 }
-subsys_initcall(setup_fadump);
+/*
+ * Replace subsys_initcall() with subsys_initcall_sync() as there is dependency
+ * with crash_save_vmcoreinfo_init() to ensure vmcoreinfo initialization is done
+ * before registering with f/w.
+ */
+subsys_initcall_sync(setup_fadump);
 #else /* !CONFIG_PRESERVE_FA_DUMP */
 
 /* Scan the Firmware Assisted dump configuration details. */