Re: [PATCH v3 03/16] pseries/fadump: move out platform specific support from generic code
On Wed, 2019-06-26 at 02:16 +0530, Hari Bathini wrote: > Introduce callbacks for platform specific operations like register, > unregister, invalidate & such, and move pseries specific code into > platform code. Please don't move around large blocks of code *and* change the code in a single patch. It makes reviewing the changes extremely tedious since the changes are mixed in with hundreds of lines of nothing. > Signed-off-by: Hari Bathini > --- > arch/powerpc/include/asm/fadump.h| 75 > arch/powerpc/kernel/fadump-common.h | 38 ++ > arch/powerpc/kernel/fadump.c | 500 ++--- > arch/powerpc/platforms/pseries/Makefile |1 > arch/powerpc/platforms/pseries/rtas-fadump.c | 529 > ++ > arch/powerpc/platforms/pseries/rtas-fadump.h | 96 + > 6 files changed, 700 insertions(+), 539 deletions(-) > create mode 100644 arch/powerpc/platforms/pseries/rtas-fadump.c > create mode 100644 arch/powerpc/platforms/pseries/rtas-fadump.h > > +static struct fadump_ops pseries_fadump_ops = { > + .init_fadump_mem_struct = pseries_init_fadump_mem_struct, > + .register_fadump= pseries_register_fadump, I realise you are just translating the existing interface, but why is init_fadump_mem_struct() done as a separate step and not as part of the registration function? The struct doesn't seem to be necessary until the actual registration happens. > + .unregister_fadump = pseries_unregister_fadump, > + .invalidate_fadump = pseries_invalidate_fadump, > + .process_fadump = pseries_process_fadump, > + .fadump_region_show = pseries_fadump_region_show, > + .crash_fadump = pseries_crash_fadump, Rename this to fadump_trigger or something; it's not clear what it does.
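To make the review suggestion concrete, here is a minimal userspace sketch of a platform ops table in which the memory-structure setup happens inside the register callback instead of in a separate init hook, and the crash hook carries the clearer `fadump_trigger` name. Every name here (`struct fadump_ops_sketch`, `struct dump_mem`, `sketch_register`, and so on) is hypothetical and does not reflect the actual fadump code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the platform-specific dump memory structure. */
struct dump_mem {
	bool initialized;
	unsigned long reserve_size;
};

/* Hypothetical ops table: no separate init hook; registration builds
 * the structure it needs right before using it. */
struct fadump_ops_sketch {
	int (*register_fadump)(struct dump_mem *mem, unsigned long size);
	int (*unregister_fadump)(struct dump_mem *mem);
	void (*fadump_trigger)(struct dump_mem *mem, const char *msg);
};

static int sketch_register(struct dump_mem *mem, unsigned long size)
{
	if (size == 0)
		return -1;                 /* nothing to reserve */
	/* Initialise the structure here, only when it is actually needed. */
	mem->reserve_size = size;
	mem->initialized = true;
	return 0;
}

static int sketch_unregister(struct dump_mem *mem)
{
	if (!mem->initialized)
		return -1;                 /* never registered */
	mem->initialized = false;
	mem->reserve_size = 0;
	return 0;
}

static void sketch_trigger(struct dump_mem *mem, const char *msg)
{
	/* A real implementation would ask firmware to start the dump. */
	(void)mem;
	(void)msg;
}

static const struct fadump_ops_sketch pseries_sketch_ops = {
	.register_fadump   = sketch_register,
	.unregister_fadump = sketch_unregister,
	.fadump_trigger    = sketch_trigger,
};
```

With this shape, a caller cannot forget the init step, because there is no init step to call separately.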
Re: [PATCH v3 01/16] powerpc/fadump: move internal fadump code to a new file
On Wed, 2019-06-26 at 02:15 +0530, Hari Bathini wrote: > Refactoring fadump code means internal fadump code is referenced from > different places. For ease, move internal code to a new file. Can you elaborate a bit? I don't really get what the difference between fadump and fadump-internal code is supposed to be. Why can't all this just live in fadump.c?
Re: [RFC PATCH v2 11/12] powerpc/ptrace: create ppc_gethwdinfo()
On 6/28/19 9:18 PM, Christophe Leroy wrote: > Create ppc_gethwdinfo() to handle PPC_PTRACE_GETHWDBGINFO and > reduce ifdef mess > > Signed-off-by: Christophe Leroy > --- Reviewed-by: Ravi Bangoria
Re: [RFC PATCH v2 12/12] powerpc/ptrace: move ptrace_triggered() into hw_breakpoint.c
On 6/28/19 9:18 PM, Christophe Leroy wrote: > ptrace_triggered() is declared in asm/hw_breakpoint.h and > only needed when CONFIG_HW_BREAKPOINT is set, so move it > into hw_breakpoint.c > > Signed-off-by: Christophe Leroy Reviewed-by: Ravi Bangoria
Re: [RFC PATCH v2 10/12] powerpc/ptrace: create ptrace_get_debugreg()
On 6/28/19 9:17 PM, Christophe Leroy wrote: > Create ptrace_get_debugreg() to handle PTRACE_GET_DEBUGREG and > reduce ifdef mess > > Signed-off-by: Christophe Leroy > --- > arch/powerpc/kernel/ptrace/ptrace-adv.c | 9 + > arch/powerpc/kernel/ptrace/ptrace-decl.h | 2 ++ > arch/powerpc/kernel/ptrace/ptrace-noadv.c | 13 + > arch/powerpc/kernel/ptrace/ptrace.c | 18 ++ > 4 files changed, 26 insertions(+), 16 deletions(-) > > diff --git a/arch/powerpc/kernel/ptrace/ptrace-adv.c > b/arch/powerpc/kernel/ptrace/ptrace-adv.c > index 86e71fa6c5c8..dcc765940344 100644 > --- a/arch/powerpc/kernel/ptrace/ptrace-adv.c > +++ b/arch/powerpc/kernel/ptrace/ptrace-adv.c > @@ -83,6 +83,15 @@ void user_disable_single_step(struct task_struct *task) > clear_tsk_thread_flag(task, TIF_SINGLESTEP); > } > > +int ptrace_get_debugreg(struct task_struct *child, unsigned long addr, > + unsigned long __user *datalp) > +{ > + /* We only support one DABR and no IABRS at the moment */ No DABR / IABR in ptrace-adv.c > + if (addr > 0) > + return -EINVAL; > + return put_user(child->thread.debug.dac1, datalp); > +} > + > int ptrace_set_debugreg(struct task_struct *task, unsigned long addr, > unsigned long data) > { > /* For ppc64 we support one DABR and no IABR's at the moment (ppc64). 
> diff --git a/arch/powerpc/kernel/ptrace/ptrace-decl.h > b/arch/powerpc/kernel/ptrace/ptrace-decl.h > index bdba09a87aea..4b4b6a1d508a 100644 > --- a/arch/powerpc/kernel/ptrace/ptrace-decl.h > +++ b/arch/powerpc/kernel/ptrace/ptrace-decl.h > @@ -176,6 +176,8 @@ int tm_cgpr32_set(struct task_struct *target, const > struct user_regset *regset, > extern const struct user_regset_view user_ppc_native_view; > > /* ptrace-(no)adv */ > +int ptrace_get_debugreg(struct task_struct *child, unsigned long addr, > + unsigned long __user *datalp); > int ptrace_set_debugreg(struct task_struct *task, unsigned long addr, > unsigned long data); > long ppc_set_hwdebug(struct task_struct *child, struct ppc_hw_breakpoint > *bp_info); > long ppc_del_hwdebug(struct task_struct *child, long data); > diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c > b/arch/powerpc/kernel/ptrace/ptrace-noadv.c > index 7db330c94538..985cca136f85 100644 > --- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c > +++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c > @@ -64,6 +64,19 @@ void user_disable_single_step(struct task_struct *task) > clear_tsk_thread_flag(task, TIF_SINGLESTEP); > } > > +int ptrace_get_debugreg(struct task_struct *child, unsigned long addr, > + unsigned long __user *datalp) > +{ > + unsigned long dabr_fake; > + > + /* We only support one DABR and no IABRS at the moment */ > + if (addr > 0) > + return -EINVAL; > + dabr_fake = ((child->thread.hw_brk.address & (~HW_BRK_TYPE_DABR)) | > + (child->thread.hw_brk.type & HW_BRK_TYPE_DABR)); > + return put_user(dabr_fake, datalp); > +} > + > int ptrace_set_debugreg(struct task_struct *task, unsigned long addr, > unsigned long data) > { > #ifdef CONFIG_HAVE_HW_BREAKPOINT > diff --git a/arch/powerpc/kernel/ptrace/ptrace.c > b/arch/powerpc/kernel/ptrace/ptrace.c > index 377e0e541d5f..e789afae6f56 100644 > --- a/arch/powerpc/kernel/ptrace/ptrace.c > +++ b/arch/powerpc/kernel/ptrace/ptrace.c > @@ -211,23 +211,9 @@ long arch_ptrace(struct task_struct 
*child, long request, > break; > } > > - case PTRACE_GET_DEBUGREG: { > -#ifndef CONFIG_PPC_ADV_DEBUG_REGS > - unsigned long dabr_fake; > -#endif > - ret = -EINVAL; > - /* We only support one DABR and no IABRS at the moment */ > - if (addr > 0) > - break; > -#ifdef CONFIG_PPC_ADV_DEBUG_REGS > - ret = put_user(child->thread.debug.dac1, datalp); > -#else > - dabr_fake = ((child->thread.hw_brk.address & > (~HW_BRK_TYPE_DABR)) | > - (child->thread.hw_brk.type & HW_BRK_TYPE_DABR)); > - ret = put_user(dabr_fake, datalp); > -#endif > + case PTRACE_GET_DEBUGREG: > + ret = ptrace_get_debugreg(child, addr, datalp); > break; > - } > > case PTRACE_SET_DEBUGREG: > ret = ptrace_set_debugreg(child, addr, data); > Otherwise, Reviewed-by: Ravi Bangoria
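On the non-ADV path, PTRACE_GET_DEBUGREG reports a "fake" DABR word: the breakpoint address with its low bits replaced by the breakpoint type flags. The sketch below reproduces that packing in userspace, assuming (as a labelled assumption) that `HW_BRK_TYPE_DABR` covers the low three bits, i.e. read (0x1), write (0x2) and translate (0x4). The `SKETCH_*` names and `get_debugreg_sketch()` are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Assumed flag values mirroring the powerpc hw_breakpoint type bits. */
#define SKETCH_BRK_READ      0x1UL
#define SKETCH_BRK_WRITE     0x2UL
#define SKETCH_BRK_TRANSLATE 0x4UL
#define SKETCH_BRK_TYPE_DABR \
	(SKETCH_BRK_READ | SKETCH_BRK_WRITE | SKETCH_BRK_TRANSLATE)

/* Pack an address and its type flags into one DABR-style word:
 * address in the upper bits, flags in the low three bits. */
static unsigned long fake_dabr(unsigned long address, unsigned long type)
{
	return (address & ~SKETCH_BRK_TYPE_DABR) |
	       (type & SKETCH_BRK_TYPE_DABR);
}

/* Mimic the "one DABR and no IABRs" rule: only index 0 is valid. */
static int get_debugreg_sketch(unsigned long addr, unsigned long address,
			       unsigned long type, unsigned long *out)
{
	if (addr > 0)
		return -EINVAL;
	*out = fake_dabr(address, type);
	return 0;
}
```

For example, an address of 0x10000008 with read and write enabled (0x3) is reported as 0x1000000b.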
Re: [RFC PATCH v2 09/12] powerpc/ptrace: split out ADV_DEBUG_REGS related functions.
On 6/28/19 9:17 PM, Christophe Leroy wrote: > diff --git a/arch/powerpc/kernel/ptrace/ptrace-adv.c > b/arch/powerpc/kernel/ptrace/ptrace-adv.c > new file mode 100644 > index ..86e71fa6c5c8 > --- /dev/null > +++ b/arch/powerpc/kernel/ptrace/ptrace-adv.c > @@ -0,0 +1,487 @@ > +/* SPDX-License-Identifier: GPL-2.0-or-later */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > + > +void user_enable_single_step(struct task_struct *task) > +{ > + struct pt_regs *regs = task->thread.regs; > + > + if (regs != NULL) { > + task->thread.debug.dbcr0 &= ~DBCR0_BT; > + task->thread.debug.dbcr0 |= DBCR0_IDM | DBCR0_IC; > + regs->msr |= MSR_DE; > + } > + set_tsk_thread_flag(task, TIF_SINGLESTEP); > +} > + > +void user_enable_block_step(struct task_struct *task) > +{ > + struct pt_regs *regs = task->thread.regs; > + > + if (regs != NULL) { > + task->thread.debug.dbcr0 &= ~DBCR0_IC; > + task->thread.debug.dbcr0 = DBCR0_IDM | DBCR0_BT; > + regs->msr |= MSR_DE; > + } > + set_tsk_thread_flag(task, TIF_SINGLESTEP); > +} > + > +void user_disable_single_step(struct task_struct *task) > +{ > + struct pt_regs *regs = task->thread.regs; > + > + if (regs != NULL) { > + /* > + * The logic to disable single stepping should be as > + * simple as turning off the Instruction Complete flag. > + * And, after doing so, if all debug flags are off, turn > + * off DBCR0(IDM) and MSR(DE) Torez > + */ > + task->thread.debug.dbcr0 &= ~(DBCR0_IC|DBCR0_BT); > + /* > + * Test to see if any of the DBCR_ACTIVE_EVENTS bits are set. > + */ > + if (!DBCR_ACTIVE_EVENTS(task->thread.debug.dbcr0, > + task->thread.debug.dbcr1)) { > + /* > + * All debug events were off. 
> + */ > + task->thread.debug.dbcr0 &= ~DBCR0_IDM; > + regs->msr &= ~MSR_DE; > + } > + } > + clear_tsk_thread_flag(task, TIF_SINGLESTEP); > +} > + > +int ptrace_set_debugreg(struct task_struct *task, unsigned long addr, > unsigned long data) > +{ > + /* For ppc64 we support one DABR and no IABR's at the moment (ppc64). > + * For embedded processors we support one DAC and no IAC's at the > + * moment. > + */ I guess mentioning DABR and IABR doesn't make sense in ptrace-adv.c? > + if (addr > 0) > + return -EINVAL; > + > + /* The bottom 3 bits in dabr are flags */ Same here. > + if ((data & ~0x7UL) >= TASK_SIZE) > + return -EIO; > + > + /* As described above, it was assumed 3 bits were passed with the data > + * address, but we will assume only the mode bits will be passed > + * as to not cause alignment restrictions for DAC-based processors. > + */ > + > + /* DAC's hold the whole address without any mode flags */ > + task->thread.debug.dac1 = data & ~0x3UL; > + > + if (task->thread.debug.dac1 == 0) { > + dbcr_dac(task) &= ~(DBCR_DAC1R | DBCR_DAC1W); > + if (!DBCR_ACTIVE_EVENTS(task->thread.debug.dbcr0, > + task->thread.debug.dbcr1)) { > + task->thread.regs->msr &= ~MSR_DE; > + task->thread.debug.dbcr0 &= ~DBCR0_IDM; > + } > + return 0; > + } > + > + /* Read or Write bits must be set */ > + > + if (!(data & 0x3UL)) > + return -EINVAL; > + > + /* Set the Internal Debugging flag (IDM bit 1) for the DBCR0 > +register */ > + task->thread.debug.dbcr0 |= DBCR0_IDM; > + > + /* Check for write and read flags and set DBCR0 > +accordingly */ > + dbcr_dac(task) &= ~(DBCR_DAC1R|DBCR_DAC1W); > + if (data & 0x1UL) > + dbcr_dac(task) |= DBCR_DAC1R; > + if (data & 0x2UL) > + dbcr_dac(task) |= DBCR_DAC1W; > + task->thread.regs->msr |= MSR_DE; > + return 0; > +} > + > +static long set_instruction_bp(struct task_struct *child, > + struct ppc_hw_breakpoint *bp_info) > +{ > + int slot; > + int slot1_in_use = ((child->thread.debug.dbcr0 & DBCR0_IAC1) != 0); > + int slot2_in_use = 
((child->thread.debug.dbcr0 & DBCR0_IAC2) != 0); > + int slot3_in_use = ((child->thread.debug.dbcr0 & DBCR0_IAC3) != 0); > + int slot4_in_use = ((child->thread.debug.dbcr0 & DBCR0_IAC4) != 0); > + > + if (dbcr_iac_range(child)
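The quoted `ptrace_set_debugreg()` in ptrace-adv.c decodes a single word: bit 0 requests read, bit 1 requests write, and the remaining bits are the data address. A zero address clears the watchpoint. The userspace sketch below follows those steps. `struct dac_state`, `set_dac_sketch()` and `SKETCH_TASK_SIZE` are hypothetical; the task-size limit in particular is a placeholder value, not the kernel's `TASK_SIZE`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Placeholder user-address limit, for illustration only. */
#define SKETCH_TASK_SIZE 0x80000000UL

struct dac_state {
	unsigned long dac1;   /* watched data address */
	bool read;            /* plays the role of DBCR_DAC1R */
	bool write;           /* plays the role of DBCR_DAC1W */
};

static int set_dac_sketch(struct dac_state *st, unsigned long addr,
			  unsigned long data)
{
	if (addr > 0)
		return -EINVAL;               /* only one DAC supported */
	if ((data & ~0x7UL) >= SKETCH_TASK_SIZE)
		return -EIO;                  /* address outside user space */

	st->dac1 = data & ~0x3UL;             /* address without mode bits */
	if (st->dac1 == 0) {                  /* zero address clears it */
		st->read = false;
		st->write = false;
		return 0;
	}
	if (!(data & 0x3UL))
		return -EINVAL;               /* read or write must be set */

	st->read = (data & 0x1UL) != 0;
	st->write = (data & 0x2UL) != 0;
	return 0;
}
```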
[PATCH AUTOSEL 5.1 09/39] selftests/powerpc: Add test of fork with mapping above 512TB
From: Michael Ellerman [ Upstream commit 16391bfc862342f285195013b73c1394fab28b97 ] This tests that when a process with a mapping above 512TB forks we correctly separate the parent and child address spaces. This exercises the bug in the context id handling fixed in the previous commit. Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin --- tools/testing/selftests/powerpc/mm/.gitignore | 3 +- tools/testing/selftests/powerpc/mm/Makefile | 4 +- .../powerpc/mm/large_vm_fork_separation.c | 87 +++ 3 files changed, 92 insertions(+), 2 deletions(-) create mode 100644 tools/testing/selftests/powerpc/mm/large_vm_fork_separation.c diff --git a/tools/testing/selftests/powerpc/mm/.gitignore b/tools/testing/selftests/powerpc/mm/.gitignore index ba919308fe30..d503b8764a8e 100644 --- a/tools/testing/selftests/powerpc/mm/.gitignore +++ b/tools/testing/selftests/powerpc/mm/.gitignore @@ -3,4 +3,5 @@ subpage_prot tempfile prot_sao segv_errors -wild_bctr \ No newline at end of file +wild_bctr +large_vm_fork_separation \ No newline at end of file diff --git a/tools/testing/selftests/powerpc/mm/Makefile b/tools/testing/selftests/powerpc/mm/Makefile index 43d68420e363..f1fbc15800c4 100644 --- a/tools/testing/selftests/powerpc/mm/Makefile +++ b/tools/testing/selftests/powerpc/mm/Makefile @@ -2,7 +2,8 @@ noarg: $(MAKE) -C ../ -TEST_GEN_PROGS := hugetlb_vs_thp_test subpage_prot prot_sao segv_errors wild_bctr +TEST_GEN_PROGS := hugetlb_vs_thp_test subpage_prot prot_sao segv_errors wild_bctr \ + large_vm_fork_separation TEST_GEN_FILES := tempfile top_srcdir = ../../../../.. 
@@ -13,6 +14,7 @@ $(TEST_GEN_PROGS): ../harness.c $(OUTPUT)/prot_sao: ../utils.c $(OUTPUT)/wild_bctr: CFLAGS += -m64 +$(OUTPUT)/large_vm_fork_separation: CFLAGS += -m64 $(OUTPUT)/tempfile: dd if=/dev/zero of=$@ bs=64k count=1 diff --git a/tools/testing/selftests/powerpc/mm/large_vm_fork_separation.c b/tools/testing/selftests/powerpc/mm/large_vm_fork_separation.c new file mode 100644 index ..2363a7f3ab0d --- /dev/null +++ b/tools/testing/selftests/powerpc/mm/large_vm_fork_separation.c @@ -0,0 +1,87 @@ +// SPDX-License-Identifier: GPL-2.0+ +// +// Copyright 2019, Michael Ellerman, IBM Corp. +// +// Test that allocating memory beyond the memory limit and then forking is +// handled correctly, ie. the child is able to access the mappings beyond the +// memory limit and the child's writes are not visible to the parent. + +#include +#include +#include +#include +#include +#include + +#include "utils.h" + + +#ifndef MAP_FIXED_NOREPLACE +#define MAP_FIXED_NOREPLACE MAP_FIXED // "Should be safe" above 512TB +#endif + + +static int test(void) +{ + int p2c[2], c2p[2], rc, status, c, *p; + unsigned long page_size; + pid_t pid; + + page_size = sysconf(_SC_PAGESIZE); + SKIP_IF(page_size != 65536); + + // Create a mapping at 512TB to allocate an extended_id + p = mmap((void *)(512ul << 40), page_size, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0); + if (p == MAP_FAILED) { + perror("mmap"); + printf("Error: couldn't mmap(), confirm kernel has 4TB support?\n"); + return 1; + } + + printf("parent writing %p = 1\n", p); + *p = 1; + + FAIL_IF(pipe(p2c) == -1 || pipe(c2p) == -1); + + pid = fork(); + if (pid == 0) { + FAIL_IF(read(p2c[0], &c, 1) != 1); + + pid = getpid(); + printf("child writing %p = %d\n", p, pid); + *p = pid; + + FAIL_IF(write(c2p[1], &c, 1) != 1); + FAIL_IF(read(p2c[0], &c, 1) != 1); + exit(0); + } + + c = 0; + FAIL_IF(write(p2c[1], &c, 1) != 1); + FAIL_IF(read(c2p[0], &c, 1) != 1); + + // Prevent compiler optimisation + barrier(); + + rc =
0; + printf("parent reading %p = %d\n", p, *p); + if (*p != 1) { + printf("Error: BUG! parent saw child's write! *p = %d\n", *p); + rc = 1; + } + + FAIL_IF(write(p2c[1], &c, 1) != 1); + FAIL_IF(waitpid(pid, &status, 0) == -1); + FAIL_IF(!WIFEXITED(status) || WEXITSTATUS(status)); + + if (rc == 0) + printf("success: test completed OK\n"); + + return rc; +} + +int main(void) +{ + return test_harness(test, "large_vm_fork_separation"); +} -- 2.20.1
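The selftest above checks that a child's write to a private mapping never becomes visible to the parent. The sketch below shows the same check in its ordinary form: the mapping is placed wherever the kernel likes instead of at 512TB, there is no 64K page-size skip, and a single pipe is used to order the child's write before the parent's read. It therefore exercises plain copy-on-write separation, not the extended context id path the real test targets. `fork_separation_check()` is a hypothetical name.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if the child's write stays private, 1 if the parent saw it,
 * -1 on a setup error. */
static int fork_separation_check(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	int c2p[2], status, rc;
	char c = 0;
	pid_t pid;
	int *p;

	p = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	*p = 1;                          /* parent's value */

	if (pipe(c2p) == -1)
		return -1;

	pid = fork();
	if (pid == -1)
		return -1;
	if (pid == 0) {
		*p = getpid();           /* child's copy diverges here */
		if (write(c2p[1], &c, 1) != 1)
			_exit(1);
		_exit(0);
	}

	/* Only read once the child reports that it has written. */
	if (read(c2p[0], &c, 1) != 1)
		return -1;
	if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) ||
	    WEXITSTATUS(status) != 0)
		return -1;

	rc = (*(volatile int *)p == 1) ? 0 : 1;
	close(c2p[0]);
	close(c2p[1]);
	munmap(p, page_size);
	return rc;
}
```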
Re: [PATCH v2] powerpc/mm/nvdimm: Add an informative message if we fail to allocate altmap block
On Tue, Jul 2, 2019 at 12:33 AM Aneesh Kumar K.V wrote: > > Allocation from altmap area can fail based on vmemmap page size used. Add > kernel > info message to indicate the failure. That allows the user to identify > whether they > are really using persistent memory reserved space for per-page metadata. > > The message looks like: > [ 136.587212] altmap block allocation failed, falling back to system memory > > Signed-off-by: Aneesh Kumar K.V > --- > arch/powerpc/mm/init_64.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c > index a4e17a979e45..f3b64f49082b 100644 > --- a/arch/powerpc/mm/init_64.c > +++ b/arch/powerpc/mm/init_64.c > @@ -194,8 +194,12 @@ int __meminit vmemmap_populate(unsigned long start, > unsigned long end, int node, > * fail due to alignment issues when using 16MB hugepages, so > * fall back to system memory if the altmap allocation fail. > */ > - if (altmap) > + if (altmap) { > p = altmap_alloc_block_buf(page_size, altmap); > + if (!p) > + pr_debug("altmap block allocation failed, " \ > + "falling back to system memory"); > + } > if (!p) > p = vmemmap_alloc_block_buf(page_size, node); > if (!p) > -- > 2.21.0 > I'll let mpe decide if he cares about the split line thing :) Reviewed-by: Oliver O'Halloran
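The patch follows a common "try the reserved pool, then fall back" allocation pattern and logs the fallback. Below is a userspace sketch of that pattern. `struct reserve_pool`, `pool_alloc()` and `alloc_block()` are hypothetical stand-ins for the altmap and vmemmap allocators. The message is kept as one unbroken string literal, which is what kernel coding style asks for with user-visible strings so they stay greppable (the "split line thing" mentioned in the review).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical reserved pool standing in for the altmap area. */
struct reserve_pool {
	size_t free_bytes;
	unsigned int fallbacks;   /* how often we fell back */
};

static void *pool_alloc(struct reserve_pool *pool, size_t size)
{
	void *p;

	if (pool->free_bytes < size)
		return NULL;
	p = malloc(size);         /* stand-in for carving from the pool */
	if (p)
		pool->free_bytes -= size;
	return p;
}

/* Try the reserved pool first, then fall back to general memory. */
static void *alloc_block(struct reserve_pool *pool, size_t size)
{
	void *p = NULL;

	if (pool) {
		p = pool_alloc(pool, size);
		if (!p) {
			pool->fallbacks++;
			fprintf(stderr, "altmap block allocation failed, falling back to system memory\n");
		}
	}
	if (!p)
		p = malloc(size);
	return p;
}
```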
[PATCH 3/3] KVM: PPC: Book3S HV: Save and restore guest visible PSSCR bits on pseries
The performance stop status and control register (PSSCR) is used to control the power-saving facilities of the processor. This register has various fields, some of which can be modified only in hypervisor state, and others which can be modified in both hypervisor and privileged non-hypervisor state. The bits which can be modified in privileged non-hypervisor state are referred to as guest visible. Currently the L0 hypervisor saves and restores both its own host value and the guest value of the psscr when context switching between the hypervisor and guest. However, a nested hypervisor running its own nested guests (as indicated by kvmhv_on_pseries()) doesn't context switch the psscr register. This means that if a nested (L2) guest modifies the psscr, the L1 guest hypervisor will then run with that value, and if the L1 guest hypervisor modifies the value before running the nested (L2) guest again, the L2 psscr value will be lost. Fix this by having the (L1) nested hypervisor save and restore both its host and the guest psscr value when entering and exiting a nested (L2) guest. Note that only the guest-visible parts of the psscr are context switched, since they are all the L1 nested hypervisor can access. This is fine, because these are the only fields the L0 hypervisor lets the guest control anyway; all other fields are ignored. This could also have been implemented by adding the psscr register to the hv_regs passed to the L0 hypervisor as input to the H_ENTER_NESTED hcall; however, this would have meant updating the structure layout and thus required modifications to both the L0 and L1 kernels. The approach used here achieves the same result without requiring L0 kernel modifications.
Fixes: 95a6432ce903 "KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests" Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/kvm/book3s_hv.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index b682a429f3ef..cde3f5a4b3e4 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -3569,9 +3569,18 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit, mtspr(SPRN_DEC, vcpu->arch.dec_expires - mftb()); if (kvmhv_on_pseries()) { + /* +* We need to save and restore the guest visible part of the +* psscr (i.e. using SPRN_PSSCR_PR) since the hypervisor +* doesn't do this for us. Note only required if pseries since +* this is done in kvmhv_load_hv_regs_and_go() below otherwise. +*/ + unsigned long host_psscr; /* call our hypervisor to load up HV regs and go */ struct hv_guest_state hvregs; + host_psscr = mfspr(SPRN_PSSCR_PR); + mtspr(SPRN_PSSCR_PR, vcpu->arch.psscr); kvmhv_save_hv_regs(vcpu, &hvregs); hvregs.lpcr = lpcr; vcpu->arch.regs.msr = vcpu->arch.shregs.msr; @@ -3590,6 +3599,8 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit, vcpu->arch.shregs.msr = vcpu->arch.regs.msr; vcpu->arch.shregs.dar = mfspr(SPRN_DAR); vcpu->arch.shregs.dsisr = mfspr(SPRN_DSISR); + vcpu->arch.psscr = mfspr(SPRN_PSSCR_PR); + mtspr(SPRN_PSSCR_PR, host_psscr); /* H_CEDE has to be handled now, not later */ if (trap == BOOK3S_INTERRUPT_SYSCALL && !vcpu->arch.nested && -- 2.13.6
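The fix is a save/load/run/save/restore sequence wrapped around the H_ENTER_NESTED call. The sketch below models it in userspace with a plain variable standing in for SPRN_PSSCR_PR. `mfspr_sim()`, `mtspr_sim()`, `struct vcpu_sketch` and `guest_modifies_psscr()` are all hypothetical, and the guest run is just a callback.

```c
#include <assert.h>

/* Simulated guest-visible PSSCR (stand-in for SPRN_PSSCR_PR). */
static unsigned long sim_psscr_pr;

static unsigned long mfspr_sim(void) { return sim_psscr_pr; }
static void mtspr_sim(unsigned long v) { sim_psscr_pr = v; }

struct vcpu_sketch {
	unsigned long psscr;   /* guest's saved value */
};

/* Example L2 guest that changes the guest-visible bits while running. */
static void guest_modifies_psscr(void)
{
	mtspr_sim(mfspr_sim() | 0x3UL);
}

static void enter_nested_guest(struct vcpu_sketch *vcpu,
			       void (*run_guest)(void))
{
	unsigned long host_psscr = mfspr_sim();  /* save L1 host value */

	mtspr_sim(vcpu->psscr);                  /* load L2 guest value */
	run_guest();                             /* stands in for H_ENTER_NESTED */
	vcpu->psscr = mfspr_sim();               /* save guest value on exit */
	mtspr_sim(host_psscr);                   /* restore L1 host value */
}
```

After the call, the host value is back in the register and the guest's modification survives in the vcpu struct for the next entry.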
[PATCH 2/3] PPC: PMC: Set pmcregs_in_use in paca when running as LPAR
The ability to run nested guests under KVM means that a guest can also act as a hypervisor for its own nested guest. Currently ppc_set_pmu_inuse() assumes that either FW_FEATURE_LPAR is set, indicating a guest environment, and so sets the pmcregs_in_use flag in the lppaca, or that it isn't set, indicating a hypervisor environment, and so sets the pmcregs_in_use flag in the paca. The pmcregs_in_use flag in the lppaca is used to communicate this information to a hypervisor and so must be set in a guest environment. The pmcregs_in_use flag in the paca is used by KVM code to determine whether the host state of the performance monitoring unit (PMU) must be saved and restored when running a guest. Thus, when a guest also acts as a hypervisor, it must set this bit in both places, since it needs to ensure both that the real hypervisor saves its pmu registers when it runs (requires pmcregs_in_use flag in lppaca), and that it saves its own pmu registers when running a nested guest (requires pmcregs_in_use flag in paca). Modify ppc_set_pmu_inuse() so that the pmcregs_in_use bit is set in both the lppaca and the paca when a guest (LPAR) is running with the capability of running its own guests (CONFIG_KVM_BOOK3S_HV_POSSIBLE). Fixes: 95a6432ce903 "KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests" Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/include/asm/pmc.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/pmc.h b/arch/powerpc/include/asm/pmc.h index dc9a1ca70edf..c6bbe9778d3c 100644 --- a/arch/powerpc/include/asm/pmc.h +++ b/arch/powerpc/include/asm/pmc.h @@ -27,11 +27,10 @@ static inline void ppc_set_pmu_inuse(int inuse) #ifdef CONFIG_PPC_PSERIES get_lppaca()->pmcregs_in_use = inuse; #endif - } else { + } #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE - get_paca()->pmcregs_in_use = inuse; + get_paca()->pmcregs_in_use = inuse; #endif - } #endif } -- 2.13.6
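After the patch, the paca update no longer sits in an `else` branch, so an LPAR that can also host guests updates both flags. The sketch below models the compile-time options and the firmware feature as runtime booleans purely for illustration. `struct pmu_env` and `set_pmu_inuse_sketch()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the firmware feature and Kconfig options. */
struct pmu_env {
	bool fw_feature_lpar;        /* FW_FEATURE_LPAR: running as a guest */
	bool pseries;                /* CONFIG_PPC_PSERIES */
	bool kvm_hv_possible;        /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
	int lppaca_pmcregs_in_use;   /* tells our hypervisor */
	int paca_pmcregs_in_use;     /* tells our own KVM code */
};

static void set_pmu_inuse_sketch(struct pmu_env *env, int inuse)
{
	if (env->fw_feature_lpar && env->pseries)
		env->lppaca_pmcregs_in_use = inuse;
	/* Independent of the LPAR check: a guest that can run its own
	 * guests must also tell its own KVM code. */
	if (env->kvm_hv_possible)
		env->paca_pmcregs_in_use = inuse;
}
```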
[PATCH 1/3] KVM: PPC: Book3S HV: Always save guest pmu for guest capable of nesting
The performance monitoring unit (PMU) registers are saved on guest exit when the guest has set the pmcregs_in_use flag in its lppaca, if it exists, or unconditionally if it doesn't. If a nested guest is being run, then the hypervisor doesn't, and in most cases can't, know if the pmu registers are in use, since it doesn't know the location of the lppaca for the nested guest, although it may have one for its immediate guest. This results in the values of these registers being lost across nested guest entry and exit in the case where the nested guest was making use of the performance monitoring facility while its nested guest hypervisor wasn't. Furthermore, the hypervisor could interrupt a guest hypervisor in kvmhv_p9_guest_entry() either after it has loaded the pmu registers but before it calls H_ENTER_NESTED, or after it returns from the nested guest but before it reads the pmu registers. This means that it isn't sufficient to save the pmu registers only when entering or exiting a nested guest; to ensure the register values aren't lost in the context switch, the pmu registers must always be saved whenever a guest is capable of running nested guests. Ensure the pmu register values are preserved by always saving their value into the vcpu struct when a guest is capable of running nested guests. This should have minimal performance impact; however, any impact can be avoided by booting a guest with "-machine pseries,cap-nested-hv=false" on the QEMU command line.
Fixes: 95a6432ce903 "KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests" Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/kvm/book3s_hv.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index ec1804f822af..b682a429f3ef 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -3654,6 +3654,8 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit, vcpu->arch.vpa.dirty = 1; save_pmu = lp->pmcregs_in_use; } + /* Must save pmu if this guest is capable of running nested guests */ + save_pmu |= nesting_enabled(vcpu->kvm); kvmhv_save_guest_pmu(vcpu, save_pmu); -- 2.13.6
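The resulting save decision combines three inputs: save unconditionally when the guest has no lppaca, otherwise trust its pmcregs_in_use flag, and always save when the guest may run nested guests. The sketch below expresses that rule as a pure function. `struct exit_info` and `should_save_guest_pmu()` are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the host knows at guest exit. */
struct exit_info {
	bool has_lppaca;              /* guest registered an lppaca */
	bool lppaca_pmcregs_in_use;   /* flag read from that lppaca */
	bool nesting_enabled;         /* guest may run nested guests */
};

static bool should_save_guest_pmu(const struct exit_info *e)
{
	bool save_pmu = true;         /* no lppaca: save unconditionally */

	if (e->has_lppaca)
		save_pmu = e->lppaca_pmcregs_in_use;
	/* The guest's own flag says nothing about its nested guests. */
	save_pmu = save_pmu || e->nesting_enabled;
	return save_pmu;
}
```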
Re: [PATCH] powerpc/powernv/idle: Fix restore of SPRN_LDBAR for POWER9 stop state.
Madhavan Srinivasan's on July 2, 2019 8:58 pm: > From: Athira Rajeev > > commit 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C") > reimplemented the book3s idle code in platforms/powernv/idle.c. But in doing so it > missed adding the per-thread LDBAR update in the core_woken path of > power9_idle_stop(). This patch fixes that. > > Fixes: 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C") > Signed-off-by: Athira Rajeev > Signed-off-by: Madhavan Srinivasan > --- > arch/powerpc/platforms/powernv/idle.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/powerpc/platforms/powernv/idle.c > b/arch/powerpc/platforms/powernv/idle.c > index 2f4479b94ac3..fd14a6237954 100644 > --- a/arch/powerpc/platforms/powernv/idle.c > +++ b/arch/powerpc/platforms/powernv/idle.c > @@ -758,7 +758,6 @@ static unsigned long power9_idle_stop(unsigned long > psscr, bool mmu_on) > mtspr(SPRN_PTCR,sprs.ptcr); > mtspr(SPRN_RPR, sprs.rpr); > mtspr(SPRN_TSCR,sprs.tscr); > - mtspr(SPRN_LDBAR, sprs.ldbar); > > if (pls >= pnv_first_tb_loss_level) { > /* TB loss */ > @@ -790,6 +789,7 @@ static unsigned long power9_idle_stop(unsigned long > psscr, bool mmu_on) > mtspr(SPRN_MMCR0, sprs.mmcr0); > mtspr(SPRN_MMCR1, sprs.mmcr1); > mtspr(SPRN_MMCR2, sprs.mmcr2); > + mtspr(SPRN_LDBAR, sprs.ldbar); Oh that's another one I messed up, thanks for the fix. I must have confused myself with the SPR table in the UM :( Reviewed-by: Nicholas Piggin
Re: [PATCH V2] mm/ioremap: Probe platform for p4d huge map support
On Fri, 28 Jun 2019 10:50:31 +0530 Anshuman Khandual wrote: > Finishing up what the commit c2febafc67734a ("mm: convert generic code to > 5-level paging") started out while levelling up P4D huge mapping support > at par with PUD and PMD. A new arch call back arch_ioremap_p4d_supported() > is being added which just maintains status quo (P4D huge map not supported) > on x86, arm64 and powerpc. Does this have any runtime effects? If so, what are they and why? If not, what's the actual point?
Re: [PATCH net] net/ibmvnic: Report last valid speed and duplex values to ethtool
From: Thomas Falcon Date: Thu, 27 Jun 2019 12:09:13 -0500 > This patch resolves an issue with sensitive bonding modes > that require valid speed and duplex settings to function > properly. Currently, the adapter will report that device > speed and duplex are unknown if the communication link > with firmware is unavailable. This decision can break LACP > configurations if the timing is right. > > For example, if invalid speeds are reported, the slave > device's link state is set to a transitional "fail" state > and the LACP port is disabled. However, if valid speeds > are reported later but the link state has not been altered, > the LACP port will remain disabled. If the link state then > transitions back to "up" from "fail," it results in a state > such that the slave reports valid speed/duplex and is up, > but the LACP port will remain disabled. > > Work around this by reporting the last recorded speed > and duplex settings unless the device has never been > activated. In that case or when the hypervisor gives > invalid values, continue to report unknown speed or > duplex to ethtool. > > Signed-off-by: Thomas Falcon Like Andrew, I have my concerns about this. If the firmware is unavailable, the link is effectively down. So you should report link down and unknown link parameters. Bonding and LACP should do the right thing when the firmware is reachable again after the migration and the link goes back up. If bonding/LACP isn't doing that, then the bug is there.
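The reporting policy suggested in the reply can be stated as a small pure function: when firmware cannot be reached, report the link as down with unknown speed and duplex instead of replaying stale values. The sketch below uses local copies of the ethtool "unknown" values (`SPEED_UNKNOWN` is -1 and `DUPLEX_UNKNOWN` is 0xff in `<linux/ethtool.h>`). `struct link_report` and `report_link()` are hypothetical and do not model the ibmvnic driver's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of the ethtool "unknown" values. */
#define SKETCH_SPEED_UNKNOWN  (-1)
#define SKETCH_DUPLEX_UNKNOWN 0xff

struct link_report {
	bool link_up;
	int speed;
	int duplex;
};

/* Report what firmware says, or link down with unknown parameters when
 * firmware is unreachable or returns nothing usable. */
static struct link_report report_link(bool fw_reachable, int fw_speed,
				      int fw_duplex)
{
	struct link_report r = {
		.link_up = false,
		.speed = SKETCH_SPEED_UNKNOWN,
		.duplex = SKETCH_DUPLEX_UNKNOWN,
	};

	if (fw_reachable && fw_speed > 0) {
		r.link_up = true;
		r.speed = fw_speed;
		r.duplex = fw_duplex;
	}
	return r;
}
```

Under this policy, bonding and LACP see a normal link-down event followed by link-up once firmware returns, rather than valid-looking parameters on a link that cannot pass traffic.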
Re: [RFC PATCH] Replaces long number representation by BIT() macro
On Tue, Jul 02, 2019 at 11:16:35AM -0500, Segher Boessenkool wrote: > On Wed, Jul 03, 2019 at 01:19:34AM +1000, Michael Ellerman wrote: > > What we could do is switch to the `UL` macro from include/linux/const.h, > > rather than using our own ASM_CONST. > > You need gas 2.28 or later for that though. Oh, but apparently I cannot read. That macro should work fine. Segher
Re: [RFC PATCH] Replaces long number representation by BIT() macro
On Wed, Jul 03, 2019 at 01:19:34AM +1000, Michael Ellerman wrote: > What we could do is switch to the `UL` macro from include/linux/const.h, > rather than using our own ASM_CONST. You need gas 2.28 or later for that though. https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=86b80085 https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=e140100a What is the minimum required (for powerpc) now? Segher
Re: ["RFC PATCH" 1/2] powerpc/mm: Fix node look up with numa=off boot
"Aneesh Kumar K.V" writes: >> Just checking: do people still need numa=off? Seems like it's a >> maintenance burden :-) >> > > That is used in kdump kernel. I see, thanks.
[PATCH] powerpc: Enable CONFIG_IPV6 in ppc64_defconfig
From: Satheesh Rajendran Enable CONFIG_IPV6 in ppc64_defconfig to enable certain network functionalities required for tests. Signed-off-by: Michael Ellerman Signed-off-by: Satheesh Rajendran --- arch/powerpc/configs/ppc64_defconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig index 91fdb619b484..93fd9792d030 100644 --- a/arch/powerpc/configs/ppc64_defconfig +++ b/arch/powerpc/configs/ppc64_defconfig @@ -89,7 +89,7 @@ CONFIG_SYN_COOKIES=y CONFIG_INET_AH=m CONFIG_INET_ESP=m CONFIG_INET_IPCOMP=m -# CONFIG_IPV6 is not set +CONFIG_IPV6=y CONFIG_NETFILTER=y # CONFIG_NETFILTER_ADVANCED is not set CONFIG_BRIDGE=m -- 2.21.0
Re: [RFC PATCH] Replaces long number representation by BIT() macro
Hi Leonardo, Leonardo Bras writes: > The main reason for this change is to make these bitmasks more readable. > > The macro ASM_CONST() just appends an UL to its parameter, so it can be > easily replaced by BIT_MASK, that already uses a UL representation. > > ASM_CONST() in this file may behave differently if __ASSEMBLY__ is defined, > as it is used in .S files, just leaving the parameter as is. Thanks for the patch, but I don't consider this an improvement in readability. At boot we print the firmware features, eg: firmware_features = 0x0001c45ffc5f And it's much easier to match that up to the full constants, than to the bit numbers. Similarly in memory or register dumps. What we could do is switch to the `UL` macro from include/linux/const.h, rather than using our own ASM_CONST. cheers > diff --git a/arch/powerpc/include/asm/firmware.h > b/arch/powerpc/include/asm/firmware.h > index 00bc42d95679..7a5b0cc0bc85 100644 > --- a/arch/powerpc/include/asm/firmware.h > +++ b/arch/powerpc/include/asm/firmware.h > @@ -14,46 +14,45 @@ > > #ifdef __KERNEL__ > > -#include > - > +#include > /* firmware feature bitmask values */ > > -#define FW_FEATURE_PFT ASM_CONST(0x0001) > -#define FW_FEATURE_TCE ASM_CONST(0x0002) > -#define FW_FEATURE_SPRG0 ASM_CONST(0x0004) > -#define FW_FEATURE_DABR ASM_CONST(0x0008) > -#define FW_FEATURE_COPY ASM_CONST(0x0010) > -#define FW_FEATURE_ASR ASM_CONST(0x0020) > -#define FW_FEATURE_DEBUG ASM_CONST(0x0040) > -#define FW_FEATURE_TERM ASM_CONST(0x0080) > -#define FW_FEATURE_PERF ASM_CONST(0x0100) > -#define FW_FEATURE_DUMP ASM_CONST(0x0200) > -#define FW_FEATURE_INTERRUPT ASM_CONST(0x0400) > -#define FW_FEATURE_MIGRATE ASM_CONST(0x0800) > -#define FW_FEATURE_PERFMON ASM_CONST(0x1000) > -#define FW_FEATURE_CRQ ASM_CONST(0x2000) > -#define FW_FEATURE_VIO ASM_CONST(0x4000) > -#define FW_FEATURE_RDMA ASM_CONST(0x8000) > -#define FW_FEATURE_LLAN ASM_CONST(0x0001) > -#define FW_FEATURE_BULK_REMOVE ASM_CONST(0x0002) > -#define FW_FEATURE_XDABR
ASM_CONST(0x0004) > -#define FW_FEATURE_MULTITCE ASM_CONST(0x0008) > -#define FW_FEATURE_SPLPARASM_CONST(0x0010) > -#define FW_FEATURE_LPAR ASM_CONST(0x0040) > -#define FW_FEATURE_PS3_LV1 ASM_CONST(0x0080) > -#define FW_FEATURE_HPT_RESIZEASM_CONST(0x0100) > -#define FW_FEATURE_CMO ASM_CONST(0x0200) > -#define FW_FEATURE_VPHN ASM_CONST(0x0400) > -#define FW_FEATURE_XCMO ASM_CONST(0x0800) > -#define FW_FEATURE_OPAL ASM_CONST(0x1000) > -#define FW_FEATURE_SET_MODE ASM_CONST(0x4000) > -#define FW_FEATURE_BEST_ENERGY ASM_CONST(0x8000) > -#define FW_FEATURE_TYPE1_AFFINITY ASM_CONST(0x0001) > -#define FW_FEATURE_PRRN ASM_CONST(0x0002) > -#define FW_FEATURE_DRMEM_V2 ASM_CONST(0x0004) > -#define FW_FEATURE_DRC_INFO ASM_CONST(0x0008) > -#define FW_FEATURE_BLOCK_REMOVE ASM_CONST(0x0010) > -#define FW_FEATURE_PAPR_SCM ASM_CONST(0x0020) > +#define FW_FEATURE_PFT BIT(0) > +#define FW_FEATURE_TCE BIT(1) > +#define FW_FEATURE_SPRG0 BIT(2) > +#define FW_FEATURE_DABR BIT(3) > +#define FW_FEATURE_COPY BIT(4) > +#define FW_FEATURE_ASR BIT(5) > +#define FW_FEATURE_DEBUG BIT(6) > +#define FW_FEATURE_TERM BIT(7) > +#define FW_FEATURE_PERF BIT(8) > +#define FW_FEATURE_DUMP BIT(9) > +#define FW_FEATURE_INTERRUPT BIT(10) > +#define FW_FEATURE_MIGRATE BIT(11) > +#define FW_FEATURE_PERFMON BIT(12) > +#define FW_FEATURE_CRQ BIT(13) > +#define FW_FEATURE_VIO BIT(14) > +#define FW_FEATURE_RDMA BIT(15) > +#define FW_FEATURE_LLAN BIT(16) > +#define FW_FEATURE_BULK_REMOVE BIT(17) > +#define FW_FEATURE_XDABR BIT(18) > +#define FW_FEATURE_MULTITCE BIT(19) > +#define FW_FEATURE_SPLPARBIT(20) > +#define FW_FEATURE_LPAR BIT(22) > +#define FW_FEATURE_PS3_LV1 BIT(23) > +#define FW_FEATURE_HPT_RESIZEBIT(24) > +#define FW_FEATURE_CMO BIT(25) > +#define FW_FEATURE_VPHN BIT(26) > +#define FW_FEATURE_XCMO BIT(27) > +#define FW_FEATURE_OPAL BIT(28) > +#define FW_FEATURE_SET_MODE BIT(30) > +#define FW_FEATURE_BEST_ENERGY BIT(31) > +#define FW_FEATURE_TYPE1_AFFINITY BIT(32) > +#define
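The suggestion to use `UL()` from include/linux/const.h relies on a token-pasting trick: in C the `UL` suffix is glued onto the constant, while under `__ASSEMBLY__` the suffix is dropped entirely, so the assembler never sees it (which is why Segher's gas-version concern turned out not to apply). The userspace replica below uses `SK_`-prefixed hypothetical names to avoid clashing with the real headers. It also shows that a full-width constant such as `FW_FEATURE_LPAR` equals the `BIT(22)` form from the patch, while still lining up digit-for-digit with a hex dump.

```c
#include <assert.h>

/* Replica of the const.h pattern: paste the suffix in C, drop it in asm. */
#ifdef __ASSEMBLY__
#define SK_AC(X, Y)   X
#else
#define SK__AC(X, Y)  (X##Y)
#define SK_AC(X, Y)   SK__AC(X, Y)  /* extra level so X is expanded first */
#endif
#define SK_UL(x)      (SK_AC(x, UL))

#define SK_BIT(nr)    (1UL << (nr))

/* Full-width constants, matching the bit numbers used in the patch. */
#define SK_FW_FEATURE_LPAR SK_UL(0x0000000000400000)  /* BIT(22) */
#define SK_FW_FEATURE_OPAL SK_UL(0x0000000010000000)  /* BIT(28) */

static int has_feature(unsigned long features, unsigned long mask)
{
	return (features & mask) == mask;
}
```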
Re: [v2 03/12] powerpc/mce: Add MCE notification chain
On Tue, Jul 02, 2019 at 10:49:23AM +0530, Santosh Sivaraj wrote: +static BLOCKING_NOTIFIER_HEAD(mce_notifier_list); Mahesh suggested using an atomic notifier chain instead of blocking, since we are in an interrupt. -- Reza Arbab
[PATCH -next] powerpc/powernv: Make some symbols static
Fix sparse warnings: arch/powerpc/platforms/powernv/opal-psr.c:20:1: warning: symbol 'psr_mutex' was not declared. Should it be static? arch/powerpc/platforms/powernv/opal-psr.c:27:3: warning: symbol 'psr_attrs' was not declared. Should it be static? arch/powerpc/platforms/powernv/opal-powercap.c:20:1: warning: symbol 'powercap_mutex' was not declared. Should it be static? arch/powerpc/platforms/powernv/opal-sensor-groups.c:20:1: warning: symbol 'sg_mutex' was not declared. Should it be static? Reported-by: Hulk Robot Signed-off-by: YueHaibing --- arch/powerpc/platforms/powernv/opal-powercap.c | 2 +- arch/powerpc/platforms/powernv/opal-psr.c | 4 ++-- arch/powerpc/platforms/powernv/opal-sensor-groups.c | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/platforms/powernv/opal-powercap.c b/arch/powerpc/platforms/powernv/opal-powercap.c index dc599e7..c16d44f 100644 --- a/arch/powerpc/platforms/powernv/opal-powercap.c +++ b/arch/powerpc/platforms/powernv/opal-powercap.c @@ -13,7 +13,7 @@ #include -DEFINE_MUTEX(powercap_mutex); +static DEFINE_MUTEX(powercap_mutex); static struct kobject *powercap_kobj; diff --git a/arch/powerpc/platforms/powernv/opal-psr.c b/arch/powerpc/platforms/powernv/opal-psr.c index b6ccb30..69d7e75 100644 --- a/arch/powerpc/platforms/powernv/opal-psr.c +++ b/arch/powerpc/platforms/powernv/opal-psr.c @@ -13,11 +13,11 @@ #include -DEFINE_MUTEX(psr_mutex); +static DEFINE_MUTEX(psr_mutex); static struct kobject *psr_kobj; -struct psr_attr { +static struct psr_attr { u32 handle; struct kobj_attribute attr; } *psr_attrs; diff --git a/arch/powerpc/platforms/powernv/opal-sensor-groups.c b/arch/powerpc/platforms/powernv/opal-sensor-groups.c index 31f13c1..f8ae1fb 100644 --- a/arch/powerpc/platforms/powernv/opal-sensor-groups.c +++ b/arch/powerpc/platforms/powernv/opal-sensor-groups.c @@ -13,7 +13,7 @@ #include -DEFINE_MUTEX(sg_mutex); +static DEFINE_MUTEX(sg_mutex); static struct kobject *sg_kobj; -- 2.7.4
[PATCH] powerpc/setup: Adjust six seq_printf() calls in show_cpuinfo()
From: Markus Elfring Date: Tue, 2 Jul 2019 14:41:42 +0200 Combine two adjacent seq_printf() calls into one, and use seq_puts() or seq_putc() where the output is fixed text that needs no format conversion, so that this data output avoids unnecessary format-string processing. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring --- arch/powerpc/kernel/setup-common.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c index 1f8db666468d..a381723b11bd 100644 --- a/arch/powerpc/kernel/setup-common.c +++ b/arch/powerpc/kernel/setup-common.c @@ -239,18 +239,17 @@ static int show_cpuinfo(struct seq_file *m, void *v) maj = (pvr >> 8) & 0xFF; min = pvr & 0xFF; - seq_printf(m, "processor\t: %lu\n", cpu_id); - seq_printf(m, "cpu\t\t: "); + seq_printf(m, "processor\t: %lu\ncpu\t\t: ", cpu_id); if (cur_cpu_spec->pvr_mask && cur_cpu_spec->cpu_name) - seq_printf(m, "%s", cur_cpu_spec->cpu_name); + seq_puts(m, cur_cpu_spec->cpu_name); else seq_printf(m, "unknown (%08x)", pvr); if (cpu_has_feature(CPU_FTR_ALTIVEC)) - seq_printf(m, ", altivec supported"); + seq_puts(m, ", altivec supported"); - seq_printf(m, "\n"); + seq_putc(m, '\n'); #ifdef CONFIG_TAU if (cpu_has_feature(CPU_FTR_TAU)) { @@ -332,7 +331,7 @@ static int show_cpuinfo(struct seq_file *m, void *v) seq_printf(m, "bogomips\t: %lu.%02lu\n", loops_per_jiffy / (50 / HZ), (loops_per_jiffy / (5000 / HZ)) % 100); - seq_printf(m, "\n"); + seq_putc(m, '\n'); /* If this is the last cpu, print the summary */ if (cpumask_next(cpu_id, cpu_online_mask) >= nr_cpu_ids) -- 2.22.0
[PATCH] powerpc/powernv/idle: Fix restore of SPRN_LDBAR for POWER9 stop state.
From: Athira Rajeev commit 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C") reimplemented the book3s idle code in C in platforms/powernv/idle.c, but in doing so it missed the per-thread LDBAR update in the core_woken path of power9_idle_stop(). This patch fixes that by moving the LDBAR restore into the per-thread restore path. Fixes: 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C") Signed-off-by: Athira Rajeev Signed-off-by: Madhavan Srinivasan --- arch/powerpc/platforms/powernv/idle.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c index 2f4479b94ac3..fd14a6237954 100644 --- a/arch/powerpc/platforms/powernv/idle.c +++ b/arch/powerpc/platforms/powernv/idle.c @@ -758,7 +758,6 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on) mtspr(SPRN_PTCR,sprs.ptcr); mtspr(SPRN_RPR, sprs.rpr); mtspr(SPRN_TSCR,sprs.tscr); - mtspr(SPRN_LDBAR, sprs.ldbar); if (pls >= pnv_first_tb_loss_level) { /* TB loss */ @@ -790,6 +789,7 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on) mtspr(SPRN_MMCR0, sprs.mmcr0); mtspr(SPRN_MMCR1, sprs.mmcr1); mtspr(SPRN_MMCR2, sprs.mmcr2); + mtspr(SPRN_LDBAR, sprs.ldbar); mtspr(SPRN_SPRG3, local_paca->sprg_vdso); -- 2.20.1
Re: Re: [PATCH 1/3] arm64: mm: Add p?d_large() definitions
On Tue, Jul 02, 2019 at 01:07:11PM +1000, Nicholas Piggin wrote: > Will Deacon's on July 1, 2019 8:15 pm: > > On Mon, Jul 01, 2019 at 11:03:51AM +0100, Steven Price wrote: > >> On 01/07/2019 10:27, Will Deacon wrote: > >> > On Sun, Jun 23, 2019 at 07:44:44PM +1000, Nicholas Piggin wrote: > >> >> walk_page_range() is going to be allowed to walk page tables other than > >> >> those of user space. For this it needs to know when it has reached a > >> >> 'leaf' entry in the page tables. This information will be provided by > >> >> the > >> >> p?d_large() functions/macros. > >> > > >> > I can't remember whether or not I asked this before, but why not call > >> > this macro p?d_leaf() if that's what it's identifying? "Large" and "huge" > >> > are usually synonymous, so I find this naming needlessly confusing based > >> > on this patch in isolation. > > Those page table macro names are horrible. Large, huge, leaf, wtf? > They could do with a sensible renaming. But this series just follows > naming that's already there on x86. I realise that, and I wasn't meaning to have a go at you. Just wanted to make my opinion clear by having a moan :) Will
[PATCH] powerpc: Use nid as fallback for chip_id
One of the uses of chip_id is to find all cores that are part of the same chip. However, the ibm,chip-id property is not present in the device tree of PowerVM LPARs. Hence lscpu output shows one core per socket and multiple sockets. Before the patch: # lscpu Architecture: ppc64le Byte Order: Little Endian CPU(s): 128 On-line CPU(s) list: 0-127 Thread(s) per core: 8 Core(s) per socket: 1 Socket(s): 16 NUMA node(s): 2 Model: 2.2 (pvr 004e 0202) Model name: POWER9 (architected), altivec supported Hypervisor vendor: pHyp Virtualization type: para L1d cache: 32K L1i cache: 32K L2 cache: 512K L3 cache: 10240K NUMA node0 CPU(s): 0-63 NUMA node1 CPU(s): 64-127 # cat /sys/devices/system/cpu/cpu0/topology/physical_package_id -1 Signed-off-by: Srikar Dronamraju --- arch/powerpc/kernel/prom.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index 7159e791a70d..0b8918b43580 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -867,18 +867,24 @@ EXPORT_SYMBOL(of_get_ibm_chip_id); * @cpu: The logical cpu number. * * Return the value of the ibm,chip-id property corresponding to the given - * logical cpu number. If the chip-id can not be found, returns -1. + * logical cpu number. If the chip-id can not be found, return nid. + * */ int cpu_to_chip_id(int cpu) { struct device_node *np; + int chip_id = -1; np = of_get_cpu_node(cpu, NULL); if (!np) return -1; + chip_id = of_get_ibm_chip_id(np); + if (chip_id == -1) + chip_id = of_node_to_nid(np); + of_node_put(np); - return of_get_ibm_chip_id(np); + return chip_id; } EXPORT_SYMBOL(cpu_to_chip_id); -- 2.18.1
Re: [v2 09/12] powerpc/mce: Enable MCE notifiers in external modules
On 7/2/19 11:47 AM, Nicholas Piggin wrote: > Santosh Sivaraj's on July 2, 2019 3:19 pm: >> From: Reza Arbab >> >> Signed-off-by: Reza Arbab >> --- >> arch/powerpc/kernel/exceptions-64s.S | 6 ++ >> arch/powerpc/kernel/mce.c| 2 ++ >> 2 files changed, 8 insertions(+) >> >> diff --git a/arch/powerpc/kernel/exceptions-64s.S >> b/arch/powerpc/kernel/exceptions-64s.S >> index c83e38a403fd..311f1392a2ec 100644 >> --- a/arch/powerpc/kernel/exceptions-64s.S >> +++ b/arch/powerpc/kernel/exceptions-64s.S >> @@ -458,6 +458,12 @@ EXC_COMMON_BEGIN(machine_check_handle_early) >> bl machine_check_early >> std r3,RESULT(r1) /* Save result */ >> >> +/* Notifiers may be in a module, so enable virtual addressing. */ >> +mfmsr r11 >> +ori r11,r11,MSR_IR >> +ori r11,r11,MSR_DR >> +mtmsr r11 > > Can't do this, we could take a machine check somewhere the MMU is > not sane (in fact the guest early mce handling that was added recently > should not be enabling virtual mode either, which needs to be fixed). It looks like they need this to be able to run the notifier chain, which may fail in real mode. > > Thanks, > Nick >
Re: [PATCH v5 0/4] Additional fixes on Talitos driver
Hi Herbert, On 24/06/2019 at 09:21, Christophe Leroy wrote: This series is the last set of fixes for the Talitos driver. Do you plan to apply this series, or are you expecting anything from me? Thanks Christophe We now get a fully clean boot on both SEC1 (SEC1.2 on mpc885) and SEC2 (SEC2.2 on mpc8321E) with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS: [3.385197] bus: 'platform': really_probe: probing driver talitos with device ff02.crypto [3.450982] random: fast init done [ 12.252548] alg: No test for authenc(hmac(md5),cbc(aes)) (authenc-hmac-md5-cbc-aes-talitos-hsna) [ 12.262226] alg: No test for authenc(hmac(md5),cbc(des3_ede)) (authenc-hmac-md5-cbc-3des-talitos-hsna) [ 43.310737] Bug in SEC1, padding ourself [ 45.603318] random: crng init done [ 54.612333] talitos ff02.crypto: fsl,sec1.2 algorithms registered in /proc/crypto [ 54.620232] driver: 'talitos': driver_bound: bound to device 'ff02.crypto' [1.193721] bus: 'platform': really_probe: probing driver talitos with device b003.crypto [1.229197] random: fast init done [2.714920] alg: No test for authenc(hmac(sha224),cbc(aes)) (authenc-hmac-sha224-cbc-aes-talitos) [2.724312] alg: No test for authenc(hmac(sha224),cbc(aes)) (authenc-hmac-sha224-cbc-aes-talitos-hsna) [4.482045] alg: No test for authenc(hmac(md5),cbc(aes)) (authenc-hmac-md5-cbc-aes-talitos) [4.490940] alg: No test for authenc(hmac(md5),cbc(aes)) (authenc-hmac-md5-cbc-aes-talitos-hsna) [4.500280] alg: No test for authenc(hmac(md5),cbc(des3_ede)) (authenc-hmac-md5-cbc-3des-talitos) [4.509727] alg: No test for authenc(hmac(md5),cbc(des3_ede)) (authenc-hmac-md5-cbc-3des-talitos-hsna) [6.631781] random: crng init done [ 11.521795] talitos b003.crypto: fsl,sec2.2 algorithms registered in /proc/crypto [ 11.529803] driver: 'talitos': driver_bound: bound to device 'b003.crypto' v2: dropped patch 1 which was irrelevant due to a rebase weirdness. Added Cc to stable on the 2 first patches.
v3: - removed stable reference in patch 1 - reworded patch 1 to include name of patch 2 for the dependency. - mentioned this dependency in patch 2 as well. - corrected the Fixes: sha1 in patch 4 v4: - using scatterwalk_ffwd() instead of open-coding SG list forwarding. - Added a patch to fix sg_copy_to_buffer() when sg->offset is greater than PAGE_SIZE, otherwise sg_copy_to_buffer() fails when the list has been forwarded with scatterwalk_ffwd(). - taken the patch "crypto: talitos - eliminate unneeded 'done' functions at build time" out of the series because it is independent. - added a helper to find the header field associated with a request in flush_channel() v5: - replaced the while loop with a direct shift/mask operation, as suggested by Herbert in patch 1. Christophe Leroy (4): lib/scatterlist: Fix mapping iterator when sg->offset is greater than PAGE_SIZE crypto: talitos - move struct talitos_edesc into talitos.h crypto: talitos - fix hash on SEC1. crypto: talitos - drop icv_ool drivers/crypto/talitos.c | 102 +++ drivers/crypto/talitos.h | 28 + lib/scatterlist.c| 9 +++-- 3 files changed, 74 insertions(+), 65 deletions(-)
[PATCH v2] powerpc/imc: Dont create debugfs files for cpu-less nodes
Commit <684d984038aa> ('powerpc/powernv: Add debugfs interface for imc-mode and imc') added a debugfs interface for the nest imc pmu devices to support switching between ucode modes, primarily for debugging. However, that code did not consider cpu-less nodes, so reading the _cmd_ or _mode_ file of a cpu-less node causes this crash: [ 1139.415461][ T5301] Faulting instruction address: 0xc00d0d58 [ 1139.415492][ T5301] Oops: Kernel access of bad area, sig: 11 [#1] [ 1139.415509][ T5301] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA PowerNV [ 1139.415542][ T5301] Modules linked in: i2c_opal i2c_core ip_tables x_tables xfs sd_mod bnx2x mdio ahci libahci tg3 libphy libata firmware_class dm_mirror dm_region_hash dm_log dm_mod [ 1139.415595][ T5301] CPU: 67 PID: 5301 Comm: cat Not tainted 5.2.0-rc6-next- 20190627+ #19 [ 1139.415634][ T5301] NIP: c00d0d58 LR: c049aa18 CTR:c00d0d50 [ 1139.415675][ T5301] REGS: c00020194548f9e0 TRAP: 0300 Not tainted (5.2.0-rc6-next-20190627+) [ 1139.415705][ T5301] MSR: 90009033 CR:28022822 XER: [ 1139.415777][ T5301] CFAR: c049aa14 DAR: 0003fc08 DSISR:4000 IRQMASK: 0 [ 1139.415777][ T5301] GPR00: c049aa18 c00020194548fc70 c16f8b03fc08 [ 1139.415777][ T5301] GPR04: c00020194548fcd0 14884e7300011eaa [ 1139.415777][ T5301] GPR08: 7eea5a52 c00d0d50 [ 1139.415777][ T5301] GPR12: c00d0d50 c000201fff7f8c00 [ 1139.415777][ T5301] GPR16: 000d 7fffeb0c3368 [ 1139.415777][ T5301] GPR20: 0002 [ 1139.415777][ T5301] GPR24: 000200010ec9 [ 1139.415777][ T5301] GPR28: c00020194548fdf0 c00020049a584ef8 c00020049a584ea8 [ 1139.416116][ T5301] NIP [c00d0d58] imc_mem_get+0x8/0x20 [ 1139.416143][ T5301] LR [c049aa18] simple_attr_read+0x118/0x170 [ 1139.416158][ T5301] Call Trace: [ 1139.416182][ T5301] [c00020194548fc70] [c049a970]simple_attr_read+0x70/0x170 (unreliable) [ 1139.416255][ T5301] [c00020194548fd10] [c054385c]debugfs_attr_read+0x6c/0xb0 [ 1139.416305][ T5301] 
[c00020194548fd60] [c0454c1c]__vfs_read+0x3c/0x70 [ 1139.416363][ T5301] [c00020194548fd80] [c0454d0c] vfs_read+0xbc/0x1a0 [ 1139.416392][ T5301] [c00020194548fdd0] [c045519c]ksys_read+0x7c/0x140 [ 1139.416434][ T5301] [c00020194548fe20] [c000b108]system_call+0x5c/0x70 [ 1139.416473][ T5301] Instruction dump: [ 1139.416511][ T5301] 4e800020 6000 7c0802a6 6000 7c801d28 3860 4e800020 6000 [ 1139.416572][ T5301] 6000 6000 7c0802a6 6000 <7d201c28> 3860 f924 4e800020 [ 1139.416636][ T5301] ---[ end trace c44d1fb4ace04784 ]--- [ 1139.520686][ T5301] [ 1140.520820][ T5301] Kernel panic - not syncing: Fatal exception This patch fixes the issue with a more robust check of vbase against NULL. Before the patch, ls output for the debugfs imc directory: # ls /sys/kernel/debug/powerpc/imc/ imc_cmd_0 imc_cmd_251 imc_cmd_253 imc_cmd_255 imc_mode_0 imc_mode_251 imc_mode_253 imc_mode_255 imc_cmd_250 imc_cmd_252 imc_cmd_254 imc_cmd_8 imc_mode_250 imc_mode_252 imc_mode_254 imc_mode_8 After the patch, ls output for the debugfs imc directory: # ls /sys/kernel/debug/powerpc/imc/ imc_cmd_0 imc_cmd_8 imc_mode_0 imc_mode_8 Fixes: 684d984038aa ('powerpc/powernv: Add debugfs interface for imc-mode and imc') Reported-by: Qian Cai Suggested-by: Michael Ellerman Signed-off-by: Madhavan Srinivasan --- Changelog v1: - Modified the cpumask check.
arch/powerpc/platforms/powernv/opal-imc.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c index 186109bdd41b..e04b20625cb9 100644 --- a/arch/powerpc/platforms/powernv/opal-imc.c +++ b/arch/powerpc/platforms/powernv/opal-imc.c @@ -53,9 +53,9 @@ static void export_imc_mode_and_cmd(struct device_node *node, struct imc_pmu *pmu_ptr) { static u64 loc, *imc_mode_addr, *imc_cmd_addr; - int chip = 0, nid; char mode[16], cmd[16]; u32 cb_offset; + struct imc_mem_info *ptr = pmu_ptr->mem_info; imc_debugfs_parent = debugfs_create_dir("imc", powerpc_debugfs_root); @@ -69,20 +69,20 @@ static void export_imc_mode_and_cmd(struct device_node *node, if (of_property_read_u32(node, "cb_offset", &cb_offset)) cb_offset = IMC_CNTL_BLK_OFFSET; - for_each_node(nid) { - loc = (u64)(pmu_ptr->mem_info[chip].vbase) + cb_offset; + while (ptr->vbase != NULL) { + loc =
[PATCH] powerpc/64s/exception: Remove unused SOFTEN_VALUE_0x980
Remove SOFTEN_VALUE_0x980, it's been unused since commit dabe859ec636 ("powerpc: Give hypervisor decrementer interrupts their own handler") (Sep 2012). Signed-off-by: Michael Ellerman --- arch/powerpc/include/asm/exception-64s.h | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index b590765f6e45..b4f8b745ba01 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -583,7 +583,6 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) /* This associate vector numbers with bits in paca->irq_happened */ #define SOFTEN_VALUE_0x500 PACA_IRQ_EE #define SOFTEN_VALUE_0x900 PACA_IRQ_DEC -#define SOFTEN_VALUE_0x980 PACA_IRQ_DEC #define SOFTEN_VALUE_0xa00 PACA_IRQ_DBELL #define SOFTEN_VALUE_0xe80 PACA_IRQ_DBELL #define SOFTEN_VALUE_0xe60 PACA_IRQ_HMI -- 2.20.1
Re: [RFC 09/11] pci/hotplug/pnv-php: Relax check when disabling slot
On 19/6/19 11:28 pm, Frederic Barrat wrote: The driver only allows disabling a slot in the POPULATED state. However, if an error occurs while enabling the slot, say because the link couldn't be trained, then the POPULATED state may not be reached, yet the power state of the slot is on. So allow disabling a slot in the REGISTERED state as well. Removing the devices will do nothing since it's not populated, and we'll set the power state of the slot back to off. Signed-off-by: Frederic Barrat Reviewed-by: Andrew Donnellan --- drivers/pci/hotplug/pnv_php.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c index f9c624334ef7..74b62a8e11e7 100644 --- a/drivers/pci/hotplug/pnv_php.c +++ b/drivers/pci/hotplug/pnv_php.c @@ -523,7 +523,13 @@ static int pnv_php_disable_slot(struct hotplug_slot *slot) struct pnv_php_slot *php_slot = to_pnv_php_slot(slot); int ret; - if (php_slot->state != PNV_PHP_STATE_POPULATED) + /* +* Allow to disable a slot already in the registered state to +* cover cases where the slot couldn't be enabled and never +* reached the populated state +*/ + if (php_slot->state != PNV_PHP_STATE_POPULATED && + php_slot->state != PNV_PHP_STATE_REGISTERED) return 0; /* Remove all devices behind the slot */ -- Andrew Donnellan OzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited
Re: [RFC 11/11] ocxl: Add PCI hotplug dependency to Kconfig
On 19/6/19 11:28 pm, Frederic Barrat wrote: The PCI hotplug framework is used to update the devices when a new image is written to the FPGA. Signed-off-by: Frederic Barrat Acked-by: Andrew Donnellan --- drivers/misc/ocxl/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/misc/ocxl/Kconfig b/drivers/misc/ocxl/Kconfig index 7fb6d39d4c5a..13a5d9f30369 100644 --- a/drivers/misc/ocxl/Kconfig +++ b/drivers/misc/ocxl/Kconfig @@ -12,6 +12,7 @@ config OCXL tristate "OpenCAPI coherent accelerator support" depends on PPC_POWERNV && PCI && EEH select OCXL_BASE + select HOTPLUG_PCI_POWERNV default m help Select this option to enable the ocxl driver for Open -- Andrew Donnellan OzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited
Re: [PATCH] powerpc/configs: Disable /dev/port in skiroot defconfig
Michael Ellerman writes: > Daniel Axtens writes: >> While reviewing lockdown patches, I discovered that we still enable >> /dev/port (CONFIG_DEVPORT) in skiroot. >> >> We don't need it. Deselect CONFIG_DEVPORT for skiroot. > > Why don't we need it? :) I should have explained this better :) /dev/port is used for old x86 style IO accesses. It's set up in drivers/char/mem.c, and is only created if arch_has_dev_port() returns true. Per arch/powerpc/include/asm/io.h, on PPC64 with PCI, this is only true if there's a legacy ISA bridge. Even if a system has a legacy ISA bridge installed, we have no business accessing it in skiroot. Regards, Daniel > > cheers > >> diff --git a/arch/powerpc/configs/skiroot_defconfig >> b/arch/powerpc/configs/skiroot_defconfig >> index 5ba131c30f6b..b2e8f37156eb 100644 >> --- a/arch/powerpc/configs/skiroot_defconfig >> +++ b/arch/powerpc/configs/skiroot_defconfig >> @@ -212,6 +212,7 @@ CONFIG_IPMI_WATCHDOG=y >> CONFIG_HW_RANDOM=y >> CONFIG_TCG_TPM=y >> CONFIG_TCG_TIS_I2C_NUVOTON=y >> +# CONFIG_DEVPORT is not set >> CONFIG_I2C=y >> # CONFIG_I2C_COMPAT is not set >> CONFIG_I2C_CHARDEV=y >> -- >> 2.20.1
Re: vmlinux.o(.text+0x40e): Section mismatch in reference from the variable start_here_multiplatform to the function
On 02/07/2019 at 08:23, Christian Zigotzky wrote: Hi All, I get the following error messages after compiling the RC7 of kernel 5.2: WARNING: vmlinux.o(.text+0x40e): Section mismatch in reference from the variable start_here_multiplatform to the function .init.text:.early_setup() The function start_here_multiplatform() references the function __init .early_setup(). This is often because start_here_multiplatform lacks a __init annotation or the annotation of .early_setup is wrong. Harmless warning. Fix at https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/arch/powerpc/kernel/head_64.S?h=next-20190701=9c4e4c90ec24652921e31e9551fcaedc26eec86d It will be cherry-picked by stable once merged into 5.3, I guess. Christophe FATAL: modpost: Section mismatches detected. Set CONFIG_SECTION_MISMATCH_WARN_ONLY=y to allow them. scripts/Makefile.modpost:97: recipe for target 'vmlinux.o' failed make[1]: *** [vmlinux.o] Error 1 Makefile:1052: recipe for target 'vmlinux' failed make: *** [vmlinux] Error 2 Please find attached the kernel config. Any hints? Thanks, Christian
[v2 04/12] powerpc/mce: Move machine_check_ue_event() call
From: Reza Arbab Move the call site of machine_check_ue_event() slightly later in the MCE codepath. No functional change intended--this is prep for a later patch to conditionally skip the call. Signed-off-by: Reza Arbab --- arch/powerpc/kernel/mce.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index 24d350a934e4..0ab171b41ede 100644 --- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -156,7 +156,6 @@ void save_mce_event(struct pt_regs *regs, long handled, if (phys_addr != ULONG_MAX) { mce->u.ue_error.physical_address_provided = true; mce->u.ue_error.physical_address = phys_addr; - machine_check_ue_event(mce); } } return; @@ -656,4 +655,8 @@ void machine_check_notify(struct pt_regs *regs) return; blocking_notifier_call_chain(&mce_notifier_list, 0, &evt); + + if (evt.error_type == MCE_ERROR_TYPE_UE && + evt.u.ue_error.physical_address_provided) + machine_check_ue_event(&evt); } -- 2.20.1
[v2 03/12] powerpc/mce: Add MCE notification chain
From: Reza Arbab Signed-off-by: Reza Arbab --- arch/powerpc/include/asm/asm-prototypes.h | 1 + arch/powerpc/include/asm/mce.h| 4 arch/powerpc/kernel/exceptions-64s.S | 4 arch/powerpc/kernel/mce.c | 22 ++ 4 files changed, 31 insertions(+) diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h index ec1c97a8e8cb..f66f26ef3ce0 100644 --- a/arch/powerpc/include/asm/asm-prototypes.h +++ b/arch/powerpc/include/asm/asm-prototypes.h @@ -72,6 +72,7 @@ void machine_check_exception(struct pt_regs *regs); void emulation_assist_interrupt(struct pt_regs *regs); long do_slb_fault(struct pt_regs *regs, unsigned long ea); void do_bad_slb_fault(struct pt_regs *regs, unsigned long ea, long err); +void machine_check_notify(struct pt_regs *regs); /* signals, syscalls and interrupts */ long sys_swapcontext(struct ucontext __user *old_ctx, diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h index 94888a7025b3..948bef579086 100644 --- a/arch/powerpc/include/asm/mce.h +++ b/arch/powerpc/include/asm/mce.h @@ -214,4 +214,8 @@ unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr, #ifdef CONFIG_PPC_BOOK3S_64 void flush_and_reload_slb(void); #endif /* CONFIG_PPC_BOOK3S_64 */ + +int mce_register_notifier(struct notifier_block *nb); +int mce_unregister_notifier(struct notifier_block *nb); + #endif /* __ASM_PPC64_MCE_H__ */ diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 6b86055e5251..2e56014fca21 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -457,6 +457,10 @@ EXC_COMMON_BEGIN(machine_check_handle_early) addi r3,r1,STACK_FRAME_OVERHEAD bl machine_check_early std r3,RESULT(r1) /* Save result */ + + addi r3,r1,STACK_FRAME_OVERHEAD + bl machine_check_notify + ld r12,_MSR(r1) BEGIN_FTR_SECTION b 4f diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index e78c4f18ea0a..24d350a934e4 100644 --- 
a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -42,6 +42,18 @@ static struct irq_work mce_event_process_work = { DECLARE_WORK(mce_ue_event_work, machine_process_ue_event); +static BLOCKING_NOTIFIER_HEAD(mce_notifier_list); + +int mce_register_notifier(struct notifier_block *nb) +{ + return blocking_notifier_chain_register(&mce_notifier_list, nb); +} + +int mce_unregister_notifier(struct notifier_block *nb) +{ + return blocking_notifier_chain_unregister(&mce_notifier_list, nb); +} + static void mce_set_error_info(struct machine_check_event *mce, struct mce_error_info *mce_err) { @@ -635,3 +647,13 @@ long hmi_exception_realmode(struct pt_regs *regs) return 1; } + +void machine_check_notify(struct pt_regs *regs) +{ + struct machine_check_event evt; + + if (!get_mce_event(&evt, MCE_EVENT_DONTRELEASE)) + return; + + blocking_notifier_call_chain(&mce_notifier_list, 0, &evt); +} -- 2.20.1
[v2 02/12] powerpc/mce: Bug fixes for MCE handling in kernel space
From: Balbir Singh The code currently assumes PAGE_SHIFT as the shift value of the pfn. This works correctly (mostly) for user space pages, but the correct thing to do is to: 1. extract the shift value returned via the pte-walk APIs, and 2. use that shift value to access the instruction address. Note that the final physical address still uses PAGE_SHIFT for computation. handle_ierror() is not modified, and handle_derror() is modified only to extract the correct instruction address. This is largely due to __find_linux_pte() returning pfns shifted by pdshift. The resulting code is more generic and can handle the shift values returned. Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors") Signed-off-by: Balbir Singh [ar...@linux.ibm.com: Fixup pseries_do_memory_failure()] --- arch/powerpc/include/asm/mce.h | 3 ++- arch/powerpc/kernel/mce_power.c | 26 -- arch/powerpc/platforms/pseries/ras.c | 6 -- 3 files changed, 22 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h index a4c6a74ad2fb..94888a7025b3 100644 --- a/arch/powerpc/include/asm/mce.h +++ b/arch/powerpc/include/asm/mce.h @@ -209,7 +209,8 @@ extern void release_mce_event(void); extern void machine_check_queue_event(void); extern void machine_check_print_event_info(struct machine_check_event *evt, bool user_mode, bool in_guest); -unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr); +unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr, + unsigned int *shift); #ifdef CONFIG_PPC_BOOK3S_64 void flush_and_reload_slb(void); #endif /* CONFIG_PPC_BOOK3S_64 */ diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c index e39536aad30d..04666c0b40a8 100644 --- a/arch/powerpc/kernel/mce_power.c +++ b/arch/powerpc/kernel/mce_power.c @@ -23,7 +23,8 @@ * Convert an address related to an mm to a PFN. NOTE: we are in real * mode, we could potentially race with page table updates.
*/ -unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr) +unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr, + unsigned int *shift) { pte_t *ptep; unsigned long flags; @@ -36,13 +37,15 @@ unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr) local_irq_save(flags); if (mm == current->mm) - ptep = find_current_mm_pte(mm->pgd, addr, NULL, NULL); + ptep = find_current_mm_pte(mm->pgd, addr, NULL, shift); else - ptep = find_init_mm_pte(addr, NULL); + ptep = find_init_mm_pte(addr, shift); local_irq_restore(flags); if (!ptep || pte_special(*ptep)) return ULONG_MAX; - return pte_pfn(*ptep); + if (!*shift) + *shift = PAGE_SHIFT; + return (pte_val(*ptep) & PTE_RPN_MASK) >> *shift; } /* flush SLBs and reload */ @@ -358,15 +361,16 @@ static int mce_find_instr_ea_and_pfn(struct pt_regs *regs, uint64_t *addr, unsigned long pfn, instr_addr; struct instruction_op op; struct pt_regs tmp = *regs; + unsigned int shift; - pfn = addr_to_pfn(regs, regs->nip); + pfn = addr_to_pfn(regs, regs->nip, &shift); if (pfn != ULONG_MAX) { - instr_addr = (pfn << PAGE_SHIFT) + (regs->nip & ~PAGE_MASK); + instr_addr = (pfn << shift) + (regs->nip & ((1 << shift) - 1)); instr = *(unsigned int *)(instr_addr); if (!analyse_instr(&op, &tmp, instr)) { - pfn = addr_to_pfn(regs, op.ea); + pfn = addr_to_pfn(regs, op.ea, &shift); *addr = op.ea; - *phys_addr = (pfn << PAGE_SHIFT); + *phys_addr = (pfn << shift); return 0; } /* @@ -442,12 +446,14 @@ static int mce_handle_ierror(struct pt_regs *regs, if (mce_err->sync_error && table[i].error_type == MCE_ERROR_TYPE_UE) { unsigned long pfn; + unsigned int shift; if (get_paca()->in_mce < MAX_MCE_DEPTH) { - pfn = addr_to_pfn(regs, regs->nip); + pfn = addr_to_pfn(regs, regs->nip, + &shift); if (pfn != ULONG_MAX) { *phys_addr = - (pfn << PAGE_SHIFT); + (pfn << shift); } } } diff --git
[v2 00/12] powerpc: implement machine check safe memcpy
During a memcpy from a pmem device, if a machine check exception (MCE) is generated, we end up in a panic. For an fsdax read, this should only result in -EIO. Avoid the panic by implementing memcpy_mcsafe().

Before this patch series:

```
bash-4.4# mount -o dax /dev/pmem0 /mnt/pmem/
[ 7621.714094] Disabling lock debugging due to kernel taint
[ 7621.714099] MCE: CPU0: machine check (Severe) Host UE Load/Store [Not recovered]
[ 7621.714104] MCE: CPU0: NIP: [c0088978] memcpy_power7+0x418/0x7e0
[ 7621.714107] MCE: CPU0: Hardware error
[ 7621.714112] opal: Hardware platform error: Unrecoverable Machine Check exception
[ 7621.714118] CPU: 0 PID: 1368 Comm: mount Tainted: G M 5.2.0-rc5-00239-g241e39004581 #50
[ 7621.714123] NIP: c0088978 LR: c08e16f8 CTR: 01de
[ 7621.714129] REGS: c000fffbfd70 TRAP: 0200 Tainted: G M (5.2.0-rc5-00239-g241e39004581)
[ 7621.714131] MSR: 92209033 CR: 24428840 XER: 0004
[ 7621.714160] CFAR: c00889a8 DAR: deadbeefdeadbeef DSISR: 8000 IRQMASK: 0
[ 7621.714171] GPR00: 0e00 c000f0b8b1e0 c12cf100 c000ed8e1100
[ 7621.714186] GPR04: c2001100 0001 0200 03fff1272000
[ 7621.714201] GPR08: 8000 0010 0020 0030
[ 7621.714216] GPR12: 0040 7fffb8c6d390 0050 0060
[ 7621.714232] GPR16: 0070 0001 c000f0b8b960
[ 7621.714247] GPR20: 0001 c000f0b8b940 0001 0001
[ 7621.714262] GPR24: c1382560 c00c003b6380 c00c003b6380 0001
[ 7621.714277] GPR28: 0001 c200 0001
[ 7621.714294] NIP [c0088978] memcpy_power7+0x418/0x7e0
[ 7621.714298] LR [c08e16f8] pmem_do_bvec+0xf8/0x430
...
...
```

After this patch series:

```
bash-4.4# mount -o dax /dev/pmem0 /mnt/pmem/
[25302.883978] Buffer I/O error on dev pmem0, logical block 0, async page read
[25303.020816] EXT4-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[25303.021236] EXT4-fs (pmem0): Can't read superblock on 2nd try
[25303.152515] EXT4-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[25303.284031] EXT4-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[25304.084100] UDF-fs: bad mount option "dax" or missing value
mount: /mnt/pmem: wrong fs type, bad option, bad superblock on /dev/pmem0, missing codepage or helper program, or other error.
```

MCE is injected on a pmem address using mambo. The last patch, which restores r13, is only for testing on mambo, where r13 is not restored upon hitting vector 200.

The memcpy code can be optimised with VMX, and GAS macros can be used to improve code reusability; I will send those as a separate series.

--
Balbir Singh (2):
  powerpc/mce: Bug fixes for MCE handling in kernel space
  powerpc/memcpy: Add memcpy_mcsafe for pmem

Reza Arbab (8):
  powerpc/mce: Make machine_check_ue_event() static
  powerpc/mce: Add MCE notification chain
  powerpc/mce: Move machine_check_ue_event() call
  powerpc/mce: Allow notifier callback to handle MCE
  powerpc/mce: Add fixup address to UE events
  powerpc/mce: Handle memcpy_mcsafe()
  powerpc/mce: Enable MCE notifiers in external modules
  powerpc/64s: Save r13 in machine_check_common_early

Santosh Sivaraj (2):
  powerpc/memcpy_mcsafe: return remaining bytes
  powerpc: add machine check safe copy_to_user

 arch/powerpc/Kconfig | 1 +
 arch/powerpc/include/asm/asm-prototypes.h | 1 +
 arch/powerpc/include/asm/mce.h | 13 +-
 arch/powerpc/include/asm/string.h | 2 +
 arch/powerpc/include/asm/uaccess.h | 12 ++
 arch/powerpc/kernel/exceptions-64s.S | 14 ++
 arch/powerpc/kernel/mce.c | 102 +-
 arch/powerpc/kernel/mce_power.c | 26 ++-
 arch/powerpc/lib/Makefile | 2 +-
 arch/powerpc/lib/memcpy_mcsafe_64.S | 226 ++
 arch/powerpc/platforms/pseries/ras.c | 6 +-
 11 files changed, 386 insertions(+), 19 deletions(-)
 create mode 100644 arch/powerpc/lib/memcpy_mcsafe_64.S

--
2.20.1
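The contract this series relies on, with memcpy_mcsafe() reporting how many bytes it could not copy ("powerpc/memcpy_mcsafe: return remaining bytes") and the caller turning that into -EIO instead of a panic, can be modelled in user space. The sketch below is a hypothetical simulation: `sim_copy_mcsafe()` and `sim_pmem_read()` are invented names, and a caller-chosen `poison_off` stands in for the poisoned pmem location that would raise the real machine check. It does not model the MCE handler or the fixup path that make this work in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for memcpy_mcsafe(): copy len bytes, but pretend
 * the source byte at poison_off is poisoned (an uncorrectable error).
 * Real hardware would raise a machine check there and the handler would
 * resume at the routine's fixup code. Returns the number of bytes NOT
 * copied, so 0 means success. Pass poison_off >= len for a clean copy. */
static size_t sim_copy_mcsafe(void *dst, const void *src, size_t len,
			      size_t poison_off)
{
	size_t ok = poison_off < len ? poison_off : len;

	assert(len == 0 || (dst != NULL && src != NULL));
	memcpy(dst, src, ok);
	return len - ok;
}

/* Hypothetical caller in the style of a pmem read path: any residue is
 * reported as an I/O error rather than taking the whole system down. */
static int sim_pmem_read(void *dst, const void *src, size_t len,
			 size_t poison_off)
{
	size_t rem = sim_copy_mcsafe(dst, src, len, poison_off);

	return rem ? -EIO : 0;
}
```

Returning the residue rather than a bare error code mirrors the usual kernel convention for copy_to_user(), which also returns the number of bytes left uncopied; a block read path like the one sketched simply treats any residue as an I/O error.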
Re: [RFC 02/11] powerpc/powernv/ioda: Protect PE list
On 19/6/19 11:28 pm, Frederic Barrat wrote:
Protect the PHB's list of PEs. Probably not needed as long as it was populated during PHB creation, but it feels right and will become required once we can add/remove opencapi devices on hotplug.

Signed-off-by: Frederic Barrat

Reviewed-by: Andrew Donnellan

---
 arch/powerpc/platforms/powernv/pci-ioda.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 3082912e2600..2c063b05bb64 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1078,8 +1078,9 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
 	}
 
 	/* Put PE to the list */
+	mutex_lock(&phb->ioda.pe_list_mutex);
 	list_add_tail(&pe->list, &phb->ioda.pe_list);
-
+	mutex_unlock(&phb->ioda.pe_list_mutex);
 	return pe;
 }
 
@@ -3501,7 +3502,10 @@ static void pnv_ioda_release_pe(struct pnv_ioda_pe *pe)
 	struct pnv_phb *phb = pe->phb;
 	struct pnv_ioda_pe *slave, *tmp;
 
+	mutex_lock(&phb->ioda.pe_list_mutex);
 	list_del(&pe->list);
+	mutex_unlock(&phb->ioda.pe_list_mutex);
+
 	switch (phb->type) {
 	case PNV_PHB_IODA1:
 		pnv_pci_ioda1_release_pe_dma(pe);
-- 
Andrew Donnellan    OzLabs, ADL Canberra
a...@linux.ibm.com    IBM Australia Limited
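The locking pattern in this patch, where every add to and delete from the PHB's PE list happens under a mutex, can be sketched in user space with POSIX threads. Everything below is a hypothetical stand-in: `struct list_node` re-implements the idea of the kernel's `list_head` rather than using `<linux/list.h>`, `struct phb_sim` plays the role of `struct pnv_phb`, and `pthread_mutex_t` stands in for the kernel's `struct mutex`. The sketch's reader, `phb_count_pes()`, also takes the lock, on the assumption that list walkers would race with hotplug removal in the same way.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of the kernel's
 * struct list_head (hypothetical re-implementation). */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail_node(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void list_del_node(struct list_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->prev = node->next = node;
}

/* Hypothetical PHB-like owner: the list and the mutex that guards it. */
struct phb_sim {
	struct list_node pe_list;
	pthread_mutex_t pe_list_mutex;
};

static void phb_init(struct phb_sim *phb)
{
	int rc;

	list_init(&phb->pe_list);
	rc = pthread_mutex_init(&phb->pe_list_mutex, NULL);
	assert(rc == 0);
	(void)rc;
}

/* Counterpart of the locked list_add_tail() in pnv_ioda_setup_dev_PE(). */
static void phb_add_pe(struct phb_sim *phb, struct list_node *pe)
{
	pthread_mutex_lock(&phb->pe_list_mutex);
	list_add_tail_node(pe, &phb->pe_list);
	pthread_mutex_unlock(&phb->pe_list_mutex);
}

/* Counterpart of the locked list_del() in pnv_ioda_release_pe(). */
static void phb_release_pe(struct phb_sim *phb, struct list_node *pe)
{
	pthread_mutex_lock(&phb->pe_list_mutex);
	list_del_node(pe);
	pthread_mutex_unlock(&phb->pe_list_mutex);
}

static size_t phb_count_pes(struct phb_sim *phb)
{
	size_t n = 0;

	pthread_mutex_lock(&phb->pe_list_mutex);
	for (struct list_node *p = phb->pe_list.next; p != &phb->pe_list;
	     p = p->next)
		n++;
	pthread_mutex_unlock(&phb->pe_list_mutex);
	return n;
}
```

With the mutex held across each list operation, concurrent phb_add_pe() and phb_release_pe() calls from different threads cannot interleave their pointer updates.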
Re: [v2 12/12] powerpc/64s: Save r13 in machine_check_common_early
Santosh Sivaraj's on July 2, 2019 3:19 pm:
> From: Reza Arbab
>
> Testing my memcpy_mcsafe() work in progress with an injected UE, I get
> an error like this immediately after the function returns:
>
> BUG: Unable to handle kernel data access at 0x7fff84dec8f8
> Faulting instruction address: 0xc008009c00b0
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
> Modules linked in: mce(O+) vmx_crypto crc32c_vpmsum
> CPU: 0 PID: 1375 Comm: modprobe Tainted: G O 5.1.0-rc6 #267
> NIP: c008009c00b0 LR: c008009c00a8 CTR: c0095f90
> REGS: c000ee197790 TRAP: 0300 Tainted: G O (5.1.0-rc6)
> MSR: 9280b033 CR: 88002826
> XER: 0004
> CFAR: c0095f8c DAR: 7fff84dec8f8 DSISR: 4000 IRQMASK: 0
> GPR00: 6c6c6568 c000ee197a20 c008009c8400 fff2
> GPR04: c008009c02e0 0006 c3c834c8
> GPR08: 0080 776a6681b7fb5100 c008009c01c8
> GPR12: c0095f90 7fff84debc00 4d071440
> GPR16: 00010601 c008009e c0c98dd8 c0c98d98
> GPR20: c3bba970 c008009c04d0 c008009c0618 c01e5820
> GPR24: 0100 0001 c3bba958
> GPR28: c008009c02e8 c008009c0318 c008009c02e0
> NIP [c008009c00b0] cause_ue+0xa8/0xe8 [mce]
> LR [c008009c00a8] cause_ue+0xa0/0xe8 [mce]
>
> To fix, ensure that r13 is properly restored after an MCE.
>
> This commit is needed for testing this series, this is a possible simulator
> bug.

This introduces a bug, of course -- MCE occurring when r13 != PACA will corrupt r13.
> ---
>  arch/powerpc/kernel/exceptions-64s.S | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 311f1392a2ec..932d8d05892c 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -265,6 +265,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
>  EXC_REAL_END(machine_check, 0x200, 0x100)
>  EXC_VIRT_NONE(0x4200, 0x100)
>  TRAMP_REAL_BEGIN(machine_check_common_early)
> +	SET_SCRATCH0(r13)		/* save r13 */
>  	EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
>  	/*
>  	 * Register contents:
> --
> 2.20.1
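The objection above can be made concrete with a toy model of the register flow. It assumes, as in the exceptions-64s.S of that era, that the 0x200 vector has already done SET_SCRATCH0(r13) and then loaded the PACA pointer into r13 (GET_PACA inside EXCEPTION_PROLOG_0) before branching to machine_check_common_early. It also collapses the later copy of the scratch value into the EX_R13 save slot, and its reload on exit, into a single restore from the scratch slot. `struct cpu_sim` and all helper names are hypothetical; this is not the real exception code.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: one scratch SPR, the live r13 and this CPU's PACA pointer. */
struct cpu_sim {
	uint64_t r13;
	uint64_t scratch0;
	uint64_t paca;
};

/* 0x200 vector: stash the interrupted r13, then load the PACA pointer. */
static void vector_entry(struct cpu_sim *cpu)
{
	cpu->scratch0 = cpu->r13; /* SET_SCRATCH0(r13) at the vector */
	cpu->r13 = cpu->paca;     /* GET_PACA(r13) */
}

/* The extra SET_SCRATCH0(r13) the patch adds in
 * machine_check_common_early: r13 already holds the PACA pointer here,
 * so this overwrites the value stashed at the vector. */
static void extra_save(struct cpu_sim *cpu)
{
	cpu->scratch0 = cpu->r13;
}

/* Exit path (simplified): restore the interrupted r13 from the stash. */
static void exception_exit(struct cpu_sim *cpu)
{
	cpu->r13 = cpu->scratch0;
}

/* Runs one MCE through the model and returns r13 as seen afterwards. */
static uint64_t take_mce(uint64_t r13_before, uint64_t paca, int with_patch)
{
	struct cpu_sim cpu = { .r13 = r13_before, .scratch0 = 0, .paca = paca };

	vector_entry(&cpu);
	if (with_patch)
		extra_save(&cpu);
	exception_exit(&cpu);
	return cpu.r13;
}
```

In kernel mode r13 already holds the PACA pointer, so the extra save looks harmless in the reported test. An MCE taken where r13 holds something else, such as user space (where the 64-bit ELF ABI uses r13 as the thread pointer), would come back with the PACA pointer in r13.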
Re: [v2 09/12] powerpc/mce: Enable MCE notifiers in external modules
Santosh Sivaraj's on July 2, 2019 3:19 pm:
> From: Reza Arbab
>
> Signed-off-by: Reza Arbab
> ---
>  arch/powerpc/kernel/exceptions-64s.S | 6 ++
>  arch/powerpc/kernel/mce.c | 2 ++
>  2 files changed, 8 insertions(+)
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index c83e38a403fd..311f1392a2ec 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -458,6 +458,12 @@ EXC_COMMON_BEGIN(machine_check_handle_early)
>  	bl	machine_check_early
>  	std	r3,RESULT(r1)	/* Save result */
>
> +	/* Notifiers may be in a module, so enable virtual addressing. */
> +	mfmsr	r11
> +	ori	r11,r11,MSR_IR
> +	ori	r11,r11,MSR_DR
> +	mtmsr	r11

Can't do this, we could take a machine check somewhere the MMU is not sane (in fact the guest early mce handling that was added recently should not be enabling virtual mode either, which needs to be fixed).

Thanks,
Nick