[PATCH 4/4] KVM: PPC: Book3S HV: Use POWER9 SLBIA IH=6 variant to clear SLB
IH=6 may preserve hypervisor real-mode ERAT entries and is the
recommended SLBIA hint for switching partitions.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 9f0fdbae4b44..8cf1f69f442e 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -898,7 +898,7 @@ BEGIN_MMU_FTR_SECTION
 	/* Radix host won't have populated the SLB, so no need to clear */
 	li	r6, 0
 	slbmte	r6, r6
-	slbia
+	PPC_SLBIA(6)
 	ptesync
 END_MMU_FTR_SECTION_IFCLR(MMU_FTR_TYPE_RADIX)
@@ -1506,7 +1506,7 @@ guest_exit_cont:	/* r9 = vcpu, r12 = trap, r13 = paca */
 	/* Finally clear out the SLB */
 	li	r0,0
 	slbmte	r0,r0
-	slbia
+	PPC_SLBIA(6)
 	ptesync
 	stw	r5,VCPU_SLB_MAX(r9)
@@ -3329,7 +3329,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
 	/* Clear hash and radix guest SLB, see guest_exit_short_path comment. */
 	slbmte	r0, r0
-	slbia
+	PPC_SLBIA(6)
 BEGIN_MMU_FTR_SECTION
 	b	4f
-- 
2.23.0
[PATCH 3/4] KVM: PPC: Book3S HV: No need to clear radix host SLB before loading guest
Signed-off-by: Nicholas Piggin
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 0e1f5bf168a1..9f0fdbae4b44 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -888,15 +888,19 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
 	cmpdi	r3, 512		/* 1 microsecond */
 	blt	hdec_soon
 
-	/* For hash guest, clear out and reload the SLB */
 	ld	r6, VCPU_KVM(r4)
 	lbz	r0, KVM_RADIX(r6)
 	cmpwi	r0, 0
 	bne	9f
+
+	/* For hash guest, clear out and reload the SLB */
+BEGIN_MMU_FTR_SECTION
+	/* Radix host won't have populated the SLB, so no need to clear */
 	li	r6, 0
 	slbmte	r6, r6
 	slbia
 	ptesync
+END_MMU_FTR_SECTION_IFCLR(MMU_FTR_TYPE_RADIX)
 
 	/* Load up guest SLB entries (N.B. slb_max will be 0 for radix) */
 	lwz	r5,VCPU_SLB_MAX(r4)
-- 
2.23.0
[PATCH 2/4] KVM: PPC: Book3S HV: Fix radix guest SLB side channel
The slbmte instruction is legal in radix mode, including radix guest
mode. This means radix guests can load the SLB with arbitrary data.

The KVM host does not clear the SLB when exiting a guest if it was a
radix guest, which would allow a rogue radix guest to use the SLB as a
side channel to communicate with other guests.

Fix this by ensuring the SLB is cleared when coming out of a radix
guest. Only the first 4 entries are a concern, because radix guests
always run with LPCR[UPRT]=1, which limits the reach of slbmte. slbia
is not used (except in a non-performance-critical path) because it can
clear cached translations.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 39 ++++++++++++++++++++-------
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index d5a9b57ec129..0e1f5bf168a1 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1157,6 +1157,20 @@ EXPORT_SYMBOL_GPL(__kvmhv_vcpu_entry_p9)
 	mr	r4, r3
 	b	fast_guest_entry_c
 guest_exit_short_path:
+	/*
+	 * Malicious or buggy radix guests may have inserted SLB entries
+	 * (only 0..3 because radix always runs with UPRT=1), so these must
+	 * be cleared here to avoid side-channels. slbmte is used rather
+	 * than slbia, as it won't clear cached translations.
+	 */
+	li	r0,0
+	slbmte	r0,r0
+	li	r4,1
+	slbmte	r0,r4
+	li	r4,2
+	slbmte	r0,r4
+	li	r4,3
+	slbmte	r0,r4
 
 	li	r0, KVM_GUEST_MODE_NONE
 	stb	r0, HSTATE_IN_GUEST(r13)
@@ -1469,7 +1483,7 @@ guest_exit_cont:		/* r9 = vcpu, r12 = trap, r13 = paca */
 	lbz	r0, KVM_RADIX(r5)
 	li	r5, 0
 	cmpwi	r0, 0
-	bne	3f	/* for radix, save 0 entries */
+	bne	0f	/* for radix, save 0 entries */
 	lwz	r0,VCPU_SLB_NR(r9)	/* number of entries in SLB */
 	mtctr	r0
 	li	r6,0
@@ -1490,12 +1504,9 @@ guest_exit_cont:		/* r9 = vcpu, r12 = trap, r13 = paca */
 	slbmte	r0,r0
 	slbia
 	ptesync
-3:	stw	r5,VCPU_SLB_MAX(r9)
+	stw	r5,VCPU_SLB_MAX(r9)
 
 	/* load host SLB entries */
-BEGIN_MMU_FTR_SECTION
-	b	0f
-END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_RADIX)
 	ld	r8,PACA_SLBSHADOWPTR(r13)
 
 	.rept	SLB_NUM_BOLTED
@@ -1508,7 +1519,17 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_RADIX)
 	slbmte	r6,r5
 1:	addi	r8,r8,16
 	.endr
-0:
+	b	guest_bypass
+
+0:	/* Sanitise radix guest SLB, see guest_exit_short_path comment. */
+	li	r0,0
+	slbmte	r0,r0
+	li	r4,1
+	slbmte	r0,r4
+	li	r4,2
+	slbmte	r0,r4
+	li	r4,3
+	slbmte	r0,r4
 guest_bypass:
 	stw	r12, STACK_SLOT_TRAP(r1)
@@ -3302,12 +3323,14 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
 	mtspr	SPRN_CIABR, r0
 	mtspr	SPRN_DAWRX0, r0
 
+	/* Clear hash and radix guest SLB, see guest_exit_short_path comment. */
+	slbmte	r0, r0
+	slbia
+
 BEGIN_MMU_FTR_SECTION
 	b	4f
 END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_RADIX)
 
-	slbmte	r0, r0
-	slbia
 	ptesync
 	ld	r8, PACA_SLBSHADOWPTR(r13)
 	.rept	SLB_NUM_BOLTED
-- 
2.23.0
[PATCH 0/4] a few KVM patches
These patches are unrelated except touching some of the same code. The
first two fix actual guest exploitable issues, the other two try to
tidy up SLB management slightly.

Thanks,
Nick

Nicholas Piggin (4):
  KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host
    without mixed mode support
  KVM: PPC: Book3S HV: Fix radix guest SLB side channel
  KVM: PPC: Book3S HV: No need to clear radix host SLB before loading
    guest
  KVM: PPC: Book3S HV: Use POWER9 SLBIA IH=6 variant to clear SLB

 arch/powerpc/include/asm/kvm_book3s_asm.h |  11 --
 arch/powerpc/kernel/asm-offsets.c         |   3 -
 arch/powerpc/kvm/book3s_hv.c              |  56 ++----
 arch/powerpc/kvm/book3s_hv_builtin.c      | 108 +---------
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 129 ++--------
 5 files changed, 70 insertions(+), 237 deletions(-)

-- 
2.23.0
[PATCH 1/4] KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode support
This reverts much of commit c01015091a770 ("KVM: PPC: Book3S HV: Run
HPT guests on POWER9 radix hosts"), which was required to run HPT
guests on RPT hosts on early POWER9 CPUs without support for "mixed
mode", which meant the host could not run with MMU on while guests
were running.

This code has some corner case bugs, e.g., when the guest hits a
machine check or HMI the primary locks up waiting for secondaries to
switch LPCR to host, which they never do. This could all be fixed in
software, but most CPUs in production have mixed mode support, and
those that don't are believed to be all in installations that don't
use this capability. So simplify things and remove support.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/include/asm/kvm_book3s_asm.h |  11 ---
 arch/powerpc/kernel/asm-offsets.c         |   3 -
 arch/powerpc/kvm/book3s_hv.c              |  56 +++----
 arch/powerpc/kvm/book3s_hv_builtin.c      | 108 +---------
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   |  80 ++------
 5 files changed, 32 insertions(+), 226 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 078f4648ea27..b6d31bff5209 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -74,16 +74,6 @@ struct kvm_split_mode {
 	u8		do_nap;
 	u8		napped[MAX_SMT_THREADS];
 	struct kvmppc_vcore *vc[MAX_SUBCORES];
-	/* Bits for changing lpcr on P9 */
-	unsigned long	lpcr_req;
-	unsigned long	lpidr_req;
-	unsigned long	host_lpcr;
-	u32		do_set;
-	u32		do_restore;
-	union {
-		u32	allphases;
-		u8	phase[4];
-	} lpcr_sync;
 };
 
 /*
@@ -110,7 +100,6 @@ struct kvmppc_host_state {
 	u8 hwthread_state;
 	u8 host_ipi;
 	u8 ptid;		/* thread number within subcore when split */
-	u8 tid;			/* thread number within whole core */
 	u8 fake_suspend;
 	struct kvm_vcpu *kvm_vcpu;
 	struct kvmppc_vcore *kvm_vcore;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index b12d7c049bfe..489a22cf1a92 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -668,7 +668,6 @@ int main(void)
 	HSTATE_FIELD(HSTATE_SAVED_XIRR, saved_xirr);
 	HSTATE_FIELD(HSTATE_HOST_IPI, host_ipi);
 	HSTATE_FIELD(HSTATE_PTID, ptid);
-	HSTATE_FIELD(HSTATE_TID, tid);
 	HSTATE_FIELD(HSTATE_FAKE_SUSPEND, fake_suspend);
 	HSTATE_FIELD(HSTATE_MMCR0, host_mmcr[0]);
 	HSTATE_FIELD(HSTATE_MMCR1, host_mmcr[1]);
@@ -698,8 +697,6 @@ int main(void)
 	OFFSET(KVM_SPLIT_LDBAR, kvm_split_mode, ldbar);
 	OFFSET(KVM_SPLIT_DO_NAP, kvm_split_mode, do_nap);
 	OFFSET(KVM_SPLIT_NAPPED, kvm_split_mode, napped);
-	OFFSET(KVM_SPLIT_DO_SET, kvm_split_mode, do_set);
-	OFFSET(KVM_SPLIT_DO_RESTORE, kvm_split_mode, do_restore);
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #ifdef CONFIG_PPC_BOOK3S_64
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6f612d240392..2d8627dbd9f6 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -134,7 +134,7 @@ static inline bool nesting_enabled(struct kvm *kvm)
 }
 
 /* If set, the threads on each CPU core have to be in the same MMU mode */
-static bool no_mixing_hpt_and_radix;
+static bool no_mixing_hpt_and_radix __read_mostly;
 
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
@@ -2862,11 +2862,6 @@ static bool can_dynamic_split(struct kvmppc_vcore *vc, struct core_info *cip)
 	if (one_vm_per_core && vc->kvm != cip->vc[0]->kvm)
 		return false;
 
-	/* Some POWER9 chips require all threads to be in the same MMU mode */
-	if (no_mixing_hpt_and_radix &&
-	    kvm_is_radix(vc->kvm) != kvm_is_radix(cip->vc[0]->kvm))
-		return false;
-
 	if (n_threads < cip->max_subcore_threads)
 		n_threads = cip->max_subcore_threads;
 	if (!subcore_config_ok(cip->n_subcores + 1, n_threads))
@@ -2905,6 +2900,9 @@ static void prepare_threads(struct kvmppc_vcore *vc)
 	for_each_runnable_thread(i, vcpu, vc) {
 		if (signal_pending(vcpu->arch.run_task))
 			vcpu->arch.ret = -EINTR;
+		else if (no_mixing_hpt_and_radix &&
+			 kvm_is_radix(vc->kvm) != radix_enabled())
+			vcpu->arch.ret = -EINVAL;
 		else if (vcpu->arch.vpa.update_pending ||
 			 vcpu->arch.slb_shadow.update_pending ||
 			 vcpu->arch.dtl.update_pending)
@@ -3110,7 +3108,6 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 	int controlled_threads;
 	int trap;
 	bool is_power8;
-	bool hpt_on_radix;
 
 	/*
 	 *
Re: [PATCH 10/18] arch: powerpc: Stop building and using oprofile
On 14-01-21, 17:05, Viresh Kumar wrote:
> The "oprofile" user-space tools don't use the kernel OPROFILE support
> any more, and haven't in a long time. User-space has been converted to
> the perf interfaces.
>
> This commit stops building oprofile for powerpc and removes any
> reference to it from directories in arch/powerpc/ apart from
> arch/powerpc/oprofile, which will be removed in the next commit (this
> is broken into two commits as the size of the commit became very big,
> ~5k lines).
>
> Note that the member "oprofile_cpu_type" in "struct cpu_spec" isn't
> removed as it was also used by other parts of the code.
>
> Suggested-by: Christoph Hellwig
> Suggested-by: Linus Torvalds
> Signed-off-by: Viresh Kumar
> ---
>  arch/powerpc/Kconfig                          |   1 -
>  arch/powerpc/Makefile                         |   2 -
>  arch/powerpc/configs/44x/akebono_defconfig    |   1 -
>  arch/powerpc/configs/44x/currituck_defconfig  |   1 -
>  arch/powerpc/configs/44x/fsp2_defconfig       |   1 -
>  arch/powerpc/configs/44x/iss476-smp_defconfig |   1 -
>  arch/powerpc/configs/cell_defconfig           |   1 -
>  arch/powerpc/configs/g5_defconfig             |   1 -
>  arch/powerpc/configs/maple_defconfig          |   1 -
>  arch/powerpc/configs/pasemi_defconfig         |   1 -
>  arch/powerpc/configs/pmac32_defconfig         |   1 -
>  arch/powerpc/configs/powernv_defconfig        |   1 -
>  arch/powerpc/configs/ppc64_defconfig          |   1 -
>  arch/powerpc/configs/ppc64e_defconfig         |   1 -
>  arch/powerpc/configs/ppc6xx_defconfig         |   1 -
>  arch/powerpc/configs/ps3_defconfig            |   1 -
>  arch/powerpc/configs/pseries_defconfig        |   1 -
>  arch/powerpc/include/asm/cputable.h           |  20 ---
>  arch/powerpc/include/asm/oprofile_impl.h      | 135 --
>  arch/powerpc/include/asm/spu.h                |  33 -
>  arch/powerpc/kernel/cputable.c                |  67 -
>  arch/powerpc/kernel/dt_cpu_ftrs.c             |   2 -
>  arch/powerpc/platforms/cell/Kconfig           |   5 -
>  arch/powerpc/platforms/cell/spu_notify.c      |  55 ---

+ this..

diff --git a/arch/powerpc/platforms/cell/Makefile b/arch/powerpc/platforms/cell/Makefile
index 10064a33ca96..7ea6692f67e2 100644
--- a/arch/powerpc/platforms/cell/Makefile
+++ b/arch/powerpc/platforms/cell/Makefile
@@ -19,7 +19,6 @@ spu-priv1-$(CONFIG_PPC_CELL_COMMON)	+= spu_priv1_mmio.o
 spu-manage-$(CONFIG_PPC_CELL_COMMON)	+= spu_manage.o
 
 obj-$(CONFIG_SPU_BASE)			+= spu_callbacks.o spu_base.o \
-					   spu_notify.o \
 					   spu_syscalls.o \
 					   $(spu-priv1-y) \
 					   $(spu-manage-y) \

-- 
viresh
Re: [PATCH 5/6] powerpc/rtas: rename RTAS_RMOBUF_MAX to RTAS_USER_REGION_SIZE
On 16/01/2021 02:56, Nathan Lynch wrote:
> Alexey Kardashevskiy writes:
>> On 15/01/2021 09:00, Nathan Lynch wrote:
>>> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
>>> index 332e1000ca0f..1aa7ab1cbc84 100644
>>> --- a/arch/powerpc/include/asm/rtas.h
>>> +++ b/arch/powerpc/include/asm/rtas.h
>>> @@ -19,8 +19,11 @@
>>>  #define RTAS_UNKNOWN_SERVICE (-1)
>>>  #define RTAS_INSTANTIATE_MAX (1ULL<<30) /* Don't instantiate rtas at/above this value */
>>>
>>> -/* Buffer size for ppc_rtas system call. */
>>> -#define RTAS_RMOBUF_MAX (64 * 1024)
>>> +/* Work areas shared with RTAS must be 4K, naturally aligned. */
>>
>> Why exactly 4K and not (for example) PAGE_SIZE?
>
> 4K is a platform requirement and isn't related to Linux's configured
> page size. See the PAPR specification for RTAS functions such as
> ibm,configure-connector, ibm,update-nodes, ibm,update-properties.

Good, since we are documenting things here - add to the comment ("per PAPR")?

> There are other calls with work area parameters where alignment isn't
> specified (e.g. ibm,get-system-parameter) but 4KB alignment is a safe
> choice for those.
>
>>> +#define RTAS_WORK_AREA_SIZE	4096
>>> +
>>> +/* Work areas allocated for user space access. */
>>> +#define RTAS_USER_REGION_SIZE	(RTAS_WORK_AREA_SIZE * 16)
>>
>> This is still 64K but no clarity why. There is 16 of something, what
>> is it?
>
> There are 16 4KB work areas in the region. I can name it
> RTAS_NR_USER_WORK_AREAS or similar.

Why 16? PAPR (then add "per PAPR") or we just like 16 ("should be enough")?

-- 
Alexey
Re: [PATCH 6/6] powerpc/rtas: constrain user region allocation to RMA
On 16/01/2021 02:38, Nathan Lynch wrote:
> Alexey Kardashevskiy writes:
>> On 15/01/2021 09:00, Nathan Lynch wrote:
>>> Memory locations passed as arguments from the OS to RTAS usually
>>> need to be addressable in 32-bit mode and must reside in the Real
>>> Mode Area. On PAPR guests, the RMA starts at logical address 0 and
>>> is the first logical memory block reported in the LPAR's device
>>> tree.
>>>
>>> On powerpc targets with RTAS, Linux makes available to user space a
>>> region of memory suitable for arguments to be passed to RTAS via
>>> sys_rtas(). This region (rtas_rmo_buf) is allocated via the memblock
>>> API during boot in order to ensure that it satisfies the
>>> requirements described above.
>>>
>>> With radix MMU, the upper limit supplied to the memblock allocation
>>> can exceed the bounds of the first logical memory block, since
>>> ppc64_rma_size is ULONG_MAX and RTAS_INSTANTIATE_MAX is 1GB. (512MB
>>> is a common size of the first memory block according to a small
>>> sample of LPARs I have checked.) This leads to failures when user
>>> space invokes an RTAS function that uses a work area, such as
>>> ibm,configure-connector.
>>>
>>> Alter the determination of the upper limit for rtas_rmo_buf's
>>> allocation to consult the device tree directly, ensuring placement
>>> within the RMA regardless of the MMU in use.
>>
>> Can we tie this with RTAS (which also needs to be in RMA) and simply
>> add extra 64K in prom_instantiate_rtas() and advertise this address
>> (ALIGN_UP(rtas-base + rtas-size, PAGE_SIZE)) to the user space? We do
>> not need this RMO area before that point.
>
> Can you explain more about what advantage that would bring? I'm not
> seeing it. It's a more significant change than what I've written here.

We already allocate space for RTAS and (like RMO) it needs to be in
RMA, and RMO is useless without RTAS. We can reuse RTAS allocation code
for RMO like this:

===
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index e9d4eb6144e1..d9527d3e01d2 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -1821,7 +1821,8 @@ static void __init prom_instantiate_rtas(void)
 	if (size == 0)
 		return;
 
-	base = alloc_down(size, PAGE_SIZE, 0);
+	/* One page for RTAS, one for RMO */
+	base = alloc_down(size, PAGE_SIZE + PAGE_SIZE, 0);
 	if (base == 0)
 		prom_panic("Could not allocate memory for RTAS\n");
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index d126d71ea5bd..885d95cf4ed3 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -1186,6 +1186,7 @@ void __init rtas_initialize(void)
 		rtas.size = size;
 		no_entry = of_property_read_u32(rtas.dev, "linux,rtas-entry", &entry);
 		rtas.entry = no_entry ? rtas.base : entry;
+		rtas_rmo_buf = rtas.base + PAGE_SIZE;
 
 	/* If RTAS was found, allocate the RMO buffer for it and look for
 	 * the stop-self token if any
@@ -1196,11 +1197,6 @@ void __init rtas_initialize(void)
 		ibm_suspend_me_token = rtas_token("ibm,suspend-me");
 	}
 #endif
-	rtas_rmo_buf = memblock_phys_alloc_range(RTAS_RMOBUF_MAX, PAGE_SIZE,
-						 0, rtas_region);
-	if (!rtas_rmo_buf)
-		panic("ERROR: RTAS: Failed to allocate %lx bytes below %pa\n",
-		      PAGE_SIZE, &rtas_region);
===

May be store in the FDT as "linux,rmo-base" next to "linux,rtas-base",
for clarity, as sharing symbols between prom and main kernel is a bit
tricky.

The benefit is that we do not do the same thing (== find 64K in RMA) in
2 different ways and if the RMO allocated my way is broken - we'll know
it much sooner as RTAS itself will break too.

> Would it interact well with kexec?

Good point. For this, the easiest will be setting rtas-size in the FDT
to the allocated RTAS space (PAGE_SIZE*2 with the hunk above applied).
Probably.

>> And probably do the same with per-cpu RTAS argument structures
>> mentioned in the cover letter?
>
> I don't think so, since those need to be allocated with the pacas and
> limited to the maximum possible CPUs, which is discovered by the
> kernel much later.

The first cell of /proc/device-tree/cpus/ibm,drc-indexes is the number
of cores, it is there when RTAS is instantiated, we know SMT after
"ibm,client-architecture-support" (if I remember correctly).

> But maybe I misunderstand what you're suggesting.

Usually it is me missing the bigger picture :)

-- 
Alexey
[powerpc:merge] BUILD SUCCESS 41d8cb7ece7c81e4eb897ed7ec7d3c3d72fd0af4
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge
branch HEAD: 41d8cb7ece7c81e4eb897ed7ec7d3c3d72fd0af4  Automatic merge of 'master' into merge (2021-01-17 21:16)

elapsed time: 758m

configs tested: 112
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm      defconfig
arm64    allyesconfig
arm64    defconfig
arm      allyesconfig
arm      allmodconfig
arc      axs101_defconfig
arc      axs103_smp_defconfig
mips     loongson1b_defconfig
c6x      dsk6455_defconfig
mips     rbtx49xx_defconfig
sh       se7343_defconfig
powerpc  adder875_defconfig
powerpc  linkstation_defconfig
mips     capcella_defconfig
powerpc  katmai_defconfig
h8300    allyesconfig
powerpc  holly_defconfig
powerpc  stx_gp3_defconfig
arm      integrator_defconfig
mips     maltasmvp_eva_defconfig
parisc   generic-32bit_defconfig
sh       sh2007_defconfig
m68k     m5272c3_defconfig
sh       rts7751r2dplus_defconfig
powerpc  ps3_defconfig
arm      clps711x_defconfig
arc      vdk_hs38_defconfig
arm      sama5_defconfig
powerpc  iss476-smp_defconfig
nds32    defconfig
sh       ap325rxa_defconfig
parisc   defconfig
powerpc  storcenter_defconfig
xtensa   common_defconfig
powerpc  mpc885_ads_defconfig
alpha    alldefconfig
mips     rs90_defconfig
arm      pxa910_defconfig
sh       ecovec24_defconfig
mips     malta_kvm_guest_defconfig
nios2    alldefconfig
powerpc  klondike_defconfig
mips     nlm_xlr_defconfig
arm      multi_v7_defconfig
ia64     allmodconfig
ia64     defconfig
ia64     allyesconfig
m68k     allmodconfig
m68k     defconfig
m68k     allyesconfig
nios2    defconfig
arc      allyesconfig
nds32    allnoconfig
c6x      allyesconfig
nios2    allyesconfig
csky     defconfig
alpha    defconfig
alpha    allyesconfig
xtensa   allyesconfig
arc      defconfig
sh       allmodconfig
s390     allyesconfig
parisc   allyesconfig
s390     defconfig
i386     allyesconfig
sparc    allyesconfig
sparc    defconfig
i386     tinyconfig
i386     defconfig
mips     allyesconfig
mips     allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc  allnoconfig
x86_64   randconfig-a006-20210117
x86_64   randconfig-a004-20210117
x86_64   randconfig-a001-20210117
x86_64   randconfig-a005-20210117
x86_64   randconfig-a003-20210117
x86_64   randconfig-a002-20210117
i386     randconfig-a005-20210117
i386     randconfig-a006-20210117
i386     randconfig-a004-20210117
i386     randconfig-a002-20210117
i386     randconfig-a003-20210117
i386     randconfig-a001-20210117
i386     randconfig-a012-20210117
i386     randconfig-a011-20210117
i386     randconfig-a016-20210117
i386     randconfig-a013-20210117
i386     randconfig-a015-20210117
i386     randconfig-a014-20210117
riscv    nommu_k210_defconfig
riscv    allyesconfig
riscv    nommu_virt_defconfig
riscv    allnoconfig
riscv    defconfig
riscv    rv32_defconfig
riscv    allmodconfig
x86_64   rhel
x86_64
Re: [PATCH 1/2] dt-bindings: powerpc: Add a schema for the 'sleep' property
Hi Rob,

This patch generates notifications in the Rockchip ARM and arm64 trees.
Could you limit the scope to PowerPC only?

Kind regards,

Johan Jonker

make ARCH=arm dtbs_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/powerpc/sleep.yaml
make ARCH=arm64 dtbs_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/powerpc/sleep.yaml

Example:

/arch/arm64/boot/dts/rockchip/rk3399pro-rock-pi-n10.dt.yaml: pinctrl: sleep:
{'ddrio-pwroff': {'rockchip,pins': [[0, 1, 1, 168]]}, 'ap-pwroff':
{'rockchip,pins': [[1, 5, 1, 168]]}} is not of type 'array'
	From schema: /Documentation/devicetree/bindings/powerpc/sleep.yaml

On 10/8/20 4:24 PM, Rob Herring wrote:
> Document the PowerPC specific 'sleep' property as a schema. It is
> currently only documented in booting-without-of.rst which is getting
> removed.
>
> Cc: Michael Ellerman
> Cc: Benjamin Herrenschmidt
> Cc: Paul Mackerras
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Rob Herring
> ---
>  .../devicetree/bindings/powerpc/sleep.yaml | 47 +++
>  1 file changed, 47 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/powerpc/sleep.yaml
>
> diff --git a/Documentation/devicetree/bindings/powerpc/sleep.yaml b/Documentation/devicetree/bindings/powerpc/sleep.yaml
> new file mode 100644
> index ..6494c7d08b93
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/powerpc/sleep.yaml
> @@ -0,0 +1,47 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/powerpc/sleep.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: PowerPC sleep property
> +
> +maintainers:
> +  - Rob Herring
> +
> +description: |
> +  Devices on SOCs often have mechanisms for placing devices into low-power
> +  states that are decoupled from the devices' own register blocks. Sometimes,
> +  this information is more complicated than a cell-index property can
> +  reasonably describe. Thus, each device controlled in such a manner
> +  may contain a "sleep" property which describes these connections.
> +
> +  The sleep property consists of one or more sleep resources, each of
> +  which consists of a phandle to a sleep controller, followed by a
> +  controller-specific sleep specifier of zero or more cells.
> +
> +  The semantics of what type of low power modes are possible are defined
> +  by the sleep controller. Some examples of the types of low power modes
> +  that may be supported are:
> +
> +  - Dynamic: The device may be disabled or enabled at any time.
> +  - System Suspend: The device may request to be disabled or remain
> +    awake during system suspend, but will not be disabled until then.
> +  - Permanent: The device is disabled permanently (until the next hard
> +    reset).
> +
> +  Some devices may share a clock domain with each other, such that they should
> +  only be suspended when none of the devices are in use. Where reasonable,
> +  such nodes should be placed on a virtual bus, where the bus has the sleep
> +  property. If the clock domain is shared among devices that cannot be
> +  reasonably grouped in this manner, then create a virtual sleep controller
> +  (similar to an interrupt nexus, except that defining a standardized
> +  sleep-map should wait until its necessity is demonstrated).
> +
> +select: true
> +
> +properties:
> +  sleep:
> +    $ref: /schemas/types.yaml#definitions/phandle-array
> +
> +additionalProperties: true
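For context, a device node carrying the property that the schema describes might look like the following. This is an entirely illustrative fragment (node names, addresses, and the specifier value are invented; the specifier format is defined by each sleep controller, not by this binding):

```dts
/* Hypothetical sleep controller and a device that references it. */
pmc: power-controller@e0070 {
	compatible = "vendor,example-pmc";
};

serial@4500 {
	compatible = "ns16550";
	/* One sleep resource: phandle to the controller plus a
	 * controller-specific one-cell specifier. */
	sleep = <&pmc 0x00000004>;
};
```

The schema's `phandle-array` type matches exactly this shape: a list of entries, each a phandle followed by zero or more specifier cells.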
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.11-4 tag
The pull request you sent on Sun, 17 Jan 2021 21:24:00 +1100:

> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-5.11-4

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/a1339d6355ac42e1bf4fcdfce8bfce61172f8891

Thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html
Re: [PATCH v3 1/8] powerpc/uaccess: Add unsafe_copy_from_user
On Mon Jan 11, 2021 at 7:22 AM CST, Christophe Leroy wrote:
>
> Le 09/01/2021 à 04:25, Christopher M. Riedl a écrit :
> > Implement raw_copy_from_user_allowed() which assumes that userspace read
> > access is open. Use this new function to implement raw_copy_from_user().
> > Finally, wrap the new function to follow the usual "unsafe_" convention
> > of taking a label argument.
>
> I think there is no point implementing raw_copy_from_user_allowed(), see
> https://github.com/linuxppc/linux/commit/4b842e4e25b1 and
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/8c74fc9ce8131cabb10b3e95dc0e430f396ee83e.1610369143.git.christophe.le...@csgroup.eu/
>
> You should simply do:
>
> #define unsafe_copy_from_user(d, s, l, e) \
> 	unsafe_op_wrap(__copy_tofrom_user((__force void __user *)d, s, l), e)

I gave this a try and the signal ops decreased by ~8K. Now, to be
honest, I am not sure what an "acceptable" benchmark number here
actually is - so maybe this is ok? Same loss with both radix and hash:

|                                      | hash   | radix  |
| ------------------------------------ | ------ | ------ |
| linuxppc/next                        | 118693 | 133296 |
| linuxppc/next w/o KUAP+KUEP          | 228911 | 228654 |
| unsafe-signal64                      | 200480 | 234067 |
| unsafe-signal64 (__copy_tofrom_user) | 192467 | 225119 |

To put this into perspective, prior to KUAP and uaccess flush, signal
performance in this benchmark was ~290K on hash.

> Christophe
>
> > The new raw_copy_from_user_allowed() calls non-inline __copy_tofrom_user()
> > internally. This is still safe to call inside user access blocks formed
> > with user_*_access_begin()/user_*_access_end() since asm functions are not
> > instrumented for tracing.
> >
> > Signed-off-by: Christopher M. Riedl
> > ---
> >  arch/powerpc/include/asm/uaccess.h | 28 +++-
> >  1 file changed, 19 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
> > index 501c9a79038c..698f3a6d6ae5 100644
> > --- a/arch/powerpc/include/asm/uaccess.h
> > +++ b/arch/powerpc/include/asm/uaccess.h
> > @@ -403,38 +403,45 @@ raw_copy_in_user(void __user *to, const void __user *from, unsigned long n)
> >  }
> >  #endif	/* __powerpc64__ */
> >  
> > -static inline unsigned long raw_copy_from_user(void *to,
> > -		const void __user *from, unsigned long n)
> > +static inline unsigned long
> > +raw_copy_from_user_allowed(void *to, const void __user *from, unsigned long n)
> >  {
> > -	unsigned long ret;
> >  	if (__builtin_constant_p(n) && (n <= 8)) {
> > -		ret = 1;
> > +		unsigned long ret = 1;
> >
> >  		switch (n) {
> >  		case 1:
> >  			barrier_nospec();
> > -			__get_user_size(*(u8 *)to, from, 1, ret);
> > +			__get_user_size_allowed(*(u8 *)to, from, 1, ret);
> >  			break;
> >  		case 2:
> >  			barrier_nospec();
> > -			__get_user_size(*(u16 *)to, from, 2, ret);
> > +			__get_user_size_allowed(*(u16 *)to, from, 2, ret);
> >  			break;
> >  		case 4:
> >  			barrier_nospec();
> > -			__get_user_size(*(u32 *)to, from, 4, ret);
> > +			__get_user_size_allowed(*(u32 *)to, from, 4, ret);
> >  			break;
> >  		case 8:
> >  			barrier_nospec();
> > -			__get_user_size(*(u64 *)to, from, 8, ret);
> > +			__get_user_size_allowed(*(u64 *)to, from, 8, ret);
> >  			break;
> >  		}
> >  		if (ret == 0)
> >  			return 0;
> >  	}
> >
> > +	return __copy_tofrom_user((__force void __user *)to, from, n);
> > +}
> > +
> > +static inline unsigned long
> > +raw_copy_from_user(void *to, const void __user *from, unsigned long n)
> > +{
> > +	unsigned long ret;
> > +
> >  	barrier_nospec();
> >  	allow_read_from_user(from, n);
> > -	ret = __copy_tofrom_user((__force void __user *)to, from, n);
> > +	ret = raw_copy_from_user_allowed(to, from, n);
> >  	prevent_read_from_user(from, n);
> >  	return ret;
> >  }
> > @@ -542,6 +549,9 @@ user_write_access_begin(const void __user *ptr, size_t len)
> >  #define unsafe_get_user(x, p, e) unsafe_op_wrap(__get_user_allowed(x, p), e)
> >  #define unsafe_put_user(x, p, e) __put_user_goto(x, p, e)
> >  
> > +#define unsafe_copy_from_user(d, s, l, e) \
> > +	unsafe_op_wrap(raw_copy_from_user_allowed(d, s, l), e)
> > +
> >  #define unsafe_copy_to_user(d, s, l, e) \
> >  do {									\
> >  	u8 __user *_dst = (u8 __user *)(d);				\
Re: [PATCH v3 4/8] powerpc/signal64: Remove TM ifdefery in middle of if/else block
On Mon Jan 11, 2021 at 7:29 AM CST, Christophe Leroy wrote:
>
> Le 09/01/2021 à 04:25, Christopher M. Riedl a écrit :
> > Rework the messy ifdef breaking up the if-else for TM similar to
> > commit f1cf4f93de2f ("powerpc/signal32: Remove ifdefery in middle of
> > if/else").
> >
> > Unlike that commit for ppc32, the ifdef can't be removed entirely since
> > uc_transact in sigframe depends on CONFIG_PPC_TRANSACTIONAL_MEM.
> >
> > Signed-off-by: Christopher M. Riedl
> > ---
> >  arch/powerpc/kernel/signal_64.c | 17 +++--
> >  1 file changed, 7 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
> > index b211a8ea4f6e..dd3787f67a78 100644
> > --- a/arch/powerpc/kernel/signal_64.c
> > +++ b/arch/powerpc/kernel/signal_64.c
> > @@ -710,9 +710,7 @@ SYSCALL_DEFINE0(rt_sigreturn)
> >  	struct pt_regs *regs = current_pt_regs();
> >  	struct ucontext __user *uc = (struct ucontext __user *)regs->gpr[1];
> >  	sigset_t set;
> > -#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> >  	unsigned long msr;
> > -#endif
> >
> >  	/* Always make any pending restarted system calls return -EINTR */
> >  	current->restart_block.fn = do_no_restart_syscall;
> > @@ -762,10 +760,12 @@ SYSCALL_DEFINE0(rt_sigreturn)
> >  	 * restore_tm_sigcontexts.
> >  	 */
> >  	regs->msr &= ~MSR_TS_MASK;
> > +#endif
> >
> >  	if (__get_user(msr, &uc->uc_mcontext.gp_regs[PT_MSR]))
> >  		goto badframe;
>
> This means you are doing that __get_user() even when msr is not used.
> That should be avoided.

Thanks, I moved it into the #ifdef block right above it instead for the
next spin.

> >  	if (MSR_TM_ACTIVE(msr)) {
> > +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> >  		/* We recheckpoint on return. */
> >  		struct ucontext __user *uc_transact;
> >
> > @@ -778,9 +778,8 @@ SYSCALL_DEFINE0(rt_sigreturn)
> >  		if (restore_tm_sigcontexts(current, &uc->uc_mcontext,
> >  					   &uc_transact->uc_mcontext))
> >  			goto badframe;
> > -	} else
> >  #endif
> > -	{
> > +	} else {
> >  		/*
> >  		 * Fall through, for non-TM restore
> >  		 *
> > @@ -818,10 +817,8 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
> >  	unsigned long newsp = 0;
> >  	long err = 0;
> >  	struct pt_regs *regs = tsk->thread.regs;
> > -#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> >  	/* Save the thread's msr before get_tm_stackpointer() changes it */
> > -	unsigned long msr = regs->msr;
> > -#endif
> > +	unsigned long msr __maybe_unused = regs->msr;
>
> I don't think __maybe_unused is the right solution.
>
> I think MSR_TM_ACTIVE() should be fixed instead, either by changing it
> into a static inline function, or doing something similar to
> https://github.com/linuxppc/linux/commit/05a4ab823983d9136a460b7b5e0d49ee709a6f86

Agreed, I'll change MSR_TM_ACTIVE() to reference its argument in the
macro. This keeps it consistent with all the other MSR_TM_* macros in
reg.h. Probably better than changing it to static inline since that
would mean changing all the macros too which seems unnecessary.

> >
> >  	frame = get_sigframe(ksig, tsk, sizeof(*frame), 0);
> >  	if (!access_ok(frame, sizeof(*frame)))
> > @@ -836,8 +833,9 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
> >  	/* Create the ucontext. */
> >  	err |= __put_user(0, &frame->uc.uc_flags);
> >  	err |= __save_altstack(&frame->uc.uc_stack, regs->gpr[1]);
> > -#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> > +
> >  	if (MSR_TM_ACTIVE(msr)) {
> > +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> >  		/* The ucontext_t passed to userland points to the second
> >  		 * ucontext_t (for transactional state) with its uc_link ptr.
> >  		 */
> > @@ -847,9 +845,8 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
> >  					    tsk, ksig->sig, NULL,
> >  					    (unsigned long)ksig->ka.sa.sa_handler,
> >  					    msr);
> > -	} else
> >  #endif
> > -	{
> > +	} else {
> >  		err |= __put_user(0, &frame->uc.uc_link);
> >  		prepare_setup_sigcontext(tsk, 1);
> >  		err |= setup_sigcontext(&frame->uc.uc_mcontext, tsk, ksig->sig,
>
> Christophe
Re: [PATCH v15 09/10] arm64: Call kmalloc() to allocate DTB buffer
Hi Ard,

On Fri, 2021-01-15 at 09:30 -0800, Lakshmi Ramasubramanian wrote:
> create_dtb() function allocates kernel virtual memory for
> the device tree blob (DTB). This is not consistent with other
> architectures, such as powerpc, which calls kmalloc() for allocating
> memory for the DTB.
>
> Call kmalloc() to allocate memory for the DTB, and kfree() to free
> the allocated memory.

The vmalloc() function description says, "vmalloc - allocate virtually
contiguous memory".

I'd appreciate your reviewing this patch in particular, which replaces
vmalloc() with kmalloc().

thanks,

Mimi

>
> Co-developed-by: Prakhar Srivastava
> Signed-off-by: Prakhar Srivastava
> Signed-off-by: Lakshmi Ramasubramanian
> ---
>  arch/arm64/kernel/machine_kexec_file.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> index 7de9c47dee7c..51c40143d6fa 100644
> --- a/arch/arm64/kernel/machine_kexec_file.c
> +++ b/arch/arm64/kernel/machine_kexec_file.c
> @@ -29,7 +29,7 @@ const struct kexec_file_ops * const kexec_file_loaders[] = {
>
>  int arch_kimage_file_post_load_cleanup(struct kimage *image)
>  {
> -	vfree(image->arch.dtb);
> +	kfree(image->arch.dtb);
>  	image->arch.dtb = NULL;
>
>  	vfree(image->arch.elf_headers);
> @@ -59,19 +59,21 @@ static int create_dtb(struct kimage *image,
>  			+ cmdline_len + DTB_EXTRA_SPACE;
>
>  	for (;;) {
> -		buf = vmalloc(buf_size);
> +		buf = kmalloc(buf_size, GFP_KERNEL);
>  		if (!buf)
>  			return -ENOMEM;
>
>  		/* duplicate a device tree blob */
>  		ret = fdt_open_into(initial_boot_params, buf, buf_size);
> -		if (ret)
> +		if (ret) {
> +			kfree(buf);
>  			return -EINVAL;
> +		}
>
>  		ret = of_kexec_setup_new_fdt(image, buf, initrd_load_addr,
>  					     initrd_len, cmdline);
>  		if (ret) {
> -			vfree(buf);
> +			kfree(buf);
>  			if (ret == -ENOMEM) {
>  				/* unlikely, but just in case */
>  				buf_size += DTB_EXTRA_SPACE;
> @@ -217,6 +219,6 @@ int load_other_segments(struct kimage *image,
>  	return 0;
>
>  out_err:
> -	vfree(dtb);
> +	kfree(dtb);
>  	return ret;
>  }
[GIT PULL] Please pull powerpc/linux.git powerpc-5.11-4 tag
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi Linus,

Please pull some more powerpc fixes for 5.11:

The following changes since commit 3ce47d95b7346dcafd9bed3556a8d072cb2b8571:

  powerpc: Handle .text.{hot,unlikely}.* in linker script (2021-01-06 21:59:04 +1100)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-5.11-4

for you to fetch changes up to 41131a5e54ae7ba5a2bb8d7b30d1818b3f5b13d2:

  powerpc/vdso: Fix clock_gettime_fallback for vdso32 (2021-01-14 15:56:44 +1100)

- ----------------------------------------------------------------
powerpc fixes for 5.11 #4

One fix for a lack of alignment in our linker script, that can lead to
crashes depending on configuration etc.

One fix for the 32-bit VDSO after the C VDSO conversion.

Thanks to:
  Andreas Schwab, Ariel Marcovitch, Christophe Leroy.

- ----------------------------------------------------------------
Andreas Schwab (1):
      powerpc/vdso: Fix clock_gettime_fallback for vdso32

Ariel Marcovitch (1):
      powerpc: Fix alignment bug within the init sections

 arch/powerpc/include/asm/vdso/gettimeofday.h | 16 +++++++++++++++-
 arch/powerpc/kernel/vmlinux.lds.S            |  8 ++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmAED74ACgkQUevqPMjh
pYAyFRAAr2/3tLnaJgu8LIkr64AUuNAXsAhTaVb3MEemJtR7FSEK8pJjw/AtQmHh
aTB9Y4qp2kOSQq6C6j4cMitGUYOiDEayCHPP1SserIRVJXmC453jqKb/iHrpaVWo
zBdrqrMzgFhlfT6/IDVw/+e5rjycwp9QicQZ0DRX15ZXlqlMSr1b6VH3opku4DyV
a9OP/LlR6PAgZQn+qTfeB/z7HnwOdy9R5i/UnrALqrzKGOneQXd+jv7THbudMs/D
aVTapfuoon1SPSLWy7xSVKIjFxwV4KUMi0R5kjWlWkXFqdLA2r8XRE3sKcLW1IN1
0Yibv1DRddsnluqe5lclQgzWPfRLdjPhgoIwIq3Ze50aSMuLXU5TatPzXVVKFDWT
emVMyQ/SOzWdI7mwsbN1GK85x7cvWW7wMLtEnvJ82vQJpJtAJEyWEZ9UozfpBrPq
/H2rrisWMFyZhl3eDdcJCwV7YeOxdCnmqmJnnkTMypRRXyWlfJDHs0CP7fWiKu+j
XMsPhxM1hyfrueOW7iPBEt/ZkB17Eq1V0Z2OQU+chXqJmmh9gwSBb/F8iJ48Iphi
4L2ynxJTAHwFY27xE1CQIF0VKycIc7djkDhYoJaL8PaVXQkUo/NWy4zOVNzJpeen
HbeLjHKGeeGetWxOniBCgD0PxoOQH8ThQauz+NwzeACGgyPzkM0=
=fJl5
-----END PGP SIGNATURE-----
ibmvnic: Race condition in remove callback
Hello,

while working on some cleanup I stumbled over a problem in the
ibmvnic's remove callback. Since commit 7d7195a026ba ("ibmvnic: Do not
process device remove during device reset") there is the following code
in the remove callback:

	static int ibmvnic_remove(struct vio_dev *dev)
	{
		...
		spin_lock_irqsave(&adapter->state_lock, flags);
		if (test_bit(0, &adapter->resetting)) {
			spin_unlock_irqrestore(&adapter->state_lock, flags);
			return -EBUSY;
		}

		adapter->state = VNIC_REMOVING;
		spin_unlock_irqrestore(&adapter->state_lock, flags);

		flush_work(&adapter->ibmvnic_reset);
		flush_delayed_work(&adapter->ibmvnic_delayed_reset);
		...
	}

Unfortunately returning -EBUSY doesn't work as intended. That's because
the return value of this function is ignored[1] and the device is
considered unbound by the device core (shortly) after ibmvnic_remove()
returns.

While looking into fixing that I noticed a worse problem: if
ibmvnic_reset() (e.g. called by the tx_timeout callback) calls

	schedule_work(&adapter->ibmvnic_reset);

just after the work queue is flushed above, the problem that
7d7195a026ba intends to fix will trigger, resulting in a
use-after-free.

Also ibmvnic_reset() checks adapter->state without holding the lock,
which might be racy, too.

Best regards
Uwe

[1] vio_bus_remove (in arch/powerpc/platforms/pseries/vio.c) records
    the return value and passes it on. But the driver core doesn't care
    for the return value (see __device_release_driver() in
    drivers/base/dd.c calling dev->bus->remove()).

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |