Re: [RFC PATCH v0 1/1] powerpc/percpu: Use 2MB atom_size in percpu allocator on radix
On Mon, Jul 12, 2021 at 01:00:10PM +1000, Nicholas Piggin wrote:
> Excerpts from Bharata B Rao's message of July 8, 2021 3:29 pm:
> > The atom_size used by percpu allocator on powerpc is currently
> > determined by mmu_linear_psize which is initialized to 4K and
> > mmu_linear_psize is modified only by hash. Till now for radix
> > the atom_size was defaulting to PAGE_SIZE(64K).
>
> Looks like it was 1MB to me?

Was it hash? Because atom_size will get set to 1MB on hash. And on both
bare-metal and KVM radix, I see a 64K atom_size.

> > Go for 2MB atom_size on radix if support for 2MB pages exists.
> >
> > 2MB atom_size on radix will allow using PMD mappings in the
> > vmalloc area if and when support for higher-sized vmalloc
> > mappings is enabled for the percpu allocator. However right now
>
> That would be nice.
>
> > this change will result in more units being allocated
> > within one allocation due to the increased upa (units per allocation).
>
> In that case is there any reason to do it until then?

Not strictly. I observed a similar setting on x86 which has been there
for long, so I was just checking if it makes sense here too.

> > Signed-off-by: Bharata B Rao
> > ---
> >  arch/powerpc/kernel/setup_64.c | 34 +-
> >  1 file changed, 25 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> > index 1ff258f6c76c..45ce2d6e8112 100644
> > --- a/arch/powerpc/kernel/setup_64.c
> > +++ b/arch/powerpc/kernel/setup_64.c
> > @@ -871,6 +871,30 @@ static void __init pcpu_populate_pte(unsigned long addr)
> >  		__func__, PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
> >  }
> >
> > +static size_t pcpu_atom_size(void)
> > +{
> > +	size_t atom_size = PAGE_SIZE;
> > +
> > +	/*
> > +	 * Radix: Use PAGE_SIZE by default or 2M if available.
> > +	 */
> > +	if (radix_enabled()) {
> > +		if (mmu_psize_defs[MMU_PAGE_2M].shift)
> > +			atom_size = 1 << mmu_psize_defs[MMU_PAGE_2M].shift;
>
> Looks like this changes behaviour for radix.

Yes, it does, as it increases the atom_size, which results in a higher
upa as noted. Did you mean some other behaviour change?

> Also mmu_psize_defs is a pretty horrible interface; you only need it in
> some low-level instruction encodings. You already explicitly know it's
> 2MB there, so you can just use PMD_SHIFT.

Ok.

> If you want to know whether huge PMD is supported and enabled in vmalloc
> memory, you would have to add some check which also accounts for
> vmap_allow_huge, so that would be another patch.

Yes, makes sense if we want to tie the setting of a higher atom_size to
the actual availability of PMD mappings in vmalloc.

Regards,
Bharata.
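For reference, the selection policy being discussed — 2MB via PMD_SHIFT on radix as Nick suggests, and the existing hash behaviour otherwise — can be modelled in plain C. This is only an illustrative userspace sketch: the shift values are stand-ins for the arch headers (64K base pages, 2MB PMD), and `radix_enabled()` / `mmu_linear_psize` are modelled as plain variables.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in values for illustration only; the real ones come from the
 * powerpc arch headers (PAGE_SHIFT = 16 for a 64K-page kernel,
 * PMD_SHIFT = 21 for 2MB). */
#define MODEL_PAGE_SHIFT 16
#define MODEL_PMD_SHIFT  21

static int radix_on;          /* models radix_enabled() */
static int linear_is_4k = 1;  /* models mmu_linear_psize == MMU_PAGE_4K */

/* Models the proposed pcpu_atom_size(): 2MB (the PMD size) on radix, as
 * suggested in the review, otherwise the existing hash policy of 1MB
 * atoms unless the linear mapping uses 4K pages. */
static size_t pcpu_atom_size_model(void)
{
	if (radix_on)
		return (size_t)1 << MODEL_PMD_SHIFT;   /* 2MB */
	if (!linear_is_4k)
		return (size_t)1 << 20;                /* 1MB, hash large pages */
	return (size_t)1 << MODEL_PAGE_SHIFT;          /* PAGE_SIZE, no grouping */
}
```

A larger atom_size makes the allocator pack more units per allocation (higher upa), which is the trade-off questioned in the thread.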
Re: [RFC PATCH 10/43] powerpc/64s: Always set PMU control registers to frozen/disabled when not in use
On 7/2/21 5:57 AM, Nicholas Piggin wrote:
> Excerpts from Madhavan Srinivasan's message of July 1, 2021 11:17 pm:
> > On 6/22/21 4:27 PM, Nicholas Piggin wrote:
> > > KVM PMU management code looks for particular frozen/disabled bits in
> > > the PMU registers so it knows whether it must clear them when coming
> > > out of a guest or not. Setting this up helps KVM make these
> > > optimisations without getting confused. Longer term the better
> > > approach might be to move guest/host PMU switching to the perf
> > > subsystem.
> > >
> > > Signed-off-by: Nicholas Piggin
> > > ---
> > >  arch/powerpc/kernel/cpu_setup_power.c | 4 ++--
> > >  arch/powerpc/kernel/dt_cpu_ftrs.c | 6 +++---
> > >  arch/powerpc/kvm/book3s_hv.c | 5 +
> > >  arch/powerpc/perf/core-book3s.c | 7 +++
> > >  4 files changed, 17 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/arch/powerpc/kernel/cpu_setup_power.c b/arch/powerpc/kernel/cpu_setup_power.c
> > > index a29dc8326622..3dc61e203f37 100644
> > > --- a/arch/powerpc/kernel/cpu_setup_power.c
> > > +++ b/arch/powerpc/kernel/cpu_setup_power.c
> > > @@ -109,7 +109,7 @@ static void init_PMU_HV_ISA207(void)
> > >  static void init_PMU(void)
> > >  {
> > >  	mtspr(SPRN_MMCRA, 0);
> > > -	mtspr(SPRN_MMCR0, 0);
> > > +	mtspr(SPRN_MMCR0, MMCR0_FC);
> >
> > Sticky point here is, currently if not frozen, pmc5/6 will keep
> > counting. And not freezing them at boot is quite useful sometimes,
> > like say when running in a simulation where we could calculate
> > approximate CPIs for micro-benchmarks without the perf subsystem.

Sorry for a delayed response.

> You even can't use the sysfs files in this sim environment? In that case

Yes, it is possible. Just that we would need an additional patch to
maintain for the sim.

> what if we added a boot option that could set some things up? In that
> case possibly you could even gather some more types of events too.

Having a boot option would be overkill :). I guess we could have this
under a config option?

> Thanks,
> Nick
Re: [RFC PATCH v0 1/1] powerpc/percpu: Use 2MB atom_size in percpu allocator on radix
Excerpts from Bharata B Rao's message of July 8, 2021 3:29 pm: > The atom_size used by percpu allocator on powerpc is currently > determined by mmu_linear_psize which is initialized to 4K and > mmu_linear_psize is modified only by hash. Till now for radix > the atom_size was defaulting to PAGE_SIZE(64K). Looks like it was 1MB to me? > Go for 2MB > atom_size on radix if support for 2MB pages exist. > > 2MB atom_size on radix will allow using PMD mappings in the > vmalloc area if and when support for higher sized vmalloc > mappings is enabled for the pecpu allocator. However right now That would be nice. > this change will result in more number of units to be allocated > within one allocation due to increased upa(units per allocation). In that case is there any reason to do it until then? > > Signed-off-by: Bharata B Rao > --- > arch/powerpc/kernel/setup_64.c | 34 +- > 1 file changed, 25 insertions(+), 9 deletions(-) > > diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c > index 1ff258f6c76c..45ce2d6e8112 100644 > --- a/arch/powerpc/kernel/setup_64.c > +++ b/arch/powerpc/kernel/setup_64.c > @@ -871,6 +871,30 @@ static void __init pcpu_populate_pte(unsigned long addr) > __func__, PAGE_SIZE, PAGE_SIZE, PAGE_SIZE); > } > > +static size_t pcpu_atom_size(void) > +{ > + size_t atom_size = PAGE_SIZE; > + > + /* > + * Radix: Use PAGE_SIZE by default or 2M if available. > + */ > + if (radix_enabled()) { > + if (mmu_psize_defs[MMU_PAGE_2M].shift) > + atom_size = 1 << mmu_psize_defs[MMU_PAGE_2M].shift; Looks like this changes behaviour for radix. Also mmu_psize_defs is a pretty horrible interface you only need it in some low level instruction encodings. You already explicitly know it's 2MB there, so you can just PMD_SHIFT. If you want to know whether huge PMD is supported and enabled in vmalloc memory, you would have to add some check which also accounts for vmap_allow_huge, so that would be another patch. 
Thanks, Nick > + goto out; > + } > + > + /* > + * Hash: Linear mapping is one of 4K, 1M and 16M. For 4K, no need > + * to group units. For larger mappings, use 1M atom which > + * should be large enough to contain a number of units. > + */ > + if (mmu_linear_psize != MMU_PAGE_4K) > + atom_size = 1 << 20; > + > +out: > + return atom_size; > +} > > void __init setup_per_cpu_areas(void) > { > @@ -880,15 +904,7 @@ void __init setup_per_cpu_areas(void) > unsigned int cpu; > int rc = -EINVAL; > > - /* > - * Linear mapping is one of 4K, 1M and 16M. For 4K, no need > - * to group units. For larger mappings, use 1M atom which > - * should be large enough to contain a number of units. > - */ > - if (mmu_linear_psize == MMU_PAGE_4K) > - atom_size = PAGE_SIZE; > - else > - atom_size = 1 << 20; > + atom_size = pcpu_atom_size(); > > if (pcpu_chosen_fc != PCPU_FC_PAGE) { > rc = pcpu_embed_first_chunk(0, dyn_size, atom_size, > pcpu_cpu_distance, > -- > 2.31.1 > >
Re: [RFC PATCH 27/43] KVM: PPC: Book3S HV P9: Move host OS save/restore functions to built-in
Excerpts from Athira Rajeev's message of July 8, 2021 3:32 pm: > > >> On 22-Jun-2021, at 4:27 PM, Nicholas Piggin wrote: >> >> Move the P9 guest/host register switching functions to the built-in >> P9 entry code, and export it for nested to use as well. >> >> This allows more flexibility in scheduling these supervisor privileged >> SPR accesses with the HV privileged and PR SPR accesses in the low level >> entry code. >> >> Signed-off-by: Nicholas Piggin >> --- >> arch/powerpc/kvm/book3s_hv.c | 351 +- >> arch/powerpc/kvm/book3s_hv.h | 39 +++ >> arch/powerpc/kvm/book3s_hv_p9_entry.c | 332 >> 3 files changed, 372 insertions(+), 350 deletions(-) >> create mode 100644 arch/powerpc/kvm/book3s_hv.h >> >> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c >> index 35749b0b663f..a7660af22161 100644 >> --- a/arch/powerpc/kvm/book3s_hv.c >> +++ b/arch/powerpc/kvm/book3s_hv.c >> @@ -79,6 +79,7 @@ >> #include >> >> #include "book3s.h" >> +#include "book3s_hv.h" >> >> #define CREATE_TRACE_POINTS >> #include "trace_hv.h" >> @@ -3675,356 +3676,6 @@ static noinline void kvmppc_run_core(struct >> kvmppc_vcore *vc) >> trace_kvmppc_run_core(vc, 1); >> } >> >> -/* >> - * Privileged (non-hypervisor) host registers to save. 
>> - */ >> -struct p9_host_os_sprs { >> -unsigned long dscr; >> -unsigned long tidr; >> -unsigned long iamr; >> -unsigned long amr; >> -unsigned long fscr; >> - >> -unsigned int pmc1; >> -unsigned int pmc2; >> -unsigned int pmc3; >> -unsigned int pmc4; >> -unsigned int pmc5; >> -unsigned int pmc6; >> -unsigned long mmcr0; >> -unsigned long mmcr1; >> -unsigned long mmcr2; >> -unsigned long mmcr3; >> -unsigned long mmcra; >> -unsigned long siar; >> -unsigned long sier1; >> -unsigned long sier2; >> -unsigned long sier3; >> -unsigned long sdar; >> -}; >> - >> -static void freeze_pmu(unsigned long mmcr0, unsigned long mmcra) >> -{ >> -if (!(mmcr0 & MMCR0_FC)) >> -goto do_freeze; >> -if (mmcra & MMCRA_SAMPLE_ENABLE) >> -goto do_freeze; >> -if (cpu_has_feature(CPU_FTR_ARCH_31)) { >> -if (!(mmcr0 & MMCR0_PMCCEXT)) >> -goto do_freeze; >> -if (!(mmcra & MMCRA_BHRB_DISABLE)) >> -goto do_freeze; >> -} >> -return; >> - >> -do_freeze: >> -mmcr0 = MMCR0_FC; >> -mmcra = 0; >> -if (cpu_has_feature(CPU_FTR_ARCH_31)) { >> -mmcr0 |= MMCR0_PMCCEXT; >> -mmcra = MMCRA_BHRB_DISABLE; >> -} >> - >> -mtspr(SPRN_MMCR0, mmcr0); >> -mtspr(SPRN_MMCRA, mmcra); >> -isync(); >> -} >> - >> -static void switch_pmu_to_guest(struct kvm_vcpu *vcpu, >> -struct p9_host_os_sprs *host_os_sprs) >> -{ >> -struct lppaca *lp; >> -int load_pmu = 1; >> - >> -lp = vcpu->arch.vpa.pinned_addr; >> -if (lp) >> -load_pmu = lp->pmcregs_in_use; >> - >> -if (load_pmu) >> - vcpu->arch.hfscr |= HFSCR_PM; >> - >> -/* Save host */ >> -if (ppc_get_pmu_inuse()) { >> -/* >> - * It might be better to put PMU handling (at least for the >> - * host) in the perf subsystem because it knows more about what >> - * is being used. 
>> - */ >> - >> -/* POWER9, POWER10 do not implement HPMC or SPMC */ >> - >> -host_os_sprs->mmcr0 = mfspr(SPRN_MMCR0); >> -host_os_sprs->mmcra = mfspr(SPRN_MMCRA); >> - >> -freeze_pmu(host_os_sprs->mmcr0, host_os_sprs->mmcra); >> - >> -host_os_sprs->pmc1 = mfspr(SPRN_PMC1); >> -host_os_sprs->pmc2 = mfspr(SPRN_PMC2); >> -host_os_sprs->pmc3 = mfspr(SPRN_PMC3); >> -host_os_sprs->pmc4 = mfspr(SPRN_PMC4); >> -host_os_sprs->pmc5 = mfspr(SPRN_PMC5); >> -host_os_sprs->pmc6 = mfspr(SPRN_PMC6); >> -host_os_sprs->mmcr1 = mfspr(SPRN_MMCR1); >> -host_os_sprs->mmcr2 = mfspr(SPRN_MMCR2); >> -host_os_sprs->sdar = mfspr(SPRN_SDAR); >> -host_os_sprs->siar = mfspr(SPRN_SIAR); >> -host_os_sprs->sier1 = mfspr(SPRN_SIER); >> - >> -if (cpu_has_feature(CPU_FTR_ARCH_31)) { >> -host_os_sprs->mmcr3 = mfspr(SPRN_MMCR3); >> -host_os_sprs->sier2 = mfspr(SPRN_SIER2); >> -host_os_sprs->sier3 = mfspr(SPRN_SIER3); >> -} >> -} >> - >> -#ifdef CONFIG_PPC_PSERIES >> -if (kvmhv_on_pseries()) { >> -if (vcpu->arch.vpa.pinned_addr) { >> -struct lppaca *lp = vcpu->arch.vpa.pinned_addr; >> -get_lppaca()->pmcregs_in_use = lp->pmcregs_in_use; >> -} else { >> -get_lppaca()->pmcregs_in_use = 1; >> -} >> -} >> -#endif >> - >> -/* Load guest */ >> -i
Re: [RFC PATCH 11/43] KVM: PPC: Book3S HV P9: Implement PMU save/restore in C
Excerpts from Athira Rajeev's message of July 10, 2021 12:47 pm: > > >> On 22-Jun-2021, at 4:27 PM, Nicholas Piggin wrote: >> >> Implement the P9 path PMU save/restore code in C, and remove the >> POWER9/10 code from the P7/8 path assembly. >> >> -449 cycles (8533) POWER9 virt-mode NULL hcall >> >> Signed-off-by: Nicholas Piggin >> --- >> arch/powerpc/include/asm/asm-prototypes.h | 5 - >> arch/powerpc/kvm/book3s_hv.c | 205 -- >> arch/powerpc/kvm/book3s_hv_interrupts.S | 13 +- >> arch/powerpc/kvm/book3s_hv_rmhandlers.S | 43 + >> 4 files changed, 200 insertions(+), 66 deletions(-) >> >> diff --git a/arch/powerpc/include/asm/asm-prototypes.h >> b/arch/powerpc/include/asm/asm-prototypes.h >> index 02ee6f5ac9fe..928db8ef9a5a 100644 >> --- a/arch/powerpc/include/asm/asm-prototypes.h >> +++ b/arch/powerpc/include/asm/asm-prototypes.h >> @@ -136,11 +136,6 @@ static inline void kvmppc_restore_tm_hv(struct kvm_vcpu >> *vcpu, u64 msr, >> bool preserve_nv) { } >> #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */ >> >> -void kvmhv_save_host_pmu(void); >> -void kvmhv_load_host_pmu(void); >> -void kvmhv_save_guest_pmu(struct kvm_vcpu *vcpu, bool pmu_in_use); >> -void kvmhv_load_guest_pmu(struct kvm_vcpu *vcpu); >> - >> void kvmppc_p9_enter_guest(struct kvm_vcpu *vcpu); >> >> long kvmppc_h_set_dabr(struct kvm_vcpu *vcpu, unsigned long dabr); >> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c >> index f7349d150828..b1b94b3563b7 100644 >> --- a/arch/powerpc/kvm/book3s_hv.c >> +++ b/arch/powerpc/kvm/book3s_hv.c >> @@ -3635,6 +3635,188 @@ static noinline void kvmppc_run_core(struct >> kvmppc_vcore *vc) >> trace_kvmppc_run_core(vc, 1); >> } >> >> +/* >> + * Privileged (non-hypervisor) host registers to save. 
>> + */
>> +struct p9_host_os_sprs {
>> +	unsigned long dscr;
>> +	unsigned long tidr;
>> +	unsigned long iamr;
>> +	unsigned long amr;
>> +	unsigned long fscr;
>> +
>> +	unsigned int pmc1;
>> +	unsigned int pmc2;
>> +	unsigned int pmc3;
>> +	unsigned int pmc4;
>> +	unsigned int pmc5;
>> +	unsigned int pmc6;
>> +	unsigned long mmcr0;
>> +	unsigned long mmcr1;
>> +	unsigned long mmcr2;
>> +	unsigned long mmcr3;
>> +	unsigned long mmcra;
>> +	unsigned long siar;
>> +	unsigned long sier1;
>> +	unsigned long sier2;
>> +	unsigned long sier3;
>> +	unsigned long sdar;
>> +};
>> +
>> +static void freeze_pmu(unsigned long mmcr0, unsigned long mmcra)
>> +{
>> +	if (!(mmcr0 & MMCR0_FC))
>> +		goto do_freeze;
>> +	if (mmcra & MMCRA_SAMPLE_ENABLE)
>> +		goto do_freeze;
>> +	if (cpu_has_feature(CPU_FTR_ARCH_31)) {
>> +		if (!(mmcr0 & MMCR0_PMCCEXT))
>> +			goto do_freeze;
>> +		if (!(mmcra & MMCRA_BHRB_DISABLE))
>> +			goto do_freeze;
>> +	}
>> +	return;
>
> Hi Nick,
>
> When freezing the PMU, do we need to also set pmcregs_in_use to zero ?

Not immediately, we still need to save out the values of the PMU
registers. If we clear pmcregs_in_use, then our hypervisor can discard
the contents of those registers at any time.

> Also, why we need these above conditions like MMCRA_SAMPLE_ENABLE,
> MMCR0_PMCCEXT checks also before freezing ?

Basically just because that's the condition we want to set the PMU to
before entering the guest if the guest does not have its own PMU
registers in use.

I'm not entirely sure this is correct / optimal for perf though, so we
should double-check that. I think some of this stuff should be wrapped
up and put into the perf/ subdirectory as I've said before. KVM
shouldn't need to know about exactly how the PMU is to be set up and
managed by perf; we should just be able to ask perf to
save/restore/switch state.

Thanks,
Nick
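The "above conditions" discussed here amount to a predicate: the PMU is left alone only if it is already fully quiesced — frozen, sampling disabled, and on ISA v3.1 also PMCC-extended and BHRB-disabled. A hedged userspace model of that check follows; the bit values are arbitrary stand-ins, not the real SPR encodings, and `cpu_is_arch31` stands in for `cpu_has_feature(CPU_FTR_ARCH_31)`.

```c
#include <assert.h>

/* Illustrative stand-in bit values, not the architected SPR layouts. */
#define M_MMCR0_FC            0x1ul
#define M_MMCR0_PMCCEXT       0x2ul
#define M_MMCRA_SAMPLE_ENABLE 0x1ul
#define M_MMCRA_BHRB_DISABLE  0x2ul

static int cpu_is_arch31;  /* models cpu_has_feature(CPU_FTR_ARCH_31) */

/* Returns 1 when the PMU must be (re)frozen, mirroring the checks in
 * freeze_pmu(): counters frozen, sampling off, and on ISA v3.1 also
 * PMCCEXT set and BHRB disabled. */
static int pmu_needs_freeze(unsigned long mmcr0, unsigned long mmcra)
{
	if (!(mmcr0 & M_MMCR0_FC))
		return 1;
	if (mmcra & M_MMCRA_SAMPLE_ENABLE)
		return 1;
	if (cpu_is_arch31) {
		if (!(mmcr0 & M_MMCR0_PMCCEXT))
			return 1;
		if (!(mmcra & M_MMCRA_BHRB_DISABLE))
			return 1;
	}
	return 0;
}
```

When the predicate holds, the real code skips the mtspr/isync sequence entirely, which is the optimisation the frozen/disabled boot values enable.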
Re: [RFC PATCH 10/43] powerpc/64s: Always set PMU control registers to frozen/disabled when not in use
Excerpts from Athira Rajeev's message of July 10, 2021 12:50 pm: > > >> On 22-Jun-2021, at 4:27 PM, Nicholas Piggin wrote: >> >> KVM PMU management code looks for particular frozen/disabled bits in >> the PMU registers so it knows whether it must clear them when coming >> out of a guest or not. Setting this up helps KVM make these optimisations >> without getting confused. Longer term the better approach might be to >> move guest/host PMU switching to the perf subsystem. >> >> Signed-off-by: Nicholas Piggin >> --- >> arch/powerpc/kernel/cpu_setup_power.c | 4 ++-- >> arch/powerpc/kernel/dt_cpu_ftrs.c | 6 +++--- >> arch/powerpc/kvm/book3s_hv.c | 5 + >> arch/powerpc/perf/core-book3s.c | 7 +++ >> 4 files changed, 17 insertions(+), 5 deletions(-) >> >> diff --git a/arch/powerpc/kernel/cpu_setup_power.c >> b/arch/powerpc/kernel/cpu_setup_power.c >> index a29dc8326622..3dc61e203f37 100644 >> --- a/arch/powerpc/kernel/cpu_setup_power.c >> +++ b/arch/powerpc/kernel/cpu_setup_power.c >> @@ -109,7 +109,7 @@ static void init_PMU_HV_ISA207(void) >> static void init_PMU(void) >> { >> mtspr(SPRN_MMCRA, 0); >> -mtspr(SPRN_MMCR0, 0); >> +mtspr(SPRN_MMCR0, MMCR0_FC); >> mtspr(SPRN_MMCR1, 0); >> mtspr(SPRN_MMCR2, 0); >> } >> @@ -123,7 +123,7 @@ static void init_PMU_ISA31(void) >> { >> mtspr(SPRN_MMCR3, 0); >> mtspr(SPRN_MMCRA, MMCRA_BHRB_DISABLE); >> -mtspr(SPRN_MMCR0, MMCR0_PMCCEXT); >> +mtspr(SPRN_MMCR0, MMCR0_FC | MMCR0_PMCCEXT); >> } >> >> /* >> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c >> b/arch/powerpc/kernel/dt_cpu_ftrs.c >> index 0a6b36b4bda8..06a089fbeaa7 100644 >> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c >> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c >> @@ -353,7 +353,7 @@ static void init_pmu_power8(void) >> } >> >> mtspr(SPRN_MMCRA, 0); >> -mtspr(SPRN_MMCR0, 0); >> +mtspr(SPRN_MMCR0, MMCR0_FC); >> mtspr(SPRN_MMCR1, 0); >> mtspr(SPRN_MMCR2, 0); >> mtspr(SPRN_MMCRS, 0); >> @@ -392,7 +392,7 @@ static void init_pmu_power9(void) >> mtspr(SPRN_MMCRC, 0); >> >> mtspr(SPRN_MMCRA, 
0); >> -mtspr(SPRN_MMCR0, 0); >> +mtspr(SPRN_MMCR0, MMCR0_FC); >> mtspr(SPRN_MMCR1, 0); >> mtspr(SPRN_MMCR2, 0); >> } >> @@ -428,7 +428,7 @@ static void init_pmu_power10(void) >> >> mtspr(SPRN_MMCR3, 0); >> mtspr(SPRN_MMCRA, MMCRA_BHRB_DISABLE); >> -mtspr(SPRN_MMCR0, MMCR0_PMCCEXT); >> +mtspr(SPRN_MMCR0, MMCR0_FC | MMCR0_PMCCEXT); >> } >> >> static int __init feat_enable_pmu_power10(struct dt_cpu_feature *f) >> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c >> index 1f30f98b09d1..f7349d150828 100644 >> --- a/arch/powerpc/kvm/book3s_hv.c >> +++ b/arch/powerpc/kvm/book3s_hv.c >> @@ -2593,6 +2593,11 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu >> *vcpu) >> #endif >> #endif >> vcpu->arch.mmcr[0] = MMCR0_FC; >> +if (cpu_has_feature(CPU_FTR_ARCH_31)) { >> +vcpu->arch.mmcr[0] |= MMCR0_PMCCEXT; >> +vcpu->arch.mmcra = MMCRA_BHRB_DISABLE; >> +} >> + >> vcpu->arch.ctrl = CTRL_RUNLATCH; >> /* default to host PVR, since we can't spoof it */ >> kvmppc_set_pvr_hv(vcpu, mfspr(SPRN_PVR)); >> diff --git a/arch/powerpc/perf/core-book3s.c >> b/arch/powerpc/perf/core-book3s.c >> index 51622411a7cc..e33b29ec1a65 100644 >> --- a/arch/powerpc/perf/core-book3s.c >> +++ b/arch/powerpc/perf/core-book3s.c >> @@ -1361,6 +1361,13 @@ static void power_pmu_enable(struct pmu *pmu) >> goto out; >> >> if (cpuhw->n_events == 0) { >> +if (cpu_has_feature(CPU_FTR_ARCH_31)) { >> +mtspr(SPRN_MMCRA, MMCRA_BHRB_DISABLE); >> +mtspr(SPRN_MMCR0, MMCR0_FC | MMCR0_PMCCEXT); >> +} else { >> +mtspr(SPRN_MMCRA, 0); >> +mtspr(SPRN_MMCR0, MMCR0_FC); >> +} > > > Hi Nick, > > We are setting these bits in “power_pmu_disable” function. And disable will > be called before any event gets deleted/stopped. Can you please help to > understand why this is needed in power_pmu_enable path also ? I'll have to go back and check what I needed it for. Basically what I'm trying to achieve is that when the guest and host are not running perf, then the registers should match. 
This way the host can test them with mfspr but does not need to fix them
with mtspr. It's not too important for performance: after the TM facility
demand faulting and the nested guest PMU fix, you can usually just skip
that entirely, but I think it's cleaner. I'll double-check my perf/
changes; it's possible that part should be split out or is unnecessary.

Thanks,
Nick
Re: [PATCH V3 1/1] powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC
Excerpts from Athira Rajeev's message of July 11, 2021 1:58 am:
> Running perf fuzzer showed below in dmesg logs:
> "Can't find PMC that caused IRQ"
>
> This means a PMU exception happened, but none of the PMCs (Performance
> Monitor Counters) were found to be overflown. There are some corner
> cases that clear the PMCs after the PMI gets masked. In such cases, the
> perf interrupt handler will not find the active PMC values that had
> caused the overflow, and this leads to the message while replaying.
>
> Case 1: PMU interrupt happens during replay of other interrupts and
> counter values get cleared by PMU callbacks before replay:
>
> During replay of interrupts like timer, __do_irq and doorbell
> exception, we conditionally enable interrupts via
> may_hard_irq_enable(). This could potentially create a window to
> generate a PMI. Since the irq soft mask is set to ALL_DISABLED, the
> PMI will get masked here. We could get IPIs run before the perf
> interrupt is replayed, and the PMU events could be deleted or stopped.
> This will change the PMU SPR values and reset the counters. Snippet of
> the ftrace log showing PMU callbacks invoked in "__do_irq":
>
> -0 [051] dns. 132025441306354: __do_irq <-call_do_irq
> -0 [051] dns. 132025441306430: irq_enter <-__do_irq
> -0 [051] dns. 132025441306503: irq_enter_rcu <-__do_irq
> -0 [051] dnH. 132025441306599: xive_get_irq <-__do_irq
> <<>>
> -0 [051] dnH. 132025441307770: generic_smp_call_function_single_interrupt <-smp_ipi_demux_relaxed
> -0 [051] dnH. 132025441307839: flush_smp_call_function_queue <-smp_ipi_demux_relaxed
> -0 [051] dnH. 132025441308057: _raw_spin_lock <-event_function
> -0 [051] dnH. 132025441308206: power_pmu_disable <-perf_pmu_disable
> -0 [051] dnH. 132025441308337: power_pmu_del <-event_sched_out
> -0 [051] dnH. 132025441308407: power_pmu_read <-power_pmu_del
> -0 [051] dnH. 132025441308477: read_pmc <-power_pmu_read
> -0 [051] dnH. 132025441308590: isa207_disable_pmc <-power_pmu_del
> -0 [051] dnH. 132025441308663: write_pmc <-power_pmu_del
> -0 [051] dnH. 132025441308787: power_pmu_event_idx <-perf_event_update_userpage
> -0 [051] dnH. 132025441308859: rcu_read_unlock_strict <-perf_event_update_userpage
> -0 [051] dnH. 132025441308975: power_pmu_enable <-perf_pmu_enable
> <<>>
> -0 [051] dnH. 132025441311108: irq_exit <-__do_irq
> -0 [051] dns. 132025441311319: performance_monitor_exception <-replay_soft_interrupts
>
> Case 2: PMIs masked during local_* operations, for example local_add.
> If the local_add operation happens within a local_irq_save, replay of
> the PMI will be during local_irq_restore. Similar to case 1, this could
> also create a window before replay where PMU events get deleted or
> stopped.
>
> The patch updates the PMU callback function 'power_pmu_disable' to
> check for a pending perf interrupt. If there is an overflown PMC and a
> pending perf interrupt indicated in the paca, clear the PMI bit in the
> paca to drop that sample. Clearing of the PMI bit is done in
> 'power_pmu_disable' since disable is invoked before any event gets
> deleted/stopped. With this fix, if there is more than one event running
> in the PMU, there is a chance that we clear the PMI bit for an event
> which is not getting deleted/stopped, while the other events may still
> remain active. Hence, to make sure we don't drop a valid sample in such
> cases, another check is added in power_pmu_enable: it checks if there
> is an overflown PMC found among the active events and, if so, enables
> the PMI bit again. Two new helper functions are introduced to clear/set
> the PMI, i.e. 'clear_pmi_irq_pending' and 'set_pmi_irq_pending'.
>
> There are also corner cases which result in performance monitor
> interrupts getting triggered during power_pmu_disable. This happens
> since the PMXE bit is not cleared along with the disabling of other
> MMCR0 bits in pmu_disable. Such PMIs could leave the PMU running and
> could trigger a PMI again, which will set the MMCR0 PMAO bit.
This could lead to spurious interrupts in some > corner cases. Example, a timer after power_pmu_del which will re-enable > interrupts and triggers a PMI again since PMAO bit is still set. But fails > to find valid overflow since PMC get cleared in power_pmu_del. Patch > fixes this by disabling PMXE along with disabling of other MMCR0 bits > in power_pmu_disable. > > We can't just replay PMI any time. Hence this approach is preferred rather > than replaying PMI before resetting overflown PMC. Patch also documents > core-book3s on a race condition which can trigger these PMC messages during > idle path in PowerNV. > > Fixes: f442d004806e ("powerpc/64s: Add support to mask perf interrupts and > replay them") > Reported-by: Nageswara R Sastry > Suggested-by: Nicholas Piggin > Suggested-by: Madhavan Srinivasan > Signed-off-by: Athira Rajeev > --- > arch/powerpc/include/asm/hw_irq.h | 31 + > arch/powerpc/perf/core-book3s.c | 49 > ++- > 2 files changed, 79 i
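The two helpers the patch introduces amount to setting and clearing one bit in the paca's pending-interrupt word. A toy model follows; the bit value and the plain global standing in for `local_paca->irq_happened` are illustrative assumptions, not the real kernel definitions.

```c
#include <assert.h>

#define MODEL_PACA_IRQ_PMI 0x08u  /* illustrative bit, not the real value */

static unsigned int irq_happened;  /* stands in for local_paca->irq_happened */

/* Models set_pmi_irq_pending(): mark a masked PMI for later replay. */
static void set_pmi_irq_pending_model(void)
{
	irq_happened |= MODEL_PACA_IRQ_PMI;
}

/* Models clear_pmi_irq_pending(): drop a pending PMI whose sample is
 * known to be stale (the PMC was cleared in power_pmu_disable). */
static void clear_pmi_irq_pending_model(void)
{
	irq_happened &= ~MODEL_PACA_IRQ_PMI;
}

static int pmi_irq_pending_model(void)
{
	return !!(irq_happened & MODEL_PACA_IRQ_PMI);
}
```

Other pending-interrupt bits in the word must be left untouched, which is why the helpers use OR/AND-NOT on a single mask rather than assigning the whole word.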
RE: [PATCH v2] fpga: dfl: fme: Fix cpu hotplug issue in performance reporting
> Subject: [PATCH v2] fpga: dfl: fme: Fix cpu hotplug issue in performance
> reporting
>
> The performance reporting driver added a cpu hotplug feature, but it
> didn't add the pmu migration call in the cpu offline function. This can
> create an issue in case the current designated cpu being used to
> collect fme pmu data goes offline: based on the current code we are not
> migrating the fme pmu to a new target cpu, so perf will still try to
> fetch data from that offline cpu and hence we will not get counter
> data.
>
> The patch fixes this issue by adding a perf_pmu_migrate_context call in
> the fme_perf_offline_cpu function.
>
> Fixes: 724142f8c42a ("fpga: dfl: fme: add performance reporting support")
> Tested-by: Xu Yilun
> Signed-off-by: Kajol Jain

Thanks for fixing this! And thanks to Yilun for the verification too. :)

> Cc: sta...@vger.kernel.org
> ---
>  drivers/fpga/dfl-fme-perf.c | 4 
>  1 file changed, 4 insertions(+)
>
> ---
> Changelog:
> v1 -> v2:
> - Add sta...@vger.kernel.org in cc list
>
> RFC -> PATCH v1
> - Remove RFC tag
> - Made nit changes on subject and commit message as suggested by Xu Yilun
> - Added Tested-by tag
> - Link to rfc patch: https://lkml.org/lkml/2021/6/28/112
> ---
> diff --git a/drivers/fpga/dfl-fme-perf.c b/drivers/fpga/dfl-fme-perf.c
> index 4299145ef347..b9a54583e505 100644
> --- a/drivers/fpga/dfl-fme-perf.c
> +++ b/drivers/fpga/dfl-fme-perf.c
> @@ -953,6 +953,10 @@ static int fme_perf_offline_cpu(unsigned int cpu, struct hlist_node *node)
>  		return 0;
>
>  	priv->cpu = target;
> +
> +	/* Migrate fme_perf pmu events to the new target cpu */

Only one minor item: this comment line is not needed. :)

After removing it:

Acked-by: Wu Hao

Thanks,
Hao

> +	perf_pmu_migrate_context(&priv->pmu, cpu, target);
> +
>  	return 0;
>  }
>
> --
> 2.31.1
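The offline-callback pattern being fixed here is the standard one for uncore-style PMUs: when the designated CPU goes down, pick any other online CPU (the kernel uses `cpumask_any_but(cpu_online_mask, cpu)`), update `priv->cpu`, and call `perf_pmu_migrate_context()`. The target selection can be sketched as a toy model; the array-of-flags online mask is an assumption for illustration only.

```c
#include <assert.h>

/* Toy model of cpumask_any_but(cpu_online_mask, dying): return the first
 * online CPU other than the one going offline, or -1 if none remains
 * (the kernel returns >= nr_cpu_ids in that case and the callback bails
 * out without migrating). */
static int pick_migration_target(const int *online, int ncpus, int dying)
{
	for (int i = 0; i < ncpus; i++) {
		if (online[i] && i != dying)
			return i;
	}
	return -1;
}
```

The bug the patch fixes is precisely the gap between the two steps: the driver updated its designated CPU but never moved the already-scheduled perf events over, so reads kept targeting the dead CPU.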
[PATCH] KVM: PPC: Book3S HV P9: Fix guest TM support
The conversion to C introduced several bugs in TM handling that can cause host crashes with TM bad thing interrupts. Mostly just simple typos or missed logic in the conversion that got through due to my not testing TM in the guest sufficiently. - Early TM emulation for the softpatch interrupt should be done if fake suspend mode is _not_ active. - Early TM emulation wants to return immediately to the guest so as to not doom transactions unnecessarily. - And if exiting from the guest, the host MSR should include the TM[S] bit if the guest was T/S, before it is treclaimed. After this fix, all the TM selftests pass when running on a P9 processor that implements TM with softpatch interrupt. Reported-by: Alexey Kardashevskiy Fixes: 89d35b2391015 ("KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C") Signed-off-by: Nicholas Piggin --- arch/powerpc/kvm/book3s_hv_p9_entry.c | 24 +--- 1 file changed, 21 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c b/arch/powerpc/kvm/book3s_hv_p9_entry.c index 83f592eadcd2..becb0c08ebda 100644 --- a/arch/powerpc/kvm/book3s_hv_p9_entry.c +++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c @@ -317,6 +317,9 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 time_limit, unsigned long lpc */ mtspr(SPRN_HDEC, hdec); +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM +tm_return_to_guest: +#endif mtspr(SPRN_DAR, vcpu->arch.shregs.dar); mtspr(SPRN_DSISR, vcpu->arch.shregs.dsisr); mtspr(SPRN_SRR0, vcpu->arch.shregs.srr0); @@ -415,11 +418,23 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 time_limit, unsigned long lpc * is in real suspend mode and is trying to transition to * transactional mode. */ - if (local_paca->kvm_hstate.fake_suspend && + if (!local_paca->kvm_hstate.fake_suspend && (vcpu->arch.shregs.msr & MSR_TS_S)) { if (kvmhv_p9_tm_emulation_early(vcpu)) { - /* Prevent it being handled again. */ - trap = 0; + /* +* Go straight back into the guest with the +* new NIP/MSR as set by TM emulation. 
+*/ + mtspr(SPRN_HSRR0, vcpu->arch.regs.nip); + mtspr(SPRN_HSRR1, vcpu->arch.shregs.msr); + + /* +* tm_return_to_guest re-loads SRR0/1, DAR, +* DSISR after RI is cleared, in case they had +* been clobbered by a MCE. +*/ + __mtmsrd(0, 1); /* clear RI */ + goto tm_return_to_guest; } } #endif @@ -499,6 +514,9 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 time_limit, unsigned long lpc * If we are in real mode, only switch MMU on after the MMU is * switched to host, to avoid the P9_RADIX_PREFETCH_BUG. */ + if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) && + vcpu->arch.shregs.msr & MSR_TS_MASK) + msr |= MSR_TS_S; __mtmsrd(msr, 0); end_timing(vcpu); -- 2.23.0
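The third bullet of this fix — the host MSR must indicate a suspended TM state before treclaim if the guest exited transactional or suspended — corresponds to the `msr |= MSR_TS_S` hunk at the end of the patch. It can be expressed as a small self-checking sketch; the bit positions here are illustrative stand-ins, not the architected MSR layout.

```c
#include <assert.h>

/* Illustrative stand-in bit positions, not the real powerpc MSR layout. */
#define MODEL_MSR_TS_S    (1ul << 1)  /* suspended */
#define MODEL_MSR_TS_T    (1ul << 2)  /* transactional */
#define MODEL_MSR_TS_MASK (MODEL_MSR_TS_S | MODEL_MSR_TS_T)

/* If the guest was in a transactional or suspended state at exit, the
 * host MSR must carry TS=S so that the subsequent treclaim of the guest
 * transaction is architecturally legal. */
static unsigned long host_exit_msr(unsigned long host_msr,
				   unsigned long guest_msr)
{
	if (guest_msr & MODEL_MSR_TS_MASK)
		host_msr |= MODEL_MSR_TS_S;
	return host_msr;
}
```

The other two bullets (emulating the softpatch interrupt only when fake-suspend is *not* active, and returning straight to the guest afterwards) are control-flow fixes visible directly in the diff above.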
Re: [PATCH v3 2/2] net/ps3_gelic: Cleanups, improve logging
Geoff Levand a écrit : General source cleanups and improved logging messages. Describe a bit more what you do to cleanup and improve. Some of your changes are not cleanup , they increase the mess. You should read kernel coding style Signed-off-by: Geoff Levand --- drivers/net/ethernet/toshiba/ps3_gelic_net.c | 395 ++- 1 file changed, 216 insertions(+), 179 deletions(-) diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.c b/drivers/net/ethernet/toshiba/ps3_gelic_net.c index e01938128882..9dbcb7c4ec80 100644 --- a/drivers/net/ethernet/toshiba/ps3_gelic_net.c +++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.c @@ -44,17 +44,17 @@ MODULE_AUTHOR("SCE Inc."); MODULE_DESCRIPTION("Gelic Network driver"); MODULE_LICENSE("GPL"); - -/* set irq_mask */ int gelic_card_set_irq_mask(struct gelic_card *card, u64 mask) { + struct device *dev = ctodev(card); int status; status = lv1_net_set_interrupt_mask(bus_id(card), dev_id(card), mask, 0); - if (status) - dev_info(ctodev(card), -"%s failed %d\n", __func__, status); + if (status) { No { } for single lines + dev_err(dev, "%s:%d failed: %d\n", __func__, __LINE__, status); + } + return status; } @@ -63,6 +63,7 @@ static void gelic_card_rx_irq_on(struct gelic_card *card) card->irq_mask |= GELIC_CARD_RXINT; gelic_card_set_irq_mask(card, card->irq_mask); } + static void gelic_card_rx_irq_off(struct gelic_card *card) { card->irq_mask &= ~GELIC_CARD_RXINT; @@ -70,15 +71,14 @@ static void gelic_card_rx_irq_off(struct gelic_card *card) } static void gelic_card_get_ether_port_status(struct gelic_card *card, -int inform) + int inform) Bad indent { u64 v2; struct net_device *ether_netdev; lv1_net_control(bus_id(card), dev_id(card), - GELIC_LV1_GET_ETH_PORT_STATUS, - GELIC_LV1_VLAN_TX_ETHERNET_0, 0, 0, - &card->ether_port_status, &v2); + GELIC_LV1_GET_ETH_PORT_STATUS, GELIC_LV1_VLAN_TX_ETHERNET_0, 0, + 0, &card->ether_port_status, &v2); Bad indent if (inform) { ether_netdev = card->netdev[GELIC_PORT_ETHERNET_0]; @@ -105,15 +105,17 @@ 
gelic_descr_get_status(struct gelic_descr *descr) static int gelic_card_set_link_mode(struct gelic_card *card, int mode) { + struct device *dev = ctodev(card); int status; u64 v1, v2; status = lv1_net_control(bus_id(card), dev_id(card), -GELIC_LV1_SET_NEGOTIATION_MODE, -GELIC_LV1_PHY_ETHERNET_0, mode, 0, &v1, &v2); + GELIC_LV1_SET_NEGOTIATION_MODE, GELIC_LV1_PHY_ETHERNET_0, mode, + 0, &v1, &v2); + if (status) { - pr_info("%s: failed setting negotiation mode %d\n", __func__, - status); + dev_err(dev, "%s:%d: Failed setting negotiation mode: %d\n", + __func__, __LINE__, status); return -EBUSY; } @@ -130,13 +132,16 @@ static int gelic_card_set_link_mode(struct gelic_card *card, int mode) */ static void gelic_card_disable_txdmac(struct gelic_card *card) { + struct device *dev = ctodev(card); int status; /* this hvc blocks until the DMA in progress really stopped */ status = lv1_net_stop_tx_dma(bus_id(card), dev_id(card)); - if (status) - dev_err(ctodev(card), - "lv1_net_stop_tx_dma failed, status=%d\n", status); + + if (status) { + dev_err(dev, "%s:%d: lv1_net_stop_tx_dma failed: %d\n", + __func__, __LINE__, status); + } } /** @@ -187,13 +192,16 @@ static void gelic_card_enable_rxdmac(struct gelic_card *card) */ static void gelic_card_disable_rxdmac(struct gelic_card *card) { + struct device *dev = ctodev(card); int status; /* this hvc blocks until the DMA in progress really stopped */ status = lv1_net_stop_rx_dma(bus_id(card), dev_id(card)); - if (status) - dev_err(ctodev(card), - "lv1_net_stop_rx_dma failed, %d\n", status); + + if (status) { + dev_err(dev, "%s:%d: lv1_net_stop_rx_dma failed: %d\n", + __func__, __LINE__, status); + } } /** @@ -216,6 +224,7 @@ static void gelic_descr_set_status(struct gelic_descr *descr, * Usually caller of this function wants to inform that to the * hardware, so we assure here the hardware sees the change. 
>  	 */
> +

Bad blank line, it separates the comment from the commentee

>  	wmb();
>  }
>
> @@ -229,8 +238,7 @@ static void gelic_descr_set_status(struct gelic_descr *descr,
>   * and re-i
Re: [PATCH v3 1/2] net/ps3_gelic: Add gelic_descr structures
Geoff Levand wrote:
> Create two new structures, struct gelic_hw_regs and struct
> gelic_chain_link, and replace the corresponding members of struct
> gelic_descr with the new structures. struct gelic_hw_regs holds the
> register variables used by the gelic hardware device. struct
> gelic_chain_link holds variables used to manage the driver's linked
> list of gelic descr structures.
>
> Fixes several DMA mapping problems with the PS3's gelic network driver:
>
>  * Change from checking the return value of dma_map_single to using
>    the dma_mapping_error routine.
>  * Use the correct buffer length when mapping the RX skb.
>  * Improved error checking and debug logging.

Your patch has a lot of cosmetic changes. Several of them are just wrong. The other ones belong to another patch. This patch should focus only on the changes it targets.

Your patch is way too big and addresses several different topics. It should be split into several patches.

I suggest you run checkpatch.pl --strict on your patch.

> Fixes runtime errors like these, and also other randomly occurring errors:
>
>   IP-Config: Complete:
>   DMA-API: ps3_gelic_driver sb_05: device driver failed to check map error
>   WARNING: CPU: 0 PID: 0 at kernel/dma/debug.c:1027 .check_unmap+0x888/0x8dc
>
> Signed-off-by: Geoff Levand
> ---
>  drivers/net/ethernet/toshiba/ps3_gelic_net.c | 573 +++
>  drivers/net/ethernet/toshiba/ps3_gelic_net.h |  24 +-
>  2 files changed, 341 insertions(+), 256 deletions(-)
>
> diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.c b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
> index 55e652624bd7..e01938128882 100644
> --- a/drivers/net/ethernet/toshiba/ps3_gelic_net.c
> +++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
> @@ -96,9 +96,11 @@ static void gelic_card_get_ether_port_status(struct gelic_card *card,
>   * returns the status as in the dmac_cmd_status field of the descriptor
>   */
>  static enum gelic_descr_dma_status
> +

This blank line is pointless and misleading

> gelic_descr_get_status(struct gelic_descr *descr)
>  {
> -	return
be32_to_cpu(descr->dmac_cmd_status) & GELIC_DESCR_DMA_STAT_MASK;
> +	return be32_to_cpu(descr->hw_regs.dmac_cmd_status)
> +		& GELIC_DESCR_DMA_STAT_MASK;

The & should be at the end of the previous line and the second line should be aligned to the open parenthesis.

>  }
>
>  static int gelic_card_set_link_mode(struct gelic_card *card, int mode)
> @@ -146,24 +148,34 @@ static void gelic_card_disable_txdmac(struct gelic_card *card)
>   */
>  static void gelic_card_enable_rxdmac(struct gelic_card *card)
>  {
> +	struct device *dev = ctodev(card);
>  	int status;
> +#if defined(DEBUG)
> +	static const int debug_build = 1;
> +#else
> +	static const int debug_build = 0;
> +#endif

Useless. You can directly use __is_defined(DEBUG) below.

> -#ifdef DEBUG
> -	if (gelic_descr_get_status(card->rx_chain.head) !=
> -	    GELIC_DESCR_DMA_CARDOWNED) {
> -		printk(KERN_ERR "%s: status=%x\n", __func__,
> -		       be32_to_cpu(card->rx_chain.head->dmac_cmd_status));
> -		printk(KERN_ERR "%s: nextphy=%x\n", __func__,
> -		       be32_to_cpu(card->rx_chain.head->next_descr_addr));
> -		printk(KERN_ERR "%s: head=%p\n", __func__,
> -		       card->rx_chain.head);
> +	if (debug_build

&& must be at the end of the first line.

> +	    && (gelic_descr_get_status(card->rx_chain.head)
> +	    != GELIC_DESCR_DMA_CARDOWNED)) {

!= must be at the end of the previous line.

> +		dev_err(dev, "%s:%d: status=%x\n", __func__, __LINE__,
> +			be32_to_cpu(
> +			card->rx_chain.head->hw_regs.dmac_cmd_status));

Alignment should match the open parenthesis.
And lines can be 100 chars long when necessary.

> +		dev_err(dev, "%s:%d: nextphy=%x\n", __func__, __LINE__,
> +			be32_to_cpu(
> +			card->rx_chain.head->hw_regs.next_descr_addr));
> +		dev_err(dev, "%s:%d: head=%px\n", __func__, __LINE__,
> +			card->rx_chain.head);
>  	}
> -#endif
> +
>  	status = lv1_net_start_rx_dma(bus_id(card), dev_id(card),
> -		card->rx_chain.head->bus_addr, 0);
> -	if (status)
> -		dev_info(ctodev(card),
> -			 "lv1_net_start_rx_dma failed, status=%d\n", status);
> +		card->rx_chain.head->link.cpu_addr, 0);
> +
> +	if (status) {

No { } for single lines

> +		dev_err(dev, "%s:%d: lv1_net_start_rx_dma failed: %d\n",
> +			__func__, __LINE__, status);
> +	}
>  }
>
>  /**
> @@ -193,11 +205,11 @@ static void gelic_card_disable_rxdmac(struct gelic_card *card)
>   * in the status
>   */
>  static void gelic_descr_set_status(struct gelic_descr *descr,
> -				   enum gelic_descr_dma_status status)
> +	enum gelic_descr_dma_status status)
>  {
> -	descr->dmac_cmd_status = cpu_to_be32(status |
[PATCH V2] powerpc/perf: Enable PMU counters post partition migration if PMU is active
During Live Partition Migration (LPM), it is observed that perf counter values report zero after migration completes. 'perf stat' with a workload continues to show counts post migration, since the PMU gets disabled/enabled during sched switches; but in case of system/cpu-wide monitoring, zero counts are reported with 'perf stat' after migration completes.

Example:

 ./perf stat -e r1001e -I 1000

           time             counts unit events
    1.001010437         22,137,414      r1001e
    2.002495447         15,455,821      r1001e
 <<>>

As seen in the logs below, the counter values read zero after migration has completed:

 <<>>
   86.142535370    129,392,333,440      r1001e
   87.144714617                  0      r1001e
   88.146526636                  0      r1001e
   89.148085029                  0      r1001e

Here the PMU is enabled at the start of the perf session and counter values are read at intervals; counters are only disabled at the end of the session. The powerpc mobility code presently does not handle disabling and re-enabling the PMU counters during partition migration. Also, since the PMU register values are not saved/restored during migration, PMU registers like Monitor Mode Control Register 0 (MMCR0) and Monitor Mode Control Register 1 (MMCR1) will not contain the values they were programmed with. Hence PMU counters will not be enabled correctly post migration.

Fix this in the mobility code by disabling and enabling the PMU on all CPUs before and after migration. The patch introduces two functions, mobility_pmu_disable() and mobility_pmu_enable(). mobility_pmu_disable() is called before the processor threads go to suspend state, so as to disable the PMU counters; the disable is done only if there are any active events running on that cpu. mobility_pmu_enable() is called after the processor threads are back online, to enable the PMU counters again.

Since the Performance Monitor Counters (PMCs) are not saved/restored during LPM, the PMC value reads zero after migration while 'event->hw.prev_count' holds a non-zero value.
This causes a problem when updating event->count, since we always accumulate (PMC value - event->hw.prev_count) into event->count; if event->hw.prev_count is greater than the PMC value, event->count becomes negative. Fix this by re-initialising 'prev_count' for all events as well while enabling the events again. A new variable 'migrate' is introduced in 'struct cpu_hw_events' to achieve this for the LPM case in power_pmu_enable. Use the 'migrate' value to clear the PMC index (stored in event->hw.idx) for all events so that event count settings get re-initialised correctly.

Signed-off-by: Athira Rajeev
[ Fixed compilation error reported by kernel test robot ]
Reported-by: kernel test robot
---
Change from v1 -> v2:
- Moved the mobility_pmu_enable and mobility_pmu_disable declarations
  under CONFIG_PPC_PERF_CTRS in rtas.h. Also included 'asm/rtas.h' in
  core-book3s to fix the compilation warning reported by kernel test
  robot.

 arch/powerpc/include/asm/rtas.h           |  8 ++
 arch/powerpc/perf/core-book3s.c           | 44 ---
 arch/powerpc/platforms/pseries/mobility.c |  4 +++
 3 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 9dc97d2..cea72d7 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -380,5 +380,13 @@ static inline void rtas_initialize(void) { }
 static inline void read_24x7_sys_info(void) { }
 #endif

+#ifdef CONFIG_PPC_PERF_CTRS
+void mobility_pmu_disable(void);
+void mobility_pmu_enable(void);
+#else
+static inline void mobility_pmu_disable(void) { }
+static inline void mobility_pmu_enable(void) { }
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _POWERPC_RTAS_H */

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index bb0ee71..90da7fa 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include <asm/rtas.h>

 #ifdef CONFIG_PPC64
 #include "internal.h"
@@ -58,6 +59,7 @@ struct cpu_hw_events {
 	/* Store the PMC values */
 	unsigned long pmcs[MAX_HWEVENTS];
+	int migrate;
 };

 static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events);
@@ -1335,6 +1337,22 @@ static void power_pmu_disable(struct pmu *pmu)
 }

 /*
+ * Called from powerpc mobility code
+ * before migration to disable counters
+ * if the PMU is active.
+ */
+void mobility_pmu_disable(void)
+{
+	struct cpu_hw_events *cpuhw;
+
+	cpuhw = this_cpu_ptr(&cpu_hw_events);
+	if (cpuhw->n_events != 0) {
+		power_pmu_disable(NULL);
+		cpuhw->migrate = 1;
+	}
+}
+
+/*
  * Re-enable all events if disable == 0.
  * If we were previously disabled and events were added, then
  * put the new config on the PMU.
@@ -1379,8 +1397,10 @@ static void power_pmu_enable(struct pmu *pmu)
 	 * no need to rec
[PATCH] arch:powerpc:platforms:powernv: cleanup __FUNCTION__ usage
__FUNCTION__ exists only for backwards compatibility reasons with old gcc versions. Replace it with __func__.

Signed-off-by: Dwaipayan Ray
---
 arch/powerpc/platforms/powernv/opal-imc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index 7824cc364bc4..5bafbf34e820 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -211,7 +211,7 @@ static void disable_core_pmu_counters(void)
 				     get_hard_smp_processor_id(cpu));
 		if (rc)
 			pr_err("%s: Failed to stop Core (cpu = %d)\n",
-				__FUNCTION__, cpu);
+				__func__, cpu);
 	}
 	put_online_cpus();
 }
-- 
2.28.0
[PATCH] arch:powerpc:mm: cleanup __FUNCTION__ usage
__FUNCTION__ exists only for backwards compatibility reasons with old gcc versions. Replace it with __func__.

Signed-off-by: Dwaipayan Ray
---
 arch/powerpc/mm/numa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index f2bf98bdcea2..3d3f570c925c 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1236,7 +1236,7 @@ int find_and_online_cpu_nid(int cpu)
 #endif
 	}

-	pr_debug("%s:%d cpu %d nid %d\n", __FUNCTION__, __LINE__,
+	pr_debug("%s:%d cpu %d nid %d\n", __func__, __LINE__,
 		cpu, new_nid);
 	return new_nid;
 }
-- 
2.28.0
Re: [PATCH] powerpc/perf: Enable PMU counters post partition migration if PMU is active
Hi Athira,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on powerpc/next]
[also build test WARNING on v5.13 next-20210709]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Athira-Rajeev/powerpc-perf-Enable-PMU-counters-post-partition-migration-if-PMU-is-active/20210711-150741
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/2050c82afb3abd9eaa57fee45e71e7fccabfb81f
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Athira-Rajeev/powerpc-perf-Enable-PMU-counters-post-partition-migration-if-PMU-is-active/20210711-150741
        git checkout 2050c82afb3abd9eaa57fee45e71e7fccabfb81f
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

All warnings (new ones prefixed by >>):

>> arch/powerpc/perf/core-book3s.c:1343:6: warning: no previous prototype for 'mobility_pmu_disable' [-Wmissing-prototypes]
    1343 | void mobility_pmu_disable(void)
         |      ^~~~
>> arch/powerpc/perf/core-book3s.c:1537:6: warning: no previous prototype for 'mobility_pmu_enable' [-Wmissing-prototypes]
    1537 | void mobility_pmu_enable(void)
         |      ^~~

vim +/mobility_pmu_disable +1343 arch/powerpc/perf/core-book3s.c

  1337
  1338	/*
  1339	 * Called from powerpc mobility code
  1340	 * before migration to disable counters
  1341	 * if the PMU is active.
  1342	 */
> 1343	void mobility_pmu_disable(void)
  1344	{
  1345		struct cpu_hw_events *cpuhw;
  1346
  1347		cpuhw = this_cpu_ptr(&cpu_hw_events);
  1348		if (cpuhw->n_events != 0) {
  1349			power_pmu_disable(NULL);
  1350			cpuhw->migrate = 1;
  1351		}
  1352	}
  1353
  1354	/*
  1355	 * Re-enable all events if disable == 0.
  1356	 * If we were previously disabled and events were added, then
  1357	 * put the new config on the PMU.
  1358	 */
  1359	static void power_pmu_enable(struct pmu *pmu)
  1360	{
  1361		struct perf_event *event;
  1362		struct cpu_hw_events *cpuhw;
  1363		unsigned long flags;
  1364		long i;
  1365		unsigned long val, mmcr0;
  1366		s64 left;
  1367		unsigned int hwc_index[MAX_HWEVENTS];
  1368		int n_lim;
  1369		int idx;
  1370		bool ebb;
  1371
  1372		if (!ppmu)
  1373			return;
  1374		local_irq_save(flags);
  1375
  1376		cpuhw = this_cpu_ptr(&cpu_hw_events);
  1377		if (!cpuhw->disabled)
  1378			goto out;
  1379
  1380		if (cpuhw->n_events == 0) {
  1381			ppc_set_pmu_inuse(0);
  1382			goto out;
  1383		}
  1384
  1385		cpuhw->disabled = 0;
  1386
  1387		/*
  1388		 * EBB requires an exclusive group and all events must have the EBB
  1389		 * flag set, or not set, so we can just check a single event. Also we
  1390		 * know we have at least one event.
  1391		 */
  1392		ebb = is_ebb_event(cpuhw->event[0]);
  1393
  1394		/*
  1395		 * If we didn't change anything, or only removed events,
  1396		 * no need to recalculate MMCR* settings and reset the PMCs.
  1397		 * Just reenable the PMU with the current MMCR* settings
  1398		 * (possibly updated for removal of events).
  1399		 * While reenabling PMU during partition migration, continue
  1400		 * with normal flow.
  1401		 */
  1402		if (!cpuhw->n_added && !cpuhw->migrate) {
  1403			mtspr(SPRN_MMCRA, cpuhw->mmcr.mmcra & ~MMCRA_SAMPLE_ENABLE);
  1404			mtspr(SPRN_MMCR1, cpuhw->mmcr.mmcr1);
  1405			if (ppmu->flags & PPMU_ARCH_31)
  1406				mtspr(SPRN_MMCR3, cpuhw->mmcr.mmcr3);
  1407			goto out_enable;
  1408		}
  1409
  1410		/*
  1411		 * Clear all MMCR settings and recompute them for the new set of events.
  1412		 */
  1413		memset(&cpuh
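The warning means the non-static definitions in core-book3s.c are compiled without any prior declaration in scope. The usual fix, which v2 of this patch adopts, is to declare the functions in a header and include that header from the defining file. A sketch of the shape, mirroring the v2 change to rtas.h (not compilable on its own):

```c
/* arch/powerpc/include/asm/rtas.h (sketch of the v2 fix) */
#ifdef CONFIG_PPC_PERF_CTRS
void mobility_pmu_disable(void);
void mobility_pmu_enable(void);
#else
static inline void mobility_pmu_disable(void) { }
static inline void mobility_pmu_enable(void) { }
#endif

/* core-book3s.c must then #include <asm/rtas.h> so the definitions
 * see their prototypes and -Wmissing-prototypes is satisfied. */
```

The `static inline` stubs in the `#else` branch let callers in the mobility code build unconditionally, whether or not CONFIG_PPC_PERF_CTRS is set.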
[PATCH] powerpc/perf: Enable PMU counters post partition migration if PMU is active
During Live Partition Migration (LPM), it is observed that after migration completes, perf counter values report 0 in case of system/cpu-wide monitoring. However, 'perf stat' with a workload continues to show counts post migration, since the PMU gets disabled/enabled during sched switches.

Example:

 ./perf stat -e r1001e -I 1000

           time             counts unit events
    1.001010437         22,137,414      r1001e
    2.002495447         15,455,821      r1001e
 <<>>

As seen in the logs below, the counter values read zero after migration has completed:

 <<>>
   86.142535370    129,392,333,440      r1001e
   87.144714617                  0      r1001e
   88.146526636                  0      r1001e
   89.148085029                  0      r1001e

Here the PMU is enabled at the start of the perf session and counter values are read at intervals; counters are only disabled at the end of the session. The powerpc mobility code presently does not handle disabling and re-enabling the PMU counters during partition migration. Also, since the PMU register values are not saved/restored during migration, PMU registers like Monitor Mode Control Register 0 (MMCR0) and Monitor Mode Control Register 1 (MMCR1) will not contain the values they were programmed with. Hence PMU counters will not be enabled correctly post migration.

Fix this in the mobility code by disabling and enabling the PMU on all CPUs before and after migration. The patch introduces two functions, mobility_pmu_disable() and mobility_pmu_enable(). mobility_pmu_disable() is called before the processor threads go to suspend state, so as to disable the PMU counters; the disable is done only if there are any active events running on that cpu. mobility_pmu_enable() is called after the processor threads are back online, to enable the PMU counters again.

Since the Performance Monitor Counters (PMCs) are not saved/restored during LPM, the PMC value reads zero after migration while 'event->hw.prev_count' holds a non-zero value. This causes a problem when updating event->count, since we always accumulate (PMC value - event->hw.prev_count) into event->count.
If event->hw.prev_count is greater than the PMC value, event->count becomes negative. Fix this by re-initialising 'prev_count' for all events as well while enabling the events again. A new variable 'migrate' is introduced in 'struct cpu_hw_events' to achieve this for the LPM case in power_pmu_enable. Use the 'migrate' value to clear the PMC index (stored in event->hw.idx) for all events so that event count settings get re-initialised correctly.

Signed-off-by: Athira Rajeev
---
 arch/powerpc/include/asm/rtas.h           |  4 +++
 arch/powerpc/perf/core-book3s.c           | 43 ---
 arch/powerpc/platforms/pseries/mobility.c |  4 +++
 3 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 9dc97d2..3fc478a 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -376,8 +376,12 @@ static inline void rtas_initialize(void) { }

 #ifdef CONFIG_HV_PERF_CTRS
 void read_24x7_sys_info(void);
+void mobility_pmu_disable(void);
+void mobility_pmu_enable(void);
 #else
 static inline void read_24x7_sys_info(void) { }
+static inline void mobility_pmu_disable(void) { }
+static inline void mobility_pmu_enable(void) { }
 #endif

 #endif /* __KERNEL__ */

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index bb0ee71..e86df45 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -58,6 +58,7 @@ struct cpu_hw_events {

 	/* Store the PMC values */
 	unsigned long pmcs[MAX_HWEVENTS];
+	int migrate;
 };

 static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events);
@@ -1335,6 +1336,22 @@ static void power_pmu_disable(struct pmu *pmu)
 }

 /*
+ * Called from powerpc mobility code
+ * before migration to disable counters
+ * if the PMU is active.
+ */
+void mobility_pmu_disable(void)
+{
+	struct cpu_hw_events *cpuhw;
+
+	cpuhw = this_cpu_ptr(&cpu_hw_events);
+	if (cpuhw->n_events != 0) {
+		power_pmu_disable(NULL);
+		cpuhw->migrate = 1;
+	}
+}
+
+/*
  * Re-enable all events if disable == 0.
  * If we were previously disabled and events were added, then
  * put the new config on the PMU.
@@ -1379,8 +1396,10 @@ static void power_pmu_enable(struct pmu *pmu)
 	 * no need to recalculate MMCR* settings and reset the PMCs.
 	 * Just reenable the PMU with the current MMCR* settings
 	 * (possibly updated for removal of events).
+	 * While reenabling PMU during partition migration, continue
+	 * with normal flow.
 	 */
-	if (!cpuhw->n_added) {
+	if (!cpuhw->n_added && !cpuhw->migrate) {
 		mtspr(SPRN_MMCRA, cpuhw->mmcr.mmcra & ~MMCRA_SAMPLE_ENABLE);
 		mtspr(SPRN_MMCR1, cpuhw->mmcr.mmcr1);
 		if (ppmu->flags & PPMU_ARCH_31