Re: [RFC PATCH v0 1/1] powerpc/percpu: Use 2MB atom_size in percpu allocator on radix

2021-07-11 Thread Bharata B Rao
On Mon, Jul 12, 2021 at 01:00:10PM +1000, Nicholas Piggin wrote:
> Excerpts from Bharata B Rao's message of July 8, 2021 3:29 pm:
> > The atom_size used by the percpu allocator on powerpc is currently
> > determined by mmu_linear_psize, which is initialized to 4K and
> > modified only by hash. Until now, for radix
> > the atom_size has defaulted to PAGE_SIZE (64K).
> 
> Looks like it was 1MB to me?

Was that with hash? atom_size gets set to 1MB on hash.
On both bare-metal and KVM radix, I see a 64K atom_size.

> 
> > Go for 2MB
> > atom_size on radix if support for 2MB pages exists.
> > 
> > 2MB atom_size on radix will allow using PMD mappings in the
> > vmalloc area if and when support for higher sized vmalloc
> > mappings is enabled for the percpu allocator. However right now
> 
> That would be nice.
> 
> > this change will result in a larger number of units being allocated
> > within one allocation due to the increased upa (units per allocation).
> 
> In that case is there any reason to do it until then?

Not strictly. I noticed that x86 has had a similar setting for a
long time, so I was just checking whether it makes sense here too.

> 
> > 
> > Signed-off-by: Bharata B Rao 
> > ---
> >  arch/powerpc/kernel/setup_64.c | 34 +-
> >  1 file changed, 25 insertions(+), 9 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> > index 1ff258f6c76c..45ce2d6e8112 100644
> > --- a/arch/powerpc/kernel/setup_64.c
> > +++ b/arch/powerpc/kernel/setup_64.c
> > @@ -871,6 +871,30 @@ static void __init pcpu_populate_pte(unsigned long addr)
> >   __func__, PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
> >  }
> >  
> > +static size_t pcpu_atom_size(void)
> > +{
> > +   size_t atom_size = PAGE_SIZE;
> > +
> > +   /*
> > +* Radix: Use PAGE_SIZE by default or 2M if available.
> > +*/
> > +   if (radix_enabled()) {
> > +   if (mmu_psize_defs[MMU_PAGE_2M].shift)
> > +   atom_size = 1 << mmu_psize_defs[MMU_PAGE_2M].shift;
> 
> Looks like this changes behaviour for radix.

Yes, it does: it increases the atom_size, which results in a higher
upa as noted. Did you mean some other behaviour change?

> 
> Also, mmu_psize_defs is a pretty horrible interface; you only need it in
> some low-level instruction encodings. You already explicitly know it's
> 2MB there, so you can just use PMD_SHIFT.

Ok.

> 
> If you want to know whether huge PMD is supported and enabled in vmalloc
> memory, you would have to add some check which also accounts for
> vmap_allow_huge, so that would be another patch.

Yes makes sense if we want to tie the setting of higher atom_size
to actual availability of PMD mappings in vmalloc.

Regards,
Bharata.


Re: [RFC PATCH 10/43] powerpc/64s: Always set PMU control registers to frozen/disabled when not in use

2021-07-11 Thread Madhavan Srinivasan



On 7/2/21 5:57 AM, Nicholas Piggin wrote:

Excerpts from Madhavan Srinivasan's message of July 1, 2021 11:17 pm:

On 6/22/21 4:27 PM, Nicholas Piggin wrote:

KVM PMU management code looks for particular frozen/disabled bits in
the PMU registers so it knows whether it must clear them when coming
out of a guest or not. Setting this up helps KVM make these optimisations
without getting confused. Longer term the better approach might be to
move guest/host PMU switching to the perf subsystem.

Signed-off-by: Nicholas Piggin 
---
   arch/powerpc/kernel/cpu_setup_power.c | 4 ++--
   arch/powerpc/kernel/dt_cpu_ftrs.c | 6 +++---
   arch/powerpc/kvm/book3s_hv.c  | 5 +
   arch/powerpc/perf/core-book3s.c   | 7 +++
   4 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/cpu_setup_power.c b/arch/powerpc/kernel/cpu_setup_power.c
index a29dc8326622..3dc61e203f37 100644
--- a/arch/powerpc/kernel/cpu_setup_power.c
+++ b/arch/powerpc/kernel/cpu_setup_power.c
@@ -109,7 +109,7 @@ static void init_PMU_HV_ISA207(void)
   static void init_PMU(void)
   {
mtspr(SPRN_MMCRA, 0);
-   mtspr(SPRN_MMCR0, 0);
+   mtspr(SPRN_MMCR0, MMCR0_FC);

The sticky point here is that currently, if not frozen, PMC5/6 will
keep counting. And not freezing them at boot is quite useful
sometimes, for example when running in a simulation where we can
calculate approximate CPIs for micro-benchmarks without the perf subsystem.


Sorry for the delayed response.


Can't you even use the sysfs files in this sim environment? In that case


Yes, it is possible. It's just that we would need to maintain an
additional patch for the sim.

what if we added a boot option that could set some things up? In that
case possibly you could even gather some more types of events too.


Having a boot option would be overkill :). I guess we could put
this under a config option?



Thanks,
Nick


Re: [RFC PATCH v0 1/1] powerpc/percpu: Use 2MB atom_size in percpu allocator on radix

2021-07-11 Thread Nicholas Piggin
Excerpts from Bharata B Rao's message of July 8, 2021 3:29 pm:
> The atom_size used by the percpu allocator on powerpc is currently
> determined by mmu_linear_psize, which is initialized to 4K and
> modified only by hash. Until now, for radix
> the atom_size has defaulted to PAGE_SIZE (64K).

Looks like it was 1MB to me?

> Go for 2MB
> atom_size on radix if support for 2MB pages exists.
> 
> 2MB atom_size on radix will allow using PMD mappings in the
> vmalloc area if and when support for higher sized vmalloc
> mappings is enabled for the percpu allocator. However right now

That would be nice.

> this change will result in a larger number of units being allocated
> within one allocation due to the increased upa (units per allocation).

In that case is there any reason to do it until then?

> 
> Signed-off-by: Bharata B Rao 
> ---
>  arch/powerpc/kernel/setup_64.c | 34 +-
>  1 file changed, 25 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index 1ff258f6c76c..45ce2d6e8112 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -871,6 +871,30 @@ static void __init pcpu_populate_pte(unsigned long addr)
> __func__, PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
>  }
>  
> +static size_t pcpu_atom_size(void)
> +{
> + size_t atom_size = PAGE_SIZE;
> +
> + /*
> +  * Radix: Use PAGE_SIZE by default or 2M if available.
> +  */
> + if (radix_enabled()) {
> + if (mmu_psize_defs[MMU_PAGE_2M].shift)
> + atom_size = 1 << mmu_psize_defs[MMU_PAGE_2M].shift;

Looks like this changes behaviour for radix.

Also, mmu_psize_defs is a pretty horrible interface; you only need it in
some low-level instruction encodings. You already explicitly know it's
2MB there, so you can just use PMD_SHIFT.

If you want to know whether huge PMD is supported and enabled in vmalloc
memory, you would have to add some check which also accounts for
vmap_allow_huge, so that would be another patch.

Thanks,
Nick

> + goto out;
> + }
> +
> + /*
> +  * Hash: Linear mapping is one of 4K, 1M and 16M.  For 4K, no need
> +  * to group units.  For larger mappings, use 1M atom which
> +  * should be large enough to contain a number of units.
> +  */
> + if (mmu_linear_psize != MMU_PAGE_4K)
> + atom_size = 1 << 20;
> +
> +out:
> + return atom_size;
> +}
>  
>  void __init setup_per_cpu_areas(void)
>  {
> @@ -880,15 +904,7 @@ void __init setup_per_cpu_areas(void)
>   unsigned int cpu;
>   int rc = -EINVAL;
>  
> - /*
> -  * Linear mapping is one of 4K, 1M and 16M.  For 4K, no need
> -  * to group units.  For larger mappings, use 1M atom which
> -  * should be large enough to contain a number of units.
> -  */
> - if (mmu_linear_psize == MMU_PAGE_4K)
> - atom_size = PAGE_SIZE;
> - else
> - atom_size = 1 << 20;
> + atom_size = pcpu_atom_size();
>  
>   if (pcpu_chosen_fc != PCPU_FC_PAGE) {
> >   rc = pcpu_embed_first_chunk(0, dyn_size, atom_size, pcpu_cpu_distance,
> -- 
> 2.31.1
> 
> 


Re: [RFC PATCH 27/43] KVM: PPC: Book3S HV P9: Move host OS save/restore functions to built-in

2021-07-11 Thread Nicholas Piggin
Excerpts from Athira Rajeev's message of July 8, 2021 3:32 pm:
> 
> 
>> On 22-Jun-2021, at 4:27 PM, Nicholas Piggin  wrote:
>> 
>> Move the P9 guest/host register switching functions to the built-in
>> P9 entry code, and export it for nested to use as well.
>> 
>> This allows more flexibility in scheduling these supervisor privileged
>> SPR accesses with the HV privileged and PR SPR accesses in the low level
>> entry code.
>> 
>> Signed-off-by: Nicholas Piggin 
>> ---
>> arch/powerpc/kvm/book3s_hv.c  | 351 +-
>> arch/powerpc/kvm/book3s_hv.h  |  39 +++
>> arch/powerpc/kvm/book3s_hv_p9_entry.c | 332 
>> 3 files changed, 372 insertions(+), 350 deletions(-)
>> create mode 100644 arch/powerpc/kvm/book3s_hv.h
>> 
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 35749b0b663f..a7660af22161 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -79,6 +79,7 @@
>> #include 
>> 
>> #include "book3s.h"
>> +#include "book3s_hv.h"
>> 
>> #define CREATE_TRACE_POINTS
>> #include "trace_hv.h"
>> @@ -3675,356 +3676,6 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
>>  trace_kvmppc_run_core(vc, 1);
>> }
>> 
>> -/*
>> - * Privileged (non-hypervisor) host registers to save.
>> - */
>> -struct p9_host_os_sprs {
>> -unsigned long dscr;
>> -unsigned long tidr;
>> -unsigned long iamr;
>> -unsigned long amr;
>> -unsigned long fscr;
>> -
>> -unsigned int pmc1;
>> -unsigned int pmc2;
>> -unsigned int pmc3;
>> -unsigned int pmc4;
>> -unsigned int pmc5;
>> -unsigned int pmc6;
>> -unsigned long mmcr0;
>> -unsigned long mmcr1;
>> -unsigned long mmcr2;
>> -unsigned long mmcr3;
>> -unsigned long mmcra;
>> -unsigned long siar;
>> -unsigned long sier1;
>> -unsigned long sier2;
>> -unsigned long sier3;
>> -unsigned long sdar;
>> -};
>> -
>> -static void freeze_pmu(unsigned long mmcr0, unsigned long mmcra)
>> -{
>> -if (!(mmcr0 & MMCR0_FC))
>> -goto do_freeze;
>> -if (mmcra & MMCRA_SAMPLE_ENABLE)
>> -goto do_freeze;
>> -if (cpu_has_feature(CPU_FTR_ARCH_31)) {
>> -if (!(mmcr0 & MMCR0_PMCCEXT))
>> -goto do_freeze;
>> -if (!(mmcra & MMCRA_BHRB_DISABLE))
>> -goto do_freeze;
>> -}
>> -return;
>> -
>> -do_freeze:
>> -mmcr0 = MMCR0_FC;
>> -mmcra = 0;
>> -if (cpu_has_feature(CPU_FTR_ARCH_31)) {
>> -mmcr0 |= MMCR0_PMCCEXT;
>> -mmcra = MMCRA_BHRB_DISABLE;
>> -}
>> -
>> -mtspr(SPRN_MMCR0, mmcr0);
>> -mtspr(SPRN_MMCRA, mmcra);
>> -isync();
>> -}
>> -
>> -static void switch_pmu_to_guest(struct kvm_vcpu *vcpu,
>> -struct p9_host_os_sprs *host_os_sprs)
>> -{
>> -struct lppaca *lp;
>> -int load_pmu = 1;
>> -
>> -lp = vcpu->arch.vpa.pinned_addr;
>> -if (lp)
>> -load_pmu = lp->pmcregs_in_use;
>> -
>> -if (load_pmu)
>> -  vcpu->arch.hfscr |= HFSCR_PM;
>> -
>> -/* Save host */
>> -if (ppc_get_pmu_inuse()) {
>> -/*
>> - * It might be better to put PMU handling (at least for the
>> - * host) in the perf subsystem because it knows more about what
>> - * is being used.
>> - */
>> -
>> -/* POWER9, POWER10 do not implement HPMC or SPMC */
>> -
>> -host_os_sprs->mmcr0 = mfspr(SPRN_MMCR0);
>> -host_os_sprs->mmcra = mfspr(SPRN_MMCRA);
>> -
>> -freeze_pmu(host_os_sprs->mmcr0, host_os_sprs->mmcra);
>> -
>> -host_os_sprs->pmc1 = mfspr(SPRN_PMC1);
>> -host_os_sprs->pmc2 = mfspr(SPRN_PMC2);
>> -host_os_sprs->pmc3 = mfspr(SPRN_PMC3);
>> -host_os_sprs->pmc4 = mfspr(SPRN_PMC4);
>> -host_os_sprs->pmc5 = mfspr(SPRN_PMC5);
>> -host_os_sprs->pmc6 = mfspr(SPRN_PMC6);
>> -host_os_sprs->mmcr1 = mfspr(SPRN_MMCR1);
>> -host_os_sprs->mmcr2 = mfspr(SPRN_MMCR2);
>> -host_os_sprs->sdar = mfspr(SPRN_SDAR);
>> -host_os_sprs->siar = mfspr(SPRN_SIAR);
>> -host_os_sprs->sier1 = mfspr(SPRN_SIER);
>> -
>> -if (cpu_has_feature(CPU_FTR_ARCH_31)) {
>> -host_os_sprs->mmcr3 = mfspr(SPRN_MMCR3);
>> -host_os_sprs->sier2 = mfspr(SPRN_SIER2);
>> -host_os_sprs->sier3 = mfspr(SPRN_SIER3);
>> -}
>> -}
>> -
>> -#ifdef CONFIG_PPC_PSERIES
>> -if (kvmhv_on_pseries()) {
>> -if (vcpu->arch.vpa.pinned_addr) {
>> -struct lppaca *lp = vcpu->arch.vpa.pinned_addr;
>> -get_lppaca()->pmcregs_in_use = lp->pmcregs_in_use;
>> -} else {
>> -get_lppaca()->pmcregs_in_use = 1;
>> -}
>> -}
>> -#endif
>> -
>> -/* Load guest */
>> -i

Re: [RFC PATCH 11/43] KVM: PPC: Book3S HV P9: Implement PMU save/restore in C

2021-07-11 Thread Nicholas Piggin
Excerpts from Athira Rajeev's message of July 10, 2021 12:47 pm:
> 
> 
>> On 22-Jun-2021, at 4:27 PM, Nicholas Piggin  wrote:
>> 
>> Implement the P9 path PMU save/restore code in C, and remove the
>> POWER9/10 code from the P7/8 path assembly.
>> 
>> -449 cycles (8533) POWER9 virt-mode NULL hcall
>> 
>> Signed-off-by: Nicholas Piggin 
>> ---
>> arch/powerpc/include/asm/asm-prototypes.h |   5 -
>> arch/powerpc/kvm/book3s_hv.c  | 205 --
>> arch/powerpc/kvm/book3s_hv_interrupts.S   |  13 +-
>> arch/powerpc/kvm/book3s_hv_rmhandlers.S   |  43 +
>> 4 files changed, 200 insertions(+), 66 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
>> index 02ee6f5ac9fe..928db8ef9a5a 100644
>> --- a/arch/powerpc/include/asm/asm-prototypes.h
>> +++ b/arch/powerpc/include/asm/asm-prototypes.h
>> @@ -136,11 +136,6 @@ static inline void kvmppc_restore_tm_hv(struct kvm_vcpu *vcpu, u64 msr,
>>  bool preserve_nv) { }
>> #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
>> 
>> -void kvmhv_save_host_pmu(void);
>> -void kvmhv_load_host_pmu(void);
>> -void kvmhv_save_guest_pmu(struct kvm_vcpu *vcpu, bool pmu_in_use);
>> -void kvmhv_load_guest_pmu(struct kvm_vcpu *vcpu);
>> -
>> void kvmppc_p9_enter_guest(struct kvm_vcpu *vcpu);
>> 
>> long kvmppc_h_set_dabr(struct kvm_vcpu *vcpu, unsigned long dabr);
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index f7349d150828..b1b94b3563b7 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -3635,6 +3635,188 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
>>  trace_kvmppc_run_core(vc, 1);
>> }
>> 
>> +/*
>> + * Privileged (non-hypervisor) host registers to save.
>> + */
>> +struct p9_host_os_sprs {
>> +unsigned long dscr;
>> +unsigned long tidr;
>> +unsigned long iamr;
>> +unsigned long amr;
>> +unsigned long fscr;
>> +
>> +unsigned int pmc1;
>> +unsigned int pmc2;
>> +unsigned int pmc3;
>> +unsigned int pmc4;
>> +unsigned int pmc5;
>> +unsigned int pmc6;
>> +unsigned long mmcr0;
>> +unsigned long mmcr1;
>> +unsigned long mmcr2;
>> +unsigned long mmcr3;
>> +unsigned long mmcra;
>> +unsigned long siar;
>> +unsigned long sier1;
>> +unsigned long sier2;
>> +unsigned long sier3;
>> +unsigned long sdar;
>> +};
>> +
>> +static void freeze_pmu(unsigned long mmcr0, unsigned long mmcra)
>> +{
>> +if (!(mmcr0 & MMCR0_FC))
>> +goto do_freeze;
>> +if (mmcra & MMCRA_SAMPLE_ENABLE)
>> +goto do_freeze;
>> +if (cpu_has_feature(CPU_FTR_ARCH_31)) {
>> +if (!(mmcr0 & MMCR0_PMCCEXT))
>> +goto do_freeze;
>> +if (!(mmcra & MMCRA_BHRB_DISABLE))
>> +goto do_freeze;
>> +}
>> +return;
> 
> 
> Hi Nick
> 
> When freezing the PMU, do we also need to set pmcregs_in_use to zero?

Not immediately, we still need to save out the values of the PMU 
registers. If we clear pmcregs_in_use, then our hypervisor can discard 
the contents of those registers at any time.

> Also, why do we need the above conditions, like the MMCRA_SAMPLE_ENABLE
> and MMCR0_PMCCEXT checks, before freezing?

Basically just because that's the condition we want to set the PMU to
before entering the guest if the guest does not have its own PMU
registers in use.

I'm not entirely sure this is correct / optimal for perf though, so we
should double check that. I think some of this stuff should be wrapped 
up and put into perf/ subdirectory as I've said before. KVM shouldn't 
need to know about exactly how PMU is to be set up and managed by
perf, we should just be able to ask perf to save/restore/switch state.

Thanks,
Nick


Re: [RFC PATCH 10/43] powerpc/64s: Always set PMU control registers to frozen/disabled when not in use

2021-07-11 Thread Nicholas Piggin
Excerpts from Athira Rajeev's message of July 10, 2021 12:50 pm:
> 
> 
>> On 22-Jun-2021, at 4:27 PM, Nicholas Piggin  wrote:
>> 
>> KVM PMU management code looks for particular frozen/disabled bits in
>> the PMU registers so it knows whether it must clear them when coming
>> out of a guest or not. Setting this up helps KVM make these optimisations
>> without getting confused. Longer term the better approach might be to
>> move guest/host PMU switching to the perf subsystem.
>> 
>> Signed-off-by: Nicholas Piggin 
>> ---
>> arch/powerpc/kernel/cpu_setup_power.c | 4 ++--
>> arch/powerpc/kernel/dt_cpu_ftrs.c | 6 +++---
>> arch/powerpc/kvm/book3s_hv.c  | 5 +
>> arch/powerpc/perf/core-book3s.c   | 7 +++
>> 4 files changed, 17 insertions(+), 5 deletions(-)
>> 
>> diff --git a/arch/powerpc/kernel/cpu_setup_power.c b/arch/powerpc/kernel/cpu_setup_power.c
>> index a29dc8326622..3dc61e203f37 100644
>> --- a/arch/powerpc/kernel/cpu_setup_power.c
>> +++ b/arch/powerpc/kernel/cpu_setup_power.c
>> @@ -109,7 +109,7 @@ static void init_PMU_HV_ISA207(void)
>> static void init_PMU(void)
>> {
>>  mtspr(SPRN_MMCRA, 0);
>> -mtspr(SPRN_MMCR0, 0);
>> +mtspr(SPRN_MMCR0, MMCR0_FC);
>>  mtspr(SPRN_MMCR1, 0);
>>  mtspr(SPRN_MMCR2, 0);
>> }
>> @@ -123,7 +123,7 @@ static void init_PMU_ISA31(void)
>> {
>>  mtspr(SPRN_MMCR3, 0);
>>  mtspr(SPRN_MMCRA, MMCRA_BHRB_DISABLE);
>> -mtspr(SPRN_MMCR0, MMCR0_PMCCEXT);
>> +mtspr(SPRN_MMCR0, MMCR0_FC | MMCR0_PMCCEXT);
>> }
>> 
>> /*
>> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
>> index 0a6b36b4bda8..06a089fbeaa7 100644
>> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
>> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
>> @@ -353,7 +353,7 @@ static void init_pmu_power8(void)
>>  }
>> 
>>  mtspr(SPRN_MMCRA, 0);
>> -mtspr(SPRN_MMCR0, 0);
>> +mtspr(SPRN_MMCR0, MMCR0_FC);
>>  mtspr(SPRN_MMCR1, 0);
>>  mtspr(SPRN_MMCR2, 0);
>>  mtspr(SPRN_MMCRS, 0);
>> @@ -392,7 +392,7 @@ static void init_pmu_power9(void)
>>  mtspr(SPRN_MMCRC, 0);
>> 
>>  mtspr(SPRN_MMCRA, 0);
>> -mtspr(SPRN_MMCR0, 0);
>> +mtspr(SPRN_MMCR0, MMCR0_FC);
>>  mtspr(SPRN_MMCR1, 0);
>>  mtspr(SPRN_MMCR2, 0);
>> }
>> @@ -428,7 +428,7 @@ static void init_pmu_power10(void)
>> 
>>  mtspr(SPRN_MMCR3, 0);
>>  mtspr(SPRN_MMCRA, MMCRA_BHRB_DISABLE);
>> -mtspr(SPRN_MMCR0, MMCR0_PMCCEXT);
>> +mtspr(SPRN_MMCR0, MMCR0_FC | MMCR0_PMCCEXT);
>> }
>> 
>> static int __init feat_enable_pmu_power10(struct dt_cpu_feature *f)
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 1f30f98b09d1..f7349d150828 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -2593,6 +2593,11 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu *vcpu)
>> #endif
>> #endif
>>  vcpu->arch.mmcr[0] = MMCR0_FC;
>> +if (cpu_has_feature(CPU_FTR_ARCH_31)) {
>> +vcpu->arch.mmcr[0] |= MMCR0_PMCCEXT;
>> +vcpu->arch.mmcra = MMCRA_BHRB_DISABLE;
>> +}
>> +
>>  vcpu->arch.ctrl = CTRL_RUNLATCH;
>>  /* default to host PVR, since we can't spoof it */
>>  kvmppc_set_pvr_hv(vcpu, mfspr(SPRN_PVR));
>> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
>> index 51622411a7cc..e33b29ec1a65 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -1361,6 +1361,13 @@ static void power_pmu_enable(struct pmu *pmu)
>>  goto out;
>> 
>>  if (cpuhw->n_events == 0) {
>> +if (cpu_has_feature(CPU_FTR_ARCH_31)) {
>> +mtspr(SPRN_MMCRA, MMCRA_BHRB_DISABLE);
>> +mtspr(SPRN_MMCR0, MMCR0_FC | MMCR0_PMCCEXT);
>> +} else {
>> +mtspr(SPRN_MMCRA, 0);
>> +mtspr(SPRN_MMCR0, MMCR0_FC);
>> +}
> 
> 
> Hi Nick,
> 
> We are already setting these bits in the power_pmu_disable() function, and
> disable will be called before any event gets deleted/stopped. Can you please
> help me understand why this is needed in the power_pmu_enable path as well?

I'll have to go back and check what I needed it for.

Basically, what I'm trying to achieve is that when the guest and host
are not running perf, the registers should match. This way the host
can test them with mfspr but does not need to fix them with mtspr.

It's not too important for performance (after TM facility demand faulting
and the nested guest PMU fix, you can usually just skip that entirely),
but I think it's cleaner. I'll double-check my perf/ changes; it's
possible that part should be split out or is unnecessary.

Thanks,
Nick


Re: [PATCH V3 1/1] powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC

2021-07-11 Thread Nicholas Piggin
Excerpts from Athira Rajeev's message of July 11, 2021 1:58 am:
> Running perf fuzzer showed below in dmesg logs:
> "Can't find PMC that caused IRQ"
> 
> This means a PMU exception happened, but none of the PMCs (Performance
> Monitor Counters) were found to be overflown. There are some corner cases
> that clear the PMCs after the PMI gets masked. In such cases, the perf
> interrupt handler will not find the active PMC values that had caused
> the overflow, which leads to this message while replaying.
> 
> Case 1: PMU Interrupt happens during replay of other interrupts and
> counter values gets cleared by PMU callbacks before replay:
> 
> During replay of interrupts like timer, __do_irq and doorbell exception, we
> conditionally enable interrupts via may_hard_irq_enable(). This could
> potentially create a window to generate a PMI. Since irq soft mask is set
> to ALL_DISABLED, the PMI will get masked here. IPIs could run before the
> perf interrupt is replayed, and the PMU events could get deleted or stopped.
> This changes the PMU SPR values and resets the counters. A snippet of the
> ftrace log showing PMU callbacks invoked in "__do_irq":
> 
> -0 [051] dns. 132025441306354: __do_irq <-call_do_irq
> -0 [051] dns. 132025441306430: irq_enter <-__do_irq
> -0 [051] dns. 132025441306503: irq_enter_rcu <-__do_irq
> -0 [051] dnH. 132025441306599: xive_get_irq <-__do_irq
> <<>>
> -0 [051] dnH. 132025441307770: generic_smp_call_function_single_interrupt <-smp_ipi_demux_relaxed
> -0 [051] dnH. 132025441307839: flush_smp_call_function_queue <-smp_ipi_demux_relaxed
> -0 [051] dnH. 132025441308057: _raw_spin_lock <-event_function
> -0 [051] dnH. 132025441308206: power_pmu_disable <-perf_pmu_disable
> -0 [051] dnH. 132025441308337: power_pmu_del <-event_sched_out
> -0 [051] dnH. 132025441308407: power_pmu_read <-power_pmu_del
> -0 [051] dnH. 132025441308477: read_pmc <-power_pmu_read
> -0 [051] dnH. 132025441308590: isa207_disable_pmc <-power_pmu_del
> -0 [051] dnH. 132025441308663: write_pmc <-power_pmu_del
> -0 [051] dnH. 132025441308787: power_pmu_event_idx <-perf_event_update_userpage
> -0 [051] dnH. 132025441308859: rcu_read_unlock_strict <-perf_event_update_userpage
> -0 [051] dnH. 132025441308975: power_pmu_enable <-perf_pmu_enable
> <<>>
> -0 [051] dnH. 132025441311108: irq_exit <-__do_irq
> -0 [051] dns. 132025441311319: performance_monitor_exception <-replay_soft_interrupts
> 
> Case 2: PMIs masked during local_* operations, for example local_add.
> If the local_add operation happens within a local_irq_save, the PMI
> replay will happen during local_irq_restore. Similar to case 1, this
> could also create a window before replay where PMU events get deleted
> or stopped.
> 
> The patch updates the PMU callback 'power_pmu_disable' to check for a
> pending perf interrupt. If there is an overflown PMC and a pending perf
> interrupt indicated in the PACA, clear the PMI bit in the PACA to drop that
> sample. Clearing of the PMI bit is done in 'power_pmu_disable' since disable
> is invoked before any event gets deleted/stopped. With this fix, if there is
> more than one event running in the PMU, there is a chance that we clear the
> PMI bit for an event which is not being deleted/stopped, while the other
> events may still remain active. Hence, to make sure we don't drop a valid
> sample in such cases, another check is added in power_pmu_enable: it checks
> whether there is an overflown PMC among the active events and, if so, sets
> the PMI bit again. Two new helper functions are introduced to clear/set the
> PMI, namely 'clear_pmi_irq_pending' and 'set_pmi_irq_pending'.
> 
> Also, there are corner cases which result in performance monitor interrupts
> being triggered during power_pmu_disable. This happens since the PMXE bit is
> not cleared along with the other MMCR0 bits in pmu_disable. Such PMIs could
> leave the PMU running and could trigger a PMI again, which will set the
> MMCR0 PMAO bit. This could lead to spurious interrupts in some corner
> cases. For example, a timer interrupt after power_pmu_del will re-enable
> interrupts and trigger a PMI again since the PMAO bit is still set, but
> will fail to find a valid overflow since the PMCs were cleared in
> power_pmu_del. The patch fixes this by disabling PMXE along with the other
> MMCR0 bits in power_pmu_disable.
> 
> We can't just replay the PMI at any time, hence this approach is preferred
> over replaying the PMI before resetting the overflown PMC. The patch also
> documents, in core-book3s, a race condition which can trigger these PMC
> messages during the idle path on PowerNV.
> 
> Fixes: f442d004806e ("powerpc/64s: Add support to mask perf interrupts and replay them")
> Reported-by: Nageswara R Sastry 
> Suggested-by: Nicholas Piggin 
> Suggested-by: Madhavan Srinivasan 
> Signed-off-by: Athira Rajeev 
> ---
>  arch/powerpc/include/asm/hw_irq.h | 31 +
>  arch/powerpc/perf/core-book3s.c   | 49 ++-
>  2 files changed, 79 i

RE: [PATCH v2] fpga: dfl: fme: Fix cpu hotplug issue in performance reporting

2021-07-11 Thread Wu, Hao
> Subject: [PATCH v2] fpga: dfl: fme: Fix cpu hotplug issue in performance
> reporting
> 
> The performance reporting driver added the cpu hotplug
> feature, but it didn't add a pmu migration call in the cpu
> offline function. This can create an issue in case the current
> designated cpu being used to collect fme pmu data goes offline:
> since the current code does not migrate the fme pmu to the new
> target cpu, perf will still try to fetch data from that offline
> cpu and hence we will not get any counter data.
> 
> The patch fixes this issue by adding a perf_pmu_migrate_context() call
> in the fme_perf_offline_cpu() function.
> 
> Fixes: 724142f8c42a ("fpga: dfl: fme: add performance reporting support")
> Tested-by: Xu Yilun 
> Signed-off-by: Kajol Jain 

Thanks for fixing this! And thanks Yilun for the verification too. : )

> Cc: sta...@vger.kernel.org
> ---
>  drivers/fpga/dfl-fme-perf.c | 4 
>  1 file changed, 4 insertions(+)
> 
> ---
> Changelog:
> v1 -> v2:
> - Add sta...@vger.kernel.org in cc list
> 
> RFC -> PATCH v1
> - Remove RFC tag
> - Did nits changes on subject and commit message as suggested by Xu Yilun
> - Added Tested-by tag
> - Link to rfc patch: https://lkml.org/lkml/2021/6/28/112
> ---
> diff --git a/drivers/fpga/dfl-fme-perf.c b/drivers/fpga/dfl-fme-perf.c
> index 4299145ef347..b9a54583e505 100644
> --- a/drivers/fpga/dfl-fme-perf.c
> +++ b/drivers/fpga/dfl-fme-perf.c
> @@ -953,6 +953,10 @@ static int fme_perf_offline_cpu(unsigned int cpu, struct hlist_node *node)
>   return 0;
> 
>   priv->cpu = target;
> +
> + /* Migrate fme_perf pmu events to the new target cpu */

Only one minor item, this line is not needed. : )

After remove it, 
Acked-by: Wu Hao 

Thanks
Hao

> + perf_pmu_migrate_context(&priv->pmu, cpu, target);
> +
>   return 0;
>  }
> 
> --
> 2.31.1



[PATCH] KVM: PPC: Book3S HV P9: Fix guest TM support

2021-07-11 Thread Nicholas Piggin
The conversion to C introduced several bugs in TM handling that can
cause host crashes with TM bad thing interrupts. Mostly these are simple
typos or missed logic in the conversion, which got through due to my
not testing TM in the guest sufficiently.

- Early TM emulation for the softpatch interrupt should be done if fake
  suspend mode is _not_ active.

- Early TM emulation wants to return immediately to the guest so as to
  not doom transactions unnecessarily.

- And if exiting from the guest, the host MSR should include the TM[S]
  bit if the guest was T/S, before it is treclaimed.

After this fix, all the TM selftests pass when running on a P9 processor
that implements TM with softpatch interrupt.

Reported-by: Alexey Kardashevskiy 
Fixes: 89d35b2391015 ("KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C")
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv_p9_entry.c | 24 +---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index 83f592eadcd2..becb0c08ebda 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -317,6 +317,9 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 time_limit, unsigned long lpc
 */
mtspr(SPRN_HDEC, hdec);
 
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+tm_return_to_guest:
+#endif
mtspr(SPRN_DAR, vcpu->arch.shregs.dar);
mtspr(SPRN_DSISR, vcpu->arch.shregs.dsisr);
mtspr(SPRN_SRR0, vcpu->arch.shregs.srr0);
@@ -415,11 +418,23 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 time_limit, unsigned long lpc
 * is in real suspend mode and is trying to transition to
 * transactional mode.
 */
-   if (local_paca->kvm_hstate.fake_suspend &&
+   if (!local_paca->kvm_hstate.fake_suspend &&
(vcpu->arch.shregs.msr & MSR_TS_S)) {
if (kvmhv_p9_tm_emulation_early(vcpu)) {
-   /* Prevent it being handled again. */
-   trap = 0;
+   /*
+* Go straight back into the guest with the
+* new NIP/MSR as set by TM emulation.
+*/
+   mtspr(SPRN_HSRR0, vcpu->arch.regs.nip);
+   mtspr(SPRN_HSRR1, vcpu->arch.shregs.msr);
+
+   /*
+* tm_return_to_guest re-loads SRR0/1, DAR,
+* DSISR after RI is cleared, in case they had
+* been clobbered by a MCE.
+*/
+   __mtmsrd(0, 1); /* clear RI */
+   goto tm_return_to_guest;
}
}
 #endif
@@ -499,6 +514,9 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 time_limit, unsigned long lpc
 * If we are in real mode, only switch MMU on after the MMU is
 * switched to host, to avoid the P9_RADIX_PREFETCH_BUG.
 */
+   if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) &&
+   vcpu->arch.shregs.msr & MSR_TS_MASK)
+   msr |= MSR_TS_S;
__mtmsrd(msr, 0);
 
end_timing(vcpu);
-- 
2.23.0



Re: [PATCH v3 2/2] net/ps3_gelic: Cleanups, improve logging

2021-07-11 Thread Christophe Leroy

Geoff Levand  a écrit :


General source cleanups and improved logging messages.



Describe in a bit more detail what you do to clean up and improve.

Some of your changes are not cleanups; they increase the mess.


You should read the kernel coding style document.


Signed-off-by: Geoff Levand 
---
 drivers/net/ethernet/toshiba/ps3_gelic_net.c | 395 ++-
 1 file changed, 216 insertions(+), 179 deletions(-)

diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.c b/drivers/net/ethernet/toshiba/ps3_gelic_net.c

index e01938128882..9dbcb7c4ec80 100644
--- a/drivers/net/ethernet/toshiba/ps3_gelic_net.c
+++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
@@ -44,17 +44,17 @@ MODULE_AUTHOR("SCE Inc.");
 MODULE_DESCRIPTION("Gelic Network driver");
 MODULE_LICENSE("GPL");

-
-/* set irq_mask */
 int gelic_card_set_irq_mask(struct gelic_card *card, u64 mask)
 {
+   struct device *dev = ctodev(card);
int status;

status = lv1_net_set_interrupt_mask(bus_id(card), dev_id(card),
mask, 0);
-   if (status)
-   dev_info(ctodev(card),
-"%s failed %d\n", __func__, status);
+   if (status) {



No { } for single lines


+   dev_err(dev, "%s:%d failed: %d\n", __func__, __LINE__, status);
+   }
+
return status;
 }

@@ -63,6 +63,7 @@ static void gelic_card_rx_irq_on(struct gelic_card *card)
card->irq_mask |= GELIC_CARD_RXINT;
gelic_card_set_irq_mask(card, card->irq_mask);
 }
+
 static void gelic_card_rx_irq_off(struct gelic_card *card)
 {
card->irq_mask &= ~GELIC_CARD_RXINT;
@@ -70,15 +71,14 @@ static void gelic_card_rx_irq_off(struct gelic_card *card)

 }

 static void gelic_card_get_ether_port_status(struct gelic_card *card,
-int inform)
+   int inform)


Bad indent


 {
u64 v2;
struct net_device *ether_netdev;

lv1_net_control(bus_id(card), dev_id(card),
-   GELIC_LV1_GET_ETH_PORT_STATUS,
-   GELIC_LV1_VLAN_TX_ETHERNET_0, 0, 0,
-   &card->ether_port_status, &v2);
+   GELIC_LV1_GET_ETH_PORT_STATUS, GELIC_LV1_VLAN_TX_ETHERNET_0, 0,
+   0, &card->ether_port_status, &v2);


Bad indent



if (inform) {
ether_netdev = card->netdev[GELIC_PORT_ETHERNET_0];
@@ -105,15 +105,17 @@ gelic_descr_get_status(struct gelic_descr *descr)

 static int gelic_card_set_link_mode(struct gelic_card *card, int mode)
 {
+   struct device *dev = ctodev(card);
int status;
u64 v1, v2;

status = lv1_net_control(bus_id(card), dev_id(card),
-GELIC_LV1_SET_NEGOTIATION_MODE,
-GELIC_LV1_PHY_ETHERNET_0, mode, 0, &v1, &v2);
+   GELIC_LV1_SET_NEGOTIATION_MODE, GELIC_LV1_PHY_ETHERNET_0, mode,
+   0, &v1, &v2);
+
if (status) {
-   pr_info("%s: failed setting negotiation mode %d\n", __func__,
-   status);
+   dev_err(dev, "%s:%d: Failed setting negotiation mode: %d\n",
+   __func__, __LINE__, status);
return -EBUSY;
}

@@ -130,13 +132,16 @@ static int gelic_card_set_link_mode(struct gelic_card *card, int mode)

  */
 static void gelic_card_disable_txdmac(struct gelic_card *card)
 {
+   struct device *dev = ctodev(card);
int status;

/* this hvc blocks until the DMA in progress really stopped */
status = lv1_net_stop_tx_dma(bus_id(card), dev_id(card));
-   if (status)
-   dev_err(ctodev(card),
-   "lv1_net_stop_tx_dma failed, status=%d\n", status);
+
+   if (status) {
+   dev_err(dev, "%s:%d: lv1_net_stop_tx_dma failed: %d\n",
+   __func__, __LINE__, status);
+   }
 }

 /**
@@ -187,13 +192,16 @@ static void gelic_card_enable_rxdmac(struct gelic_card *card)

  */
 static void gelic_card_disable_rxdmac(struct gelic_card *card)
 {
+   struct device *dev = ctodev(card);
int status;

/* this hvc blocks until the DMA in progress really stopped */
status = lv1_net_stop_rx_dma(bus_id(card), dev_id(card));
-   if (status)
-   dev_err(ctodev(card),
-   "lv1_net_stop_rx_dma failed, %d\n", status);
+
+   if (status) {
+   dev_err(dev, "%s:%d: lv1_net_stop_rx_dma failed: %d\n",
+   __func__, __LINE__, status);
+   }
 }

 /**
@@ -216,6 +224,7 @@ static void gelic_descr_set_status(struct gelic_descr *descr,

 * Usually caller of this function wants to inform that to the
 * hardware, so we assure here the hardware sees the change.
 */
+


Bad blank line; it separates the comment from the code it comments.



wmb();
 }

@@ -229,8 +238,7 @@ static void gelic_descr_set_status(struct gelic_descr *descr,

  * and re-i

Re: [PATCH v3 1/2] net/ps3_gelic: Add gelic_descr structures

2021-07-11 Thread Christophe Leroy

Geoff Levand wrote:


Create two new structures, struct gelic_hw_regs and struct gelic_chain_link,
and replace the corresponding members of struct gelic_descr with the new
structures.  struct gelic_hw_regs holds the register variables used by the
gelic hardware device.  struct gelic_chain_link holds variables used to
manage the driver's linked list of gelic descr structures.

Fixes several DMA mapping problems with the PS3's gelic network driver:

 * Change from checking the return value of dma_map_single to using the
   dma_mapping_error routine.
 * Use the correct buffer length when mapping the RX skb.
 * Improved error checking and debug logging.


Your patch has a lot of cosmetic changes. Several of them are just wrong. The other ones belong to another patch. This patch should focus only on the changes it targets.


Your patch is way too big and addresses several different topics. It should be split into several patches.


I suggest you run checkpatch.pl --strict on your patch




Fixes runtime errors like these, and also other randomly occurring errors:

  IP-Config: Complete:
  DMA-API: ps3_gelic_driver sb_05: device driver failed to check map error
  WARNING: CPU: 0 PID: 0 at kernel/dma/debug.c:1027 .check_unmap+0x888/0x8dc

Signed-off-by: Geoff Levand 
---
 drivers/net/ethernet/toshiba/ps3_gelic_net.c | 573 +++
 drivers/net/ethernet/toshiba/ps3_gelic_net.h |  24 +-
 2 files changed, 341 insertions(+), 256 deletions(-)

diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.c b/drivers/net/ethernet/toshiba/ps3_gelic_net.c

index 55e652624bd7..e01938128882 100644
--- a/drivers/net/ethernet/toshiba/ps3_gelic_net.c
+++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
@@ -96,9 +96,11 @@ static void gelic_card_get_ether_port_status(struct gelic_card *card,

  * returns the status as in the dmac_cmd_status field of the descriptor
  */
 static enum gelic_descr_dma_status
+


This blank line is pointless and misleading


 gelic_descr_get_status(struct gelic_descr *descr)
 {
-   return be32_to_cpu(descr->dmac_cmd_status) & GELIC_DESCR_DMA_STAT_MASK;
+   return be32_to_cpu(descr->hw_regs.dmac_cmd_status)
+   & GELIC_DESCR_DMA_STAT_MASK;


The & should be at the end of the previous line, and the second line should be aligned to the open parenthesis.



 }

 static int gelic_card_set_link_mode(struct gelic_card *card, int mode)
@@ -146,24 +148,34 @@ static void gelic_card_disable_txdmac(struct gelic_card *card)

  */
 static void gelic_card_enable_rxdmac(struct gelic_card *card)
 {
+   struct device *dev = ctodev(card);
int status;
+#if defined(DEBUG)
+   static const int debug_build = 1;
+#else
+   static const int debug_build = 0;
+#endif


Useless

You can directly use __is_defined(DEBUG) below



-#ifdef DEBUG
-   if (gelic_descr_get_status(card->rx_chain.head) !=
-   GELIC_DESCR_DMA_CARDOWNED) {
-   printk(KERN_ERR "%s: status=%x\n", __func__,
-  be32_to_cpu(card->rx_chain.head->dmac_cmd_status));
-   printk(KERN_ERR "%s: nextphy=%x\n", __func__,
-  be32_to_cpu(card->rx_chain.head->next_descr_addr));
-   printk(KERN_ERR "%s: head=%p\n", __func__,
-  card->rx_chain.head);
+   if (debug_build


&& must be at the end of the first line


+   && (gelic_descr_get_status(card->rx_chain.head)
+   != GELIC_DESCR_DMA_CARDOWNED)) {


!= must be at end of previous line


+   dev_err(dev, "%s:%d: status=%x\n", __func__, __LINE__,
+   be32_to_cpu(
+   card->rx_chain.head->hw_regs.dmac_cmd_status));


Alignment should match the open parenthesis. And lines can be 100 chars long when necessary.




+   dev_err(dev, "%s:%d: nextphy=%x\n", __func__, __LINE__,
+   be32_to_cpu(
+   card->rx_chain.head->hw_regs.next_descr_addr));
+   dev_err(dev, "%s:%d: head=%px\n", __func__, __LINE__,
+   card->rx_chain.head);
}
-#endif
+
status = lv1_net_start_rx_dma(bus_id(card), dev_id(card),
-   card->rx_chain.head->bus_addr, 0);
-   if (status)
-   dev_info(ctodev(card),
-"lv1_net_start_rx_dma failed, status=%d\n", status);
+   card->rx_chain.head->link.cpu_addr, 0);
+
+   if (status) {


No { } for single lines



+   dev_err(dev, "%s:%d: lv1_net_start_rx_dma failed: %d\n",
+   __func__, __LINE__, status);
+   }
 }

 /**
@@ -193,11 +205,11 @@ static void gelic_card_disable_rxdmac(struct gelic_card *card)

  * in the status
  */
 static void gelic_descr_set_status(struct gelic_descr *descr,
-  enum gelic_descr_dma_status status)
+   enum gelic_descr_dma_status status)
 {
-   descr->dmac_cmd_status = cpu_to_be32(status |

[PATCH V2] powerpc/perf: Enable PMU counters post partition migration if PMU is active

2021-07-11 Thread Athira Rajeev
During Live Partition Migration (LPM), it is observed that perf
counter values report zero post migration completion. However
'perf stat' with a workload continues to show counts post migration
since the PMU gets disabled/enabled during sched switches. But in
case of system/cpu wide monitoring, zero counts were reported with
'perf stat' after migration completion.

Example:
 ./perf stat -e r1001e -I 1000
   time counts unit events
 1.001010437 22,137,414  r1001e
 2.002495447 15,455,821  r1001e
<<>> As seen in the logs below, the counter values show zero
after migration is completed.
<<>>
86.142535370  129,392,333,440  r1001e
87.144714617  0  r1001e
88.146526636  0  r1001e
89.148085029  0  r1001e

Here the PMU is enabled at the start of the perf session and counter
values are read at intervals. Counters are only disabled at the
end of the session. The powerpc mobility code presently does not
handle disabling and re-enabling of PMU counters during partition
migration. Also, since the PMU register values are not saved/restored
during migration, PMU registers like Monitor Mode Control Register 0
(MMCR0) and Monitor Mode Control Register 1 (MMCR1) will not contain
the values they were programmed with. Hence PMU counters will not be
enabled correctly post migration.

Fix this in the mobility code by handling disabling and enabling of
the PMU on all CPUs before and after migration. The patch introduces
two functions, 'mobility_pmu_disable' and 'mobility_pmu_enable'.
mobility_pmu_disable() is called before the processor threads go
to suspend state so as to disable the PMU counters, and disabling
is done only if there are any active events running on that CPU.
mobility_pmu_enable() is called after the processor threads are
back online to re-enable the PMU counters.

Since the Performance Monitor Counters (PMCs) are not
saved/restored during LPM, the PMC value ends up being zero while
'event->hw.prev_count' holds a non-zero value. This causes a problem
during the update of event->count, since we always accumulate
(PMC value - event->hw.prev_count) into event->count. If
event->hw.prev_count is greater than the PMC value, event->count
becomes negative. Fix this by re-initialising 'prev_count' for all
events while re-enabling them. A new variable 'migrate' is
introduced in 'struct cpu_hw_events' to achieve this for LPM cases
in power_pmu_enable. Use the 'migrate' value to clear the PMC
index (stored in event->hw.idx) for all events so that the event
count settings get re-initialised correctly.

Signed-off-by: Athira Rajeev 
[ Fixed compilation error reported by kernel test robot ]
Reported-by: kernel test robot 
---
Change from v1 -> v2:
 - Moved the mobility_pmu_enable and mobility_pmu_disable
   declarations under CONFIG_PPC_PERF_CTRS in rtas.h.
   Also included 'asm/rtas.h' in core-book3s to fix the
   compilation warning reported by kernel test robot.

 arch/powerpc/include/asm/rtas.h   |  8 ++
 arch/powerpc/perf/core-book3s.c   | 44 ---
 arch/powerpc/platforms/pseries/mobility.c |  4 +++
 3 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 9dc97d2..cea72d7 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -380,5 +380,13 @@ static inline void rtas_initialize(void) { }
 static inline void read_24x7_sys_info(void) { }
 #endif
 
+#ifdef CONFIG_PPC_PERF_CTRS
+void mobility_pmu_disable(void);
+void mobility_pmu_enable(void);
+#else
+static inline void mobility_pmu_disable(void) { }
+static inline void mobility_pmu_enable(void) { }
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _POWERPC_RTAS_H */
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index bb0ee71..90da7fa 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_PPC64
 #include "internal.h"
@@ -58,6 +59,7 @@ struct cpu_hw_events {
 
/* Store the PMC values */
unsigned long pmcs[MAX_HWEVENTS];
+   int migrate;
 };
 
 static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events);
@@ -1335,6 +1337,22 @@ static void power_pmu_disable(struct pmu *pmu)
 }
 
 /*
+ * Called from powerpc mobility code
+ * before migration to disable counters
+ * if the PMU is active.
+ */
+void mobility_pmu_disable(void)
+{
+   struct cpu_hw_events *cpuhw;
+
+   cpuhw = this_cpu_ptr(&cpu_hw_events);
+   if (cpuhw->n_events != 0) {
+   power_pmu_disable(NULL);
+   cpuhw->migrate = 1;
+   }
+}
+
+/*
  * Re-enable all events if disable == 0.
  * If we were previously disabled and events were added, then
  * put the new config on the PMU.
@@ -1379,8 +1397,10 @@ static void power_pmu_enable(struct pmu *pmu)
 * no need to rec

[PATCH] arch:powerpc:platforms:powernv: cleanup __FUNCTION__ usage

2021-07-11 Thread Dwaipayan Ray
__FUNCTION__ exists only for backwards compatibility reasons
with old gcc versions. Replace it with __func__.

Signed-off-by: Dwaipayan Ray 
---
 arch/powerpc/platforms/powernv/opal-imc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index 7824cc364bc4..5bafbf34e820 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -211,7 +211,7 @@ static void disable_core_pmu_counters(void)
get_hard_smp_processor_id(cpu));
if (rc)
pr_err("%s: Failed to stop Core (cpu = %d)\n",
-   __FUNCTION__, cpu);
+   __func__, cpu);
}
put_online_cpus();
 }
-- 
2.28.0



[PATCH] arch:powerpc:mm: cleanup __FUNCTION__ usage

2021-07-11 Thread Dwaipayan Ray
__FUNCTION__ exists only for backwards compatibility reasons
with old gcc versions. Replace it with __func__.

Signed-off-by: Dwaipayan Ray 
---
 arch/powerpc/mm/numa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index f2bf98bdcea2..3d3f570c925c 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1236,7 +1236,7 @@ int find_and_online_cpu_nid(int cpu)
 #endif
}
 
-   pr_debug("%s:%d cpu %d nid %d\n", __FUNCTION__, __LINE__,
+   pr_debug("%s:%d cpu %d nid %d\n", __func__, __LINE__,
cpu, new_nid);
return new_nid;
 }
-- 
2.28.0



Re: [PATCH] powerpc/perf: Enable PMU counters post partition migration if PMU is active

2021-07-11 Thread kernel test robot
Hi Athira,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on powerpc/next]
[also build test WARNING on v5.13 next-20210709]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Athira-Rajeev/powerpc-perf-Enable-PMU-counters-post-partition-migration-if-PMU-is-active/20210711-150741
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/2050c82afb3abd9eaa57fee45e71e7fccabfb81f
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Athira-Rajeev/powerpc-perf-Enable-PMU-counters-post-partition-migration-if-PMU-is-active/20210711-150741
        git checkout 2050c82afb3abd9eaa57fee45e71e7fccabfb81f
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> arch/powerpc/perf/core-book3s.c:1343:6: warning: no previous prototype for 'mobility_pmu_disable' [-Wmissing-prototypes]
 1343 | void mobility_pmu_disable(void)
      |      ^~~~~~~~~~~~~~~~~~~~
>> arch/powerpc/perf/core-book3s.c:1537:6: warning: no previous prototype for 'mobility_pmu_enable' [-Wmissing-prototypes]
 1537 | void mobility_pmu_enable(void)
      |      ^~~~~~~~~~~~~~~~~~~


vim +/mobility_pmu_disable +1343 arch/powerpc/perf/core-book3s.c

  1337  
  1338  /*
  1339   * Called from powerpc mobility code
  1340   * before migration to disable counters
  1341   * if the PMU is active.
  1342   */
> 1343  void mobility_pmu_disable(void)
  1344  {
  1345  struct cpu_hw_events *cpuhw;
  1346  
  1347  cpuhw = this_cpu_ptr(&cpu_hw_events);
  1348  if (cpuhw->n_events != 0) {
  1349  power_pmu_disable(NULL);
  1350  cpuhw->migrate = 1;
  1351  }
  1352  }
  1353  
  1354  /*
  1355   * Re-enable all events if disable == 0.
  1356   * If we were previously disabled and events were added, then
  1357   * put the new config on the PMU.
  1358   */
  1359  static void power_pmu_enable(struct pmu *pmu)
  1360  {
  1361  struct perf_event *event;
  1362  struct cpu_hw_events *cpuhw;
  1363  unsigned long flags;
  1364  long i;
  1365  unsigned long val, mmcr0;
  1366  s64 left;
  1367  unsigned int hwc_index[MAX_HWEVENTS];
  1368  int n_lim;
  1369  int idx;
  1370  bool ebb;
  1371  
  1372  if (!ppmu)
  1373  return;
  1374  local_irq_save(flags);
  1375  
  1376  cpuhw = this_cpu_ptr(&cpu_hw_events);
  1377  if (!cpuhw->disabled)
  1378  goto out;
  1379  
  1380  if (cpuhw->n_events == 0) {
  1381  ppc_set_pmu_inuse(0);
  1382  goto out;
  1383  }
  1384  
  1385  cpuhw->disabled = 0;
  1386  
  1387  /*
  1388   * EBB requires an exclusive group and all events must have the EBB
  1389   * flag set, or not set, so we can just check a single event. Also we
  1390   * know we have at least one event.
  1391   */
  1392  ebb = is_ebb_event(cpuhw->event[0]);
  1393  
  1394  /*
  1395   * If we didn't change anything, or only removed events,
  1396   * no need to recalculate MMCR* settings and reset the PMCs.
  1397   * Just reenable the PMU with the current MMCR* settings
  1398   * (possibly updated for removal of events).
  1399   * While reenabling PMU during partition migration, continue
  1400   * with normal flow.
  1401   */
  1402  if (!cpuhw->n_added && !cpuhw->migrate) {
  1403  mtspr(SPRN_MMCRA, cpuhw->mmcr.mmcra & ~MMCRA_SAMPLE_ENABLE);
  1404  mtspr(SPRN_MMCR1, cpuhw->mmcr.mmcr1);
  1405  if (ppmu->flags & PPMU_ARCH_31)
  1406  mtspr(SPRN_MMCR3, cpuhw->mmcr.mmcr3);
  1407  goto out_enable;
  1408  }
  1409  
  1410  /*
  1411   * Clear all MMCR settings and recompute them for the new set of events.
  1412   */
  1413  memset(&cpuh

[PATCH] powerpc/perf: Enable PMU counters post partition migration if PMU is active

2021-07-11 Thread Athira Rajeev
During Live Partition Migration (LPM), it is observed that after
migration completion, perf counter values report 0 in case of
system/cpu wide monitoring. However 'perf stat' with a workload
continues to show counts post migration since the PMU gets
disabled/enabled during sched switches.

Example:
 ./perf stat -e r1001e -I 1000
   time counts unit events
 1.001010437 22,137,414  r1001e
 2.002495447 15,455,821  r1001e
<<>> As seen in the logs below, the counter values show zero
after migration is completed.
<<>>
86.142535370  129,392,333,440  r1001e
87.144714617  0  r1001e
88.146526636  0  r1001e
89.148085029  0  r1001e

Here the PMU is enabled at the start of the perf session and counter
values are read at intervals. Counters are only disabled at the
end of the session. The powerpc mobility code presently does not
handle disabling and re-enabling of PMU counters during partition
migration. Also, since the PMU register values are not saved/restored
during migration, PMU registers like Monitor Mode Control Register 0
(MMCR0) and Monitor Mode Control Register 1 (MMCR1) will not contain
the values they were programmed with. Hence PMU counters will not be
enabled correctly post migration.

Fix this in the mobility code by handling disabling and enabling of
the PMU on all CPUs before and after migration. The patch introduces
two functions, 'mobility_pmu_disable' and 'mobility_pmu_enable'.
mobility_pmu_disable() is called before the processor threads go
to suspend state so as to disable the PMU counters, and disabling
is done only if there are any active events running on that CPU.
mobility_pmu_enable() is called after the processor threads are
back online to re-enable the PMU counters.

Since the Performance Monitor Counters (PMCs) are not
saved/restored during LPM, the PMC value ends up being zero while
'event->hw.prev_count' holds a non-zero value. This causes a problem
during the update of event->count, since we always accumulate
(PMC value - event->hw.prev_count) into event->count. If
event->hw.prev_count is greater than the PMC value, event->count
becomes negative. Fix this by re-initialising 'prev_count' for all
events while re-enabling them. A new variable 'migrate' is
introduced in 'struct cpu_hw_events' to achieve this for LPM cases
in power_pmu_enable. Use the 'migrate' value to clear the PMC
index (stored in event->hw.idx) for all events so that the event
count settings get re-initialised correctly.

Signed-off-by: Athira Rajeev 
---
 arch/powerpc/include/asm/rtas.h   |  4 +++
 arch/powerpc/perf/core-book3s.c   | 43 ---
 arch/powerpc/platforms/pseries/mobility.c |  4 +++
 3 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 9dc97d2..3fc478a 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -376,8 +376,12 @@ static inline void rtas_initialize(void) { }
 
 #ifdef CONFIG_HV_PERF_CTRS
 void read_24x7_sys_info(void);
+void mobility_pmu_disable(void);
+void mobility_pmu_enable(void);
 #else
 static inline void read_24x7_sys_info(void) { }
+static inline void mobility_pmu_disable(void) { }
+static inline void mobility_pmu_enable(void) { }
 #endif
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index bb0ee71..e86df45 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -58,6 +58,7 @@ struct cpu_hw_events {
 
/* Store the PMC values */
unsigned long pmcs[MAX_HWEVENTS];
+   int migrate;
 };
 
 static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events);
@@ -1335,6 +1336,22 @@ static void power_pmu_disable(struct pmu *pmu)
 }
 
 /*
+ * Called from powerpc mobility code
+ * before migration to disable counters
+ * if the PMU is active.
+ */
+void mobility_pmu_disable(void)
+{
+   struct cpu_hw_events *cpuhw;
+
+   cpuhw = this_cpu_ptr(&cpu_hw_events);
+   if (cpuhw->n_events != 0) {
+   power_pmu_disable(NULL);
+   cpuhw->migrate = 1;
+   }
+}
+
+/*
  * Re-enable all events if disable == 0.
  * If we were previously disabled and events were added, then
  * put the new config on the PMU.
@@ -1379,8 +1396,10 @@ static void power_pmu_enable(struct pmu *pmu)
 * no need to recalculate MMCR* settings and reset the PMCs.
 * Just reenable the PMU with the current MMCR* settings
 * (possibly updated for removal of events).
+* While reenabling PMU during partition migration, continue
+* with normal flow.
 */
-   if (!cpuhw->n_added) {
+   if (!cpuhw->n_added && !cpuhw->migrate) {
mtspr(SPRN_MMCRA, cpuhw->mmcr.mmcra & ~MMCRA_SAMPLE_ENABLE);
mtspr(SPRN_MMCR1, cpuhw->mmcr.mmcr1);
if (ppmu->flags & PPMU_ARCH_31