Re: [REDO PATCH v7] perf/x86/amd/power: Add AMD accumulated power reporting mechanism

2016-07-19 Thread Huang Rui
On Tue, Jul 19, 2016 at 02:22:36PM -0400, Vince Weaver wrote:
> On Fri, 17 Jun 2016, Huang Rui wrote:
> 
> > On Thu, Jun 16, 2016 at 06:47:00PM +0200, Borislav Petkov wrote:
> > > On Thu, Jun 16, 2016 at 01:38:14PM +0800, Huang Rui wrote:
> > > > I was told this feature would be supported on fam15h 60h, 70h and
> > > > later processors before. Just checked the fam16h model 30h BKDG, yes,
> > > > it should be also supported. But I didn't test that platform, if you
> > > > confirm it works in your side. We can enable it.
> > > 
> > > You might want to ask around first whether F16M30's acc power machinery
> > > is even usable? I.e., no errata and whatnot...
> > > 
> > 
> > Yep, I already asked the designers, and was waiting for their
> > feedbacks.
> 
> Was there ever any feedback about any of the problems encountered with AMD 
> APM support?
> 

Vince, apology to late response. We are drafting the erratum for this
feature. Will let you know if it is public.

Thanks,
Rui


Re: [REDO PATCH v7] perf/x86/amd/power: Add AMD accumulated power reporting mechanism

2016-07-19 Thread Vince Weaver
On Fri, 17 Jun 2016, Huang Rui wrote:

> On Thu, Jun 16, 2016 at 06:47:00PM +0200, Borislav Petkov wrote:
> > On Thu, Jun 16, 2016 at 01:38:14PM +0800, Huang Rui wrote:
> > > I was told this feature would be supported on fam15h 60h, 70h and
> > > later processors before. Just checked the fam16h model 30h BKDG, yes,
> > > it should be also supported. But I didn't test that platform, if you
> > > confirm it works in your side. We can enable it.
> > 
> > You might want to ask around first whether F16M30's acc power machinery
> > is even usable? I.e., no errata and whatnot...
> > 
> 
> Yep, I already asked the designers, and was waiting for their
> feedbacks.

Was there ever any feedback about any of the problems encountered with AMD 
APM support?

Thanks,

Vince


Re: [REDO PATCH v7] perf/x86/amd/power: Add AMD accumulated power reporting mechanism

2016-06-17 Thread Vince Weaver
On Fri, 17 Jun 2016, Huang Rui wrote:

> Can you try to read the MSR value two times with the same core
> (rdmsrl_on_cpu)?

I'm reading from userspace using the /dev/cpu/0/msr device so it should 
always be reading from cpu0.

I guess I could code up a custom kernel module to debug this if necessary.

It does look that for some reason the 0xc0010280 MSR is only returning the 
lower 24 bits of the PTSC, rather than the 40 bits that 
cpuid 8008:ecx seems to think it should have.

Vince


Re: [REDO PATCH v7] perf/x86/amd/power: Add AMD accumulated power reporting mechanism

2016-06-17 Thread Huang Rui
On Thu, Jun 16, 2016 at 04:44:20PM -0400, Vince Weaver wrote:
> On Thu, 16 Jun 2016, Huang Rui wrote:
> 
> > > 1.  In theory this should also work on an amd fam16h model 30h
> > > processor too, correct?  The current code limits things to fam15h
> > > even though the fam16mod30h has all the proper cpuid flags.
> > > 
> > 
> > I was told this feature would be supported on fam15h 60h, 70h and
> > later processors before. Just checked the fam16h model 30h BKDG, yes,
> > it should be also supported. But I didn't test that platform, if you
> > confirm it works in your side. We can enable it.
> 
> I can confirm I get power readings on my fam16hmod30h board once I apply a 
> trivial patch to the driver.  I'll send the patch in a separate e-mail.
> 

OK, thanks.

> > PTSC's frequency is about 100Mhz, it shouldn't be overflow.
> 
> That's what I thought.  I'm trying to read the value using the /dev/msr 
> interface from userspace and I get weird results.
> 
> i.e.:
>   Jx: read 62d299b84
>   PTSC MSR: read 72fe92
>   
>   sleep 5ms
> 
>   Jy: read 631b453b9
>   PTSC MSR: read 46b25
> 
> this happens about half the time (PTSC going backwards).  Though 
> admittedly the problem could somehow be in the MSR code I'm using.
> 

Can you try to read the MSR value two times with the same core
(rdmsrl_on_cpu)?

Thanks,
Rui


Re: [REDO PATCH v7] perf/x86/amd/power: Add AMD accumulated power reporting mechanism

2016-06-17 Thread Huang Rui
On Thu, Jun 16, 2016 at 06:47:00PM +0200, Borislav Petkov wrote:
> On Thu, Jun 16, 2016 at 01:38:14PM +0800, Huang Rui wrote:
> > I was told this feature would be supported on fam15h 60h, 70h and
> > later processors before. Just checked the fam16h model 30h BKDG, yes,
> > it should be also supported. But I didn't test that platform, if you
> > confirm it works in your side. We can enable it.
> 
> You might want to ask around first whether F16M30's acc power machinery
> is even usable? I.e., no errata and whatnot...
> 

Yep, I already asked the designers, and was waiting for their
feedbacks.

Thanks,
Rui


Re: [REDO PATCH v7] perf/x86/amd/power: Add AMD accumulated power reporting mechanism

2016-06-16 Thread Borislav Petkov
On Thu, Jun 16, 2016 at 05:16:04PM -0400, Vince Weaver wrote:
> I'd believe the 6W report as a value for how much the CPU is using.
> The others seem spurious. I guess I should go check the Errata for
> this chip.

Maybe this is the reason why it got enabled on F15 only :-)

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.


Re: [REDO PATCH v7] perf/x86/amd/power: Add AMD accumulated power reporting mechanism

2016-06-16 Thread Vince Weaver
On Thu, 16 Jun 2016, Vince Weaver wrote:

One more followup, if I run the benchmark a bunch of times I get this:

  4,472,401.06 mWatts power/power-pkg/  
  
 50,886,303.28 mWatts power/power-pkg/  
  
 81,737,001.44 mWatts power/power-pkg/  
  
  6,525.89 mWatts power/power-pkg/  
  
  6,522.04 mWatts power/power-pkg/  
  
  6,505.68 mWatts power/power-pkg/  
  
  4,938,855.83 mWatts power/power-pkg/  
  
  4,614,620.11 mWatts power/power-pkg/  
  
 79,679,069.41 mWatts power/power-pkg/  
  
152,794,060.83 mWatts power/power-pkg/  
  
  3,942,429.02 mWatts power/power-pkg/  
  
  6,506.73 mWatts power/power-pkg/  
  
 60,198,884.39 mWatts power/power-pkg/  
  

I'd believe the 6W report as a value for how much the CPU is using.  The 
others seem spurious.  I guess I should go check the Errata for this chip.

Vince


Re: [REDO PATCH v7] perf/x86/amd/power: Add AMD accumulated power reporting mechanism

2016-06-16 Thread Vince Weaver
On Thu, 16 Jun 2016, Huang Rui wrote:

> On Thu, Jun 16, 2016 at 01:38:13PM +0800, Huang Rui wrote:
> 
> After considering carefully, the original method should be OK. 
> 
>   AMD nomenclature for CMT systems:
> 
> [node 0] -> [Compute Unit 0] -> [Compute Unit Core 0] -> Linux CPU 0
>  -> [Compute Unit Core 1] -> Linux CPU 1
>  -> [Compute Unit 1] -> [Compute Unit Core 0] -> Linux CPU 2
>  -> [Compute Unit Core 1] -> Linux CPU 3
> 
> The deltaN is power per compute unit. Current one package has two CUs.
> In the *same* interval, CU0's power is 10W, CU1's power is 15W. The
> package (CU0 + CU1) power should be 15W, right? Because the interval
> is the same.
> 
> Q = Q1 + Q2.  P = Q/t = (Q1 + Q2)/t = Q1/t + Q2/t = P1 + P2.
> 
> Is that clear?

OK, I was misunderstanding.  I somehow thought there was a periodic timer 
that was adding accumulating power over time.
But no, the driver just assumes the PTSC does not overflow?  And that 
addition is just there to handle adding all the cores together?

If so, then I agree that the addition makes sense, sorry for confusing 
things.

Although I think it would be better if we reported Joules (like 
RAPL does) rather than average power, but too late to change that now.


Also, on my machine I get results that make no physical sense, such as:

sudo perf stat -a -e power/power-pkg/  /usr/games/primes 1 5 > /dev/null

 Performance counter stats for 'system wide':

  4,472,401.06 mWatts power/power-pkg/  
  

   6.956135769 seconds time elapsed

I somehow don't think the CPU is really burning 4kW of Power.

Vince


Re: [REDO PATCH v7] perf/x86/amd/power: Add AMD accumulated power reporting mechanism

2016-06-16 Thread Vince Weaver
On Thu, 16 Jun 2016, Huang Rui wrote:

> > 1.  In theory this should also work on an amd fam16h model 30h
> > processor too, correct?  The current code limits things to fam15h
> > even though the fam16mod30h has all the proper cpuid flags.
> > 
> 
> I was told this feature would be supported on fam15h 60h, 70h and
> later processors before. Just checked the fam16h model 30h BKDG, yes,
> it should be also supported. But I didn't test that platform, if you
> confirm it works in your side. We can enable it.

I can confirm I get power readings on my fam16hmod30h board once I apply a 
trivial patch to the driver.  I'll send the patch in a separate e-mail.

> PTSC's frequency is about 100Mhz, it shouldn't be overflow.

That's what I thought.  I'm trying to read the value using the /dev/msr 
interface from userspace and I get weird results.

i.e.:
Jx: read 62d299b84
PTSC MSR: read 72fe92

sleep 5ms

Jy: read 631b453b9
PTSC MSR: read 46b25

this happens about half the time (PTSC going backwards).  Though 
admittedly the problem could somehow be in the MSR code I'm using.

> mWatts are for processor power not system power. Below data is
> calculated on fam15h model 60h which is low power platform. Even
> though the method has a minor mistake, the processor power should be
> in mWatts field.

I have an actual wall-mounted power meter hooked up to my system and the 
difference from idle to all-cores-busy is 20W, so I would think that that 
the results we find with perf should be >1W at least.

Vince


Re: [REDO PATCH v7] perf/x86/amd/power: Add AMD accumulated power reporting mechanism

2016-06-16 Thread Borislav Petkov
On Thu, Jun 16, 2016 at 01:38:14PM +0800, Huang Rui wrote:
> I was told this feature would be supported on fam15h 60h, 70h and
> later processors before. Just checked the fam16h model 30h BKDG, yes,
> it should be also supported. But I didn't test that platform, if you
> confirm it works in your side. We can enable it.

You might want to ask around first whether F16M30's acc power machinery
is even usable? I.e., no errata and whatnot...

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.


Re: [REDO PATCH v7] perf/x86/amd/power: Add AMD accumulated power reporting mechanism

2016-06-15 Thread Huang Rui
Hi Vince,

Thanks for asking.

On Wed, Jun 15, 2016 at 09:13:59PM -0400, Vince Weaver wrote:
> 
> three questions about this functionality:
> 
> 1.  In theory this should also work on an amd fam16h model 30h
> processor too, correct?  The current code limits things to fam15h
> even though the fam16mod30h has all the proper cpuid flags.
> 

I was told this feature would be supported on fam15h 60h, 70h and
later processors before. Just checked the fam16h model 30h BKDG, yes,
it should be also supported. But I didn't test that platform, if you
confirm it works in your side. We can enable it.

> I've tested the functionality a bit and it seems to work but for
> some reason the ptsc seems to occasionally count backwards
> on my machine.  Any reason that would be?  (It doesn't seem to be
> an overflow, just reading the ptsc 5ms apart and the values are 
> slightly lower after than before).
> 

PTSC's frequency is about 100Mhz, it shouldn't be overflow.

> 2.  Unless I'm misunderstanding things, the code seems to be accumulating 
>   Power. (see chunk below) Power is an instantaneous measurement, it 
>   makes no sense to add values.  If you use 5W for 1ms and 10W for
>   1ms, the average power across the 2ms interval is not 15W.
> 
>   You can add energy, but not power.
> 
> > +   delta *= cpu_pwr_sample_ratio * 1000;
> > +   tdelta = new_ptsc - prev_ptsc;
> > +
> > +   do_div(delta, tdelta);
> > +   local64_add(delta, &event->count);
> 

You're right. Nice catch! The average power is per compute unit. We
cannot add the power simplely for each processor/package.

So here, the average power per package should be (delta1 + delta2 + ... + 
deltaN)/(tdelta_avg).
I will work out a fix. Thanks to point out.

> 3.  The actual results gathered seem rediculously low.  341 seconds of
> calculation and only using 183 mWatts of power?
> 

mWatts are for processor power not system power. Below data is
calculated on fam15h model 60h which is low power platform. Even
though the method has a minor mistake, the processor power should be
in mWatts field.

> >Performance counter stats for 'system wide':
> > 
> >   183.44 mWatts power/power-pkg/
> > 
> >341.837270111 seconds time elapsed
> > 
> >   root@hr-zp:/home/ray/tip# ./tools/perf/perf stat -a -e 'power/power-pkg/' 
> > sleep 10
> 

Thanks,
Rui


Re: [REDO PATCH v7] perf/x86/amd/power: Add AMD accumulated power reporting mechanism

2016-06-15 Thread Huang Rui
On Thu, Jun 16, 2016 at 01:38:13PM +0800, Huang Rui wrote:
> On Wed, Jun 15, 2016 at 09:13:59PM -0400, Vince Weaver wrote:
> > 
> > 2.  Unless I'm misunderstanding things, the code seems to be accumulating 
> > Power. (see chunk below) Power is an instantaneous measurement, it 
> > makes no sense to add values.  If you use 5W for 1ms and 10W for
> > 1ms, the average power across the 2ms interval is not 15W.
> > 
> > You can add energy, but not power.
> > 
> > > + delta *= cpu_pwr_sample_ratio * 1000;
> > > + tdelta = new_ptsc - prev_ptsc;
> > > +
> > > + do_div(delta, tdelta);
> > > + local64_add(delta, &event->count);
> > 
> 
> You're right. Nice catch! The average power is per compute unit. We
> cannot add the power simplely for each processor/package.
> 
> So here, the average power per package should be (delta1 + delta2 + ... + 
> deltaN)/(tdelta_avg).
> I will work out a fix. Thanks to point out.
> 

After considering carefully, the original method should be OK. 

  AMD nomenclature for CMT systems:

[node 0] -> [Compute Unit 0] -> [Compute Unit Core 0] -> Linux CPU 0
 -> [Compute Unit Core 1] -> Linux CPU 1
 -> [Compute Unit 1] -> [Compute Unit Core 0] -> Linux CPU 2
 -> [Compute Unit Core 1] -> Linux CPU 3

The deltaN is power per compute unit. Current one package has two CUs.
In the *same* interval, CU0's power is 10W, CU1's power is 15W. The
package (CU0 + CU1) power should be 15W, right? Because the interval
is the same.

Q = Q1 + Q2.  P = Q/t = (Q1 + Q2)/t = Q1/t + Q2/t = P1 + P2.

Is that clear?

Thanks,
Rui


Re: [REDO PATCH v7] perf/x86/amd/power: Add AMD accumulated power reporting mechanism

2016-06-15 Thread Vince Weaver

three questions about this functionality:

1.  In theory this should also work on an amd fam16h model 30h
processor too, correct?  The current code limits things to fam15h
even though the fam16mod30h has all the proper cpuid flags.

I've tested the functionality a bit and it seems to work but for
some reason the ptsc seems to occasionally count backwards
on my machine.  Any reason that would be?  (It doesn't seem to be
an overflow, just reading the ptsc 5ms apart and the values are 
slightly lower after than before).

2.  Unless I'm misunderstanding things, the code seems to be accumulating 
Power. (see chunk below) Power is an instantaneous measurement, it 
makes no sense to add values.  If you use 5W for 1ms and 10W for
1ms, the average power across the 2ms interval is not 15W.

You can add energy, but not power.

> + delta *= cpu_pwr_sample_ratio * 1000;
> + tdelta = new_ptsc - prev_ptsc;
> +
> + do_div(delta, tdelta);
> + local64_add(delta, &event->count);

3.  The actual results gathered seem rediculously low.  341 seconds of
calculation and only using 183 mWatts of power?

>Performance counter stats for 'system wide':
> 
>   183.44 mWatts power/power-pkg/
> 
>341.837270111 seconds time elapsed
> 
>   root@hr-zp:/home/ray/tip# ./tools/perf/perf stat -a -e 'power/power-pkg/' 
> sleep 10

Vince


[REDO PATCH v7] perf/x86/amd/power: Add AMD accumulated power reporting mechanism

2016-03-08 Thread Huang Rui
Introduce an AMD accumlated power reporting mechanism for the Family
15h, Model 60h processor that can be used to calculate the average
power consumed by a processor during a measurement interval. The
feature support is indicated by CPUID Fn8000_0007_EDX[12].

This feature will be implemented both in hwmon and perf. The current
design provides one event to report per package/processor power
consumption by counting each compute unit power value.

Here the gory details of how the computation is done:

-
* Tsample: compute unit power accumulator sample period
* Tref: the PTSC counter period (PTSC: performance timestamp counter)
* N: the ratio of compute unit power accumulator sample period to the
  PTSC period

* Jmax: max compute unit accumulated power which is indicated by
  MSR_C001007b[MaxCpuSwPwrAcc]

* Jx/Jy: compute unit accumulated power which is indicated by
  MSR_C001007a[CpuSwPwrAcc]

* Tx/Ty: the value of performance timestamp counter which is indicated
  by CU_PTSC MSR_C0010280[PTSC]
* PwrCPUave: CPU average power

i. Determine the ratio of Tsample to Tref by executing CPUID Fn8000_0007.
N = value of CPUID Fn8000_0007_ECX[CpuPwrSampleTimeRatio[15:0]].

ii. Read the full range of the cumulative energy value from the new
MSR MaxCpuSwPwrAcc.
Jmax = value returned.

iii. At time x, software reads CpuSwPwrAcc and samples the PTSC.
Jx = value read from CpuSwPwrAcc and Tx = value read from PTSC.

iv. At time y, software reads CpuSwPwrAcc and samples the PTSC.
Jy = value read from CpuSwPwrAcc and Ty = value read from PTSC.

v. Calculate the average power consumption for a compute unit over
time period (y-x). Unit of result is uWatt:

if (Jy < Jx) // Rollover has occurred
Jdelta = (Jy + Jmax) - Jx
else
Jdelta = Jy - Jx
PwrCPUave = N * Jdelta * 1000 / (Ty - Tx)
--

Simple example:

  root@hr-zp:/home/ray/tip# ./tools/perf/perf stat -a -e 'power/power-pkg/' 
make -j4
CHK include/config/kernel.release
CHK include/generated/uapi/linux/version.h
CHK include/generated/utsrelease.h
CHK include/generated/timeconst.h
CHK include/generated/bounds.h
CHK include/generated/asm-offsets.h
CALLscripts/checksyscalls.sh
CHK include/generated/compile.h
SKIPPED include/generated/compile.h
Building modules, stage 2.
  Kernel: arch/x86/boot/bzImage is ready  (#40)
MODPOST 4225 modules

   Performance counter stats for 'system wide':

  183.44 mWatts power/power-pkg/

   341.837270111 seconds time elapsed

  root@hr-zp:/home/ray/tip# ./tools/perf/perf stat -a -e 'power/power-pkg/' 
sleep 10

   Performance counter stats for 'system wide':

0.18 mWatts power/power-pkg/

10.012551815 seconds time elapsed

Suggested-by: Peter Zijlstra 
Suggested-by: Ingo Molnar 
Suggested-by: Borislav Petkov 
Reviewed-by: Thomas Gleixner 
Signed-off-by: Huang Rui 
Cc: Guenter Roeck 
---

Hi Boris,

I already redo this patch based on tip/master, it depends on some
previous patches you applied before. If you need me to send them
again, please let me know.

Thanks,
Rui

---
 arch/x86/Kconfig|   9 ++
 arch/x86/events/Makefile|   1 +
 arch/x86/events/amd/power.c | 353 
 include/linux/perf_event.h  |   4 +
 4 files changed, 367 insertions(+)
 create mode 100644 arch/x86/events/amd/power.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3e61672..52ef30d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1210,6 +1210,15 @@ config MICROCODE_OLD_INTERFACE
def_bool y
depends on MICROCODE
 
+config PERF_EVENTS_AMD_POWER
+   depends on PERF_EVENTS && CPU_SUP_AMD
+   tristate "AMD Processor Power Reporting Mechanism"
+   ---help---
+ Provide power reporting mechanism support for AMD processors.
+ Currently, it leverages X86_FEATURE_ACC_POWER
+ (CPUID Fn8000_0007_EDX[12]) interface to calculate the
+ average power consumption on Family 15h processors.
+
 config X86_MSR
tristate "/dev/cpu/*/msr - Model-specific register support"
---help---
diff --git a/arch/x86/events/Makefile b/arch/x86/events/Makefile
index fdfea15..f59618a 100644
--- a/arch/x86/events/Makefile
+++ b/arch/x86/events/Makefile
@@ -1,6 +1,7 @@
 obj-y  += core.o
 
 obj-$(CONFIG_CPU_SUP_AMD)   += amd/core.o amd/uncore.o
+obj-$(CONFIG_PERF_EVENTS_AMD_POWER)+= amd/power.o
 obj-$(CONFIG_X86_LOCAL_APIC)+= amd/ibs.o msr.o
 ifdef CONFIG_AMD_IOMMU
 obj-$(CONFIG_CPU_SUP_AMD)   += amd/iommu.o
diff --git a/arch/x86/events/amd/power.c b/arch/x86/events/amd/power.c
new file mode 100644
index 000..55a3529
--- /dev/null
+++ b/arch/x86/events/amd/power.c
@@ -0,0 +1,353 @