Re: Add top down metrics to perf stat

2016-05-20 Thread Andi Kleen
> [jolsa@ibm-x3650m4-01 perf]$ sudo ./perf stat --topdown -I 1000 -a
> nmi_watchdog enabled with topdown. May give wrong results.
> Disable with echo 0 > /proc/sys/kernel/nmi_watchdog
>  1.002097350   retiring   bad speculation   frontend bound   backend bound
>  1.002097350 S0-C0   2   38.1%   0.0%   59.2%   2.7%
>  1.002097350 S0-C1   2   38.1%   0.1%   59.7%   2.1%

Ah, I see now: this is --metric-only not displaying. --topdown enables
--metric-only implicitly.

I'll send a separate patch for that, because --metric-only was already
merged separately. So it's not really a problem in this patchkit,
but in a previous one.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.


Re: Add top down metrics to perf stat

2016-05-20 Thread Jiri Olsa
On Fri, May 13, 2016 at 06:44:49PM -0700, Andi Kleen wrote:
> Note to reviewers: includes both tools and kernel patches.
> The kernel patches are at the beginning.
> 
> [v2: Address review feedback.
> Metrics are now always printed, but colored when crossing threshold.
> --topdown implies --metric-only.
> Various smaller fixes, see individual patches]
> [v3: Add --single-thread option and support it with HT off.
> Clean up old HT workaround.
> Improve documentation.
> Various smaller fixes, see individual patches.]
> [v4: Rebased on latest tree]
> [v5: Rebased on latest tree. Move debug messages to -vv]
> [v6: Rebased. Remove .aggr-per-core and --single-thread to not
> break old perf binaries. Put SMT enumeration into 
> generic topology API.]
> [v7: Address review comments. Change patch title headers.]

other than the missing headers and the unneeded initialization
of have_frontend_stalled I'm OK with the perf tools part

thanks,
jirka


Re: Add top down metrics to perf stat

2016-05-20 Thread Jiri Olsa
On Thu, May 19, 2016 at 04:51:30PM -0700, Andi Kleen wrote:
> On Mon, May 16, 2016 at 02:58:38PM +0200, Jiri Olsa wrote:
> > On Fri, May 13, 2016 at 06:44:49PM -0700, Andi Kleen wrote:
> > 
> > SNIP
> > 
> > > 
> > > The formulas to compute the metrics are generic, they
> > > only change based on the availability on the abstracted
> > > input values.
> > > 
> > > The kernel declares the events supported by the current
> > > CPU and perf stat then computes the formulas based on the
> > > available metrics.
> > > 
> > > 
> > > Example output:
> > > 
> > > $ perf stat --topdown -I 1000 cmd
> > >  1.000735655   frontend bound   retiring   bad speculation   backend bound
> > >  1.000735655 S0-C0   2   47.84%   11.69%   8.37%   32.10%
> > >  1.000735655 S0-C1   2   45.53%   11.39%   8.52%   34.56%
> 
> Hi Jiri,
> > 
> > you've lost first 3 header lines (time/core/cpus):
> > 
> > [jolsa@ibm-x3650m4-01 perf]$ sudo ./perf stat --per-core -e cycles -I 1000 -a
> > #   time core cpus counts unit events
> >  1.000310344 S0-C0   2  3,764,470,414  cycles
> >  1.000310344 S0-C1   2  3,764,445,293  cycles
> >  1.000310344 S0-C2   2  3,764,428,422  cycles
> 
> I can't reproduce that.
> 

I can.. your latest code does not display the headers 'time', 'core', 'cpus',
or the initial '#':

[jolsa@ibm-x3650m4-01 perf]$ sudo ./perf stat --topdown -I 1000 -a
nmi_watchdog enabled with topdown. May give wrong results.
Disable with echo 0 > /proc/sys/kernel/nmi_watchdog
 1.002097350   retiring   bad speculation   frontend bound   backend bound
 1.002097350 S0-C0   2   38.1%   0.0%   59.2%   2.7%
 1.002097350 S0-C1   2   38.1%   0.1%   59.7%   2.1%


thanks,
jirka


Add top down metrics to perf stat

2016-05-19 Thread Andi Kleen
Note to reviewers: includes both tools and kernel patches.
The kernel patches are at the beginning.

[v2: Address review feedback.
Metrics are now always printed, but colored when crossing threshold.
--topdown implies --metric-only.
Various smaller fixes, see individual patches]
[v3: Add --single-thread option and support it with HT off.
Clean up old HT workaround.
Improve documentation.
Various smaller fixes, see individual patches.]
[v4: Rebased on latest tree]
[v5: Rebased on latest tree. Move debug messages to -vv]
[v6: Rebased. Remove .aggr-per-core and --single-thread to not
break old perf binaries. Put SMT enumeration into 
generic topology API.]
[v7: Address review comments. Change patch title headers.]
[v8: Avoid -0.00 output]

This patchkit adds support for TopDown measurements to perf stat.
It applies on top of my earlier metrics patchkit, posted
separately.

TopDown is intended to replace the frontend cycles idle /
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads,
due to out-of-order effects.

This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipeline
bottlenecks using standardized formulas. The measurement
can all be done with 5 counters (one of them a fixed counter).

The result is four metrics,
FrontendBound, BackendBound, BadSpeculation, and Retiring,
that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound
BadSpeculation is a higher

The full top down methodology has many hierarchical metrics.
This implementation only supports level 1, which can be
collected without multiplexing. A full implementation
of top down on top of perf is available in pmu-tools toplev
(http://github.com/andikleen/pmu-tools).

The current version works on Intel Core CPUs starting
with Sandy Bridge, and Atom CPUs starting with Silvermont.
In principle the generic metrics should also be implementable
on other out-of-order CPUs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out-of-order CPU cores (although some
CPUs may not implement all of them):

topdown-total-slots   Available slots in the pipeline
topdown-slots-issued  Slots issued into the pipeline
topdown-slots-retired Slots successfully retired
topdown-fetch-bubbles Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
  from misspeculation

These events then allow computing four useful metrics:
FrontendBound, BackendBound, Retiring, and BadSpeculation.

The formulas to compute the metrics are generic; they
only change based on the availability of the abstracted
input values.
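For reference, the level-1 formulas over these abstracted events can be
sketched as follows. This is a minimal illustration of the standardized
formulas, not the exact code used by perf stat:

```python
# Sketch of the TopDown level 1 formulas over the abstracted events
# listed above. Illustration only; perf stat's shadow-metric code may
# differ in detail (e.g. SMT scaling of total slots).

def topdown_level1(total_slots, slots_issued, slots_retired,
                   fetch_bubbles, recovery_bubbles):
    # Slots wasted because the frontend could not deliver uops.
    frontend_bound = fetch_bubbles / total_slots
    # Slots issued but never retired, plus recovery gaps, are work
    # thrown away due to misspeculation.
    bad_speculation = (slots_issued - slots_retired
                       + recovery_bubbles) / total_slots
    # Retired slots did useful work.
    retiring = slots_retired / total_slots
    # Everything else is the backend stalling.
    backend_bound = 1.0 - frontend_bound - bad_speculation - retiring
    return frontend_bound, bad_speculation, retiring, backend_bound
```

The four fractions sum to 1, which is also why small measurement
inaccuracies can push one of them slightly below zero.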

The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available metrics.


Example output:

$ perf stat --topdown -I 1000 cmd
 1.000735655   frontend bound   retiring   bad speculation   backend bound
 1.000735655 S0-C0   2   47.84%   11.69%   8.37%   32.10%
 1.000735655 S0-C1   2   45.53%   11.39%   8.52%   34.56%
 2.003978563 S0-C0   2   49.47%   12.22%   8.65%   29.66%
 2.003978563 S0-C1   2   47.21%   12.98%   8.77%   31.04%
 3.004901432 S0-C0   2   49.35%   12.26%   8.68%   29.70%
 3.004901432 S0-C1   2   47.23%   12.67%   8.76%   31.35%
 4.005766611 S0-C0   2   48.44%   12.14%   8.59%   30.82%
 4.005766611 S0-C1   2   46.07%   12.41%   8.67%   32.85%
 5.006580592 S0-C0   2   47.91%   12.08%   8.57%   31.44%
 5.006580592 S0-C1   2   45.57%   12.27%   8.63%   33.53%
 6.007545125 S0-C0   2   47.45%   12.02%   8.57%   31.96%
 6.007545125 S0-C1   2   45.13%   12.17%   8.57%   34.14%
 7.008539347 S0-C0   2   47.07%   12.03%   8.61%   32.29%
...


 
For Level 1, TopDown computes metrics per core instead of per logical CPU
on Core CPUs. (On Atom CPUs there is no Hyper-Threading, and TopDown
is per thread.)
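The per-core aggregation itself is simple to picture; here is a
hypothetical sketch (function and variable names invented for
illustration, not perf internals) of what --per-core does with the
per-logical-CPU counts:

```python
# Hypothetical sketch of --per-core aggregation: counts from the
# logical CPUs (hyperthreads) of one physical core are summed, and
# the "cpus" column reports how many logical CPUs were aggregated.

def aggregate_per_core(samples):
    """samples: iterable of (socket, core, cpu, count) tuples."""
    per_core = {}
    for socket, core, _cpu, count in samples:
        entry = per_core.setdefault((socket, core), [0, 0])
        entry[0] += 1      # number of logical CPUs seen on this core
        entry[1] += count  # summed event count for the core
    return per_core        # {(socket, core): [cpus, count]}
```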

In this case perf stat automatically enables --per-core mode, and also
requires global mode (-a) and no other filters (no cgroup mode).
One side effect is that this may require root rights or a
kernel.perf_event_paranoid=-1 setting.

Full tree available in 

Re: Add top down metrics to perf stat

2016-05-19 Thread Andi Kleen
On Mon, May 16, 2016 at 02:58:38PM +0200, Jiri Olsa wrote:
> On Fri, May 13, 2016 at 06:44:49PM -0700, Andi Kleen wrote:
> 
> SNIP
> 
> > 
> > The formulas to compute the metrics are generic, they
> > only change based on the availability on the abstracted
> > input values.
> > 
> > The kernel declares the events supported by the current
> > CPU and perf stat then computes the formulas based on the
> > available metrics.
> > 
> > 
> > Example output:
> > 
> > $ perf stat --topdown -I 1000 cmd
> >  1.000735655   frontend bound   retiring   bad speculation   backend bound
> >  1.000735655 S0-C0   2   47.84%   11.69%   8.37%   32.10%
> >  1.000735655 S0-C1   2   45.53%   11.39%   8.52%   34.56%

Hi Jiri,
> 
> you've lost first 3 header lines (time/core/cpus):
> 
> [jolsa@ibm-x3650m4-01 perf]$ sudo ./perf stat --per-core -e cycles -I 1000 -a
> #   time core cpus counts unit events
>  1.000310344 S0-C0   2  3,764,470,414  cycles
>  1.000310344 S0-C1   2  3,764,445,293  cycles
>  1.000310344 S0-C2   2  3,764,428,422  cycles

I can't reproduce that.

The headers look the same as before.

> 
> also I'm still getting -0% as I mentioned in my previous comment:

Keeping the NMI watchdog enabled can make the formulas inaccurate
because the grouping is disabled, and parts of the formulas
may be measured at different times where the execution profile
is different.

But anyway, even without that it can be caused by small inaccuracies,
where the value then rounds to 0.
I can remove the '-' for this case.
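One way to drop the '-' is to clamp any value that would round to zero
at the printed precision before formatting. A minimal sketch of the
idea (illustration only, not the actual perf stat formatting code):

```python
# Minimal sketch of suppressing "-0.0%": if the value would round to
# zero at the printed precision, print it as exactly 0.0 instead.

def fmt_metric(pct):
    if round(pct, 1) == 0.0:  # also catches tiny negative noise, e.g. -0.04
        pct = 0.0
    return "%5.1f%%" % pct
```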

Otherwise the data looks reasonable.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.


Re: Add top down metrics to perf stat

2016-05-16 Thread Jiri Olsa
On Fri, May 13, 2016 at 06:44:49PM -0700, Andi Kleen wrote:

SNIP

> 
> The formulas to compute the metrics are generic, they
> only change based on the availability on the abstracted
> input values.
> 
> The kernel declares the events supported by the current
> CPU and perf stat then computes the formulas based on the
> available metrics.
> 
> 
> Example output:
> 
> $ perf stat --topdown -I 1000 cmd
>  1.000735655   frontend bound   retiring   bad speculation   backend bound
>  1.000735655 S0-C0   2   47.84%   11.69%   8.37%   32.10%
>  1.000735655 S0-C1   2   45.53%   11.39%   8.52%   34.56%

you've lost first 3 header lines (time/core/cpus):

[jolsa@ibm-x3650m4-01 perf]$ sudo ./perf stat --per-core -e cycles -I 1000 -a
#   time core cpus counts unit events
 1.000310344 S0-C0   2  3,764,470,414  cycles
 1.000310344 S0-C1   2  3,764,445,293  cycles
 1.000310344 S0-C2   2  3,764,428,422  cycles

also I'm still getting -0% as I mentioned in my previous comment:

[jolsa@ibm-x3650m4-01 perf]$ sudo ./perf stat --topdown -I 1000 -a
nmi_watchdog enabled with topdown. May give wrong results.
Disable with echo 0 > /proc/sys/kernel/nmi_watchdog
 1.001615409   retiring   bad speculation   frontend bound   backend bound
 1.001615409 S0-C0   2   38.3%    0.0%   58.4%   3.3%
 1.001615409 S0-C1   2   38.1%   -0.0%   59.3%   2.6%
 1.001615409 S0-C2   2   38.1%    0.0%   58.9%   2.9%
 1.001615409 S0-C3   2   38.1%   -0.0%   58.9%   3.0%
 1.001615409 S0-C4   2   38.0%    0.0%   59.0%   2.9%
 1.001615409 S0-C5   2   38.1%   -0.0%   58.6%   3.3%
 1.001615409 S1-C0   2   49.7%    1.9%   44.7%   3.7%

thanks,
jirka


Add top down metrics to perf stat

2016-05-13 Thread Andi Kleen
Note to reviewers: includes both tools and kernel patches.
The kernel patches are at the beginning.

[v2: Address review feedback.
Metrics are now always printed, but colored when crossing threshold.
--topdown implies --metric-only.
Various smaller fixes, see individual patches]
[v3: Add --single-thread option and support it with HT off.
Clean up old HT workaround.
Improve documentation.
Various smaller fixes, see individual patches.]
[v4: Rebased on latest tree]
[v5: Rebased on latest tree. Move debug messages to -vv]
[v6: Rebased. Remove .aggr-per-core and --single-thread to not
break old perf binaries. Put SMT enumeration into 
generic topology API.]
[v7: Address review comments. Change patch title headers.]

This patchkit adds support for TopDown measurements to perf stat.
It applies on top of my earlier metrics patchkit, posted
separately.

TopDown is intended to replace the frontend cycles idle /
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads,
due to out-of-order effects.

This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipeline
bottlenecks using standardized formulas. The measurement
can all be done with 5 counters (one of them a fixed counter).

The result is four metrics,
FrontendBound, BackendBound, BadSpeculation, and Retiring,
that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound
BadSpeculation is a higher

The full top down methodology has many hierarchical metrics.
This implementation only supports level 1, which can be
collected without multiplexing. A full implementation
of top down on top of perf is available in pmu-tools toplev
(http://github.com/andikleen/pmu-tools).

The current version works on Intel Core CPUs starting
with Sandy Bridge, and Atom CPUs starting with Silvermont.
In principle the generic metrics should also be implementable
on other out-of-order CPUs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out-of-order CPU cores (although some
CPUs may not implement all of them):

topdown-total-slots   Available slots in the pipeline
topdown-slots-issued  Slots issued into the pipeline
topdown-slots-retired Slots successfully retired
topdown-fetch-bubbles Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
  from misspeculation

These events then allow computing four useful metrics:
FrontendBound, BackendBound, Retiring, and BadSpeculation.

The formulas to compute the metrics are generic; they
only change based on the availability of the abstracted
input values.

The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available metrics.


Example output:

$ perf stat --topdown -I 1000 cmd
 1.000735655   frontend bound   retiring   bad speculation   backend bound
 1.000735655 S0-C0   2   47.84%   11.69%   8.37%   32.10%
 1.000735655 S0-C1   2   45.53%   11.39%   8.52%   34.56%
 2.003978563 S0-C0   2   49.47%   12.22%   8.65%   29.66%
 2.003978563 S0-C1   2   47.21%   12.98%   8.77%   31.04%
 3.004901432 S0-C0   2   49.35%   12.26%   8.68%   29.70%
 3.004901432 S0-C1   2   47.23%   12.67%   8.76%   31.35%
 4.005766611 S0-C0   2   48.44%   12.14%   8.59%   30.82%
 4.005766611 S0-C1   2   46.07%   12.41%   8.67%   32.85%
 5.006580592 S0-C0   2   47.91%   12.08%   8.57%   31.44%
 5.006580592 S0-C1   2   45.57%   12.27%   8.63%   33.53%
 6.007545125 S0-C0   2   47.45%   12.02%   8.57%   31.96%
 6.007545125 S0-C1   2   45.13%   12.17%   8.57%   34.14%
 7.008539347 S0-C0   2   47.07%   12.03%   8.61%   32.29%
...


 
For Level 1, TopDown computes metrics per core instead of per logical CPU
on Core CPUs. (On Atom CPUs there is no Hyper-Threading, and TopDown
is per thread.)

In this case perf stat automatically enables --per-core mode, and also
requires global mode (-a) and no other filters (no cgroup mode).
One side effect is that this may require root rights or a
kernel.perf_event_paranoid=-1 setting.

Full tree available in 
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc 

Re: Add top down metrics to perf stat

2016-05-12 Thread Jiri Olsa
On Thu, May 05, 2016 at 04:03:57PM -0700, Andi Kleen wrote:

SNIP

> The kernel declares the events supported by the current
> CPU and perf stat then computes the formulas based on the
> available metrics.
> 
> 
> Example output:
> 
> $ perf stat --topdown -I 1000 cmd
>  1.000735655   frontend bound   retiring   bad speculation   backend bound
>  1.000735655 S0-C0   2   47.84%   11.69%   8.37%   32.10%
>  1.000735655 S0-C1   2   45.53%   11.39%   8.52%   34.56%
>  2.003978563 S0-C0   2   49.47%   12.22%   8.65%   29.66%
>  2.003978563 S0-C1   2   47.21%   12.98%   8.77%   31.04%
>  3.004901432 S0-C0   2   49.35%   12.26%   8.68%   29.70%
>  3.004901432 S0-C1   2   47.23%   12.67%   8.76%   31.35%
>  4.005766611 S0-C0   2   48.44%   12.14%   8.59%   30.82%
>  4.005766611 S0-C1   2   46.07%   12.41%   8.67%   32.85%
>  5.006580592 S0-C0   2   47.91%   12.08%   8.57%   31.44%
>  5.006580592 S0-C1   2   45.57%   12.27%   8.63%   33.53%
>  6.007545125 S0-C0   2   47.45%   12.02%   8.57%   31.96%
>  6.007545125 S0-C1   2   45.13%   12.17%   8.57%   34.14%
>  7.008539347 S0-C0   2   47.07%   12.03%   8.61%   32.29%


getting -0% for bad speculation.. I'm on your perf/top-down-20

thanks,
jirka


[root@ibm-x3650m4-01 perf]# ./perf stat --topdown -a -I 1000
nmi_watchdog enabled with topdown. May give wrong results.
Disable with echo 0 > /proc/sys/kernel/nmi_watchdog
 1.002322346   retiring   bad speculation   frontend bound   backend bound
 1.002322346 S0-C0  2   38.3%    0.0%   57.9%    3.8%
 1.002322346 S0-C1  2   38.3%    0.0%   59.1%    2.6%
 1.002322346 S0-C2  2   38.3%    0.0%   59.0%    2.6%
 1.002322346 S0-C3  2   38.3%    0.0%   58.7%    3.0%
 1.002322346 S0-C4  2   38.3%   -0.0%   58.6%    3.1%
 1.002322346 S0-C5  2   38.4%   -0.0%   58.3%    3.3%
 1.002322346 S1-C0  2   38.3%   -0.0%   58.7%    3.0%
 1.002322346 S1-C1  2   38.3%    0.0%   59.7%    2.0%
 1.002322346 S1-C2  2   38.3%   -0.0%   59.3%    2.5%
 1.002322346 S1-C3  2   38.3%   -0.0%   59.1%    2.5%
 1.002322346 S1-C4  2   38.3%    0.0%   59.1%    2.6%
 1.002322346 S1-C5  2   38.3%   -0.0%   59.1%    2.7%
 2.005429451 S0-C0  2   38.3%    0.0%   57.9%    3.8%
 ...
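The -0.0% cells reported here are consistent with a floating-point artifact rather than genuinely negative speculation: bad speculation is computed from the difference of separately read counters, so a reading skew of a few events can make the ratio come out slightly below zero, which then prints as -0.0%. A minimal illustration (the formula is the generic Level-1 one and the counter values are made up):

```python
# Bad speculation per the generic Level-1 formula (assumed):
#   (slots_issued - slots_retired + recovery_bubbles) / total_slots
# The counters are read at slightly different instants, so issued can come
# out marginally below retired, yielding a tiny negative ratio.
total_slots, issued, retired, recovery = 1_000_000, 499_990, 500_000, 0
bad_spec = (issued - retired + recovery) / total_slots
print(f"{bad_spec:.1%}")  # prints -0.0%
```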



Add top down metrics to perf stat

2016-05-05 Thread Andi Kleen
Note to reviewers: includes both tools and kernel patches.
The kernel patches are at the beginning.

[v2: Address review feedback.
Metrics are now always printed, but colored when crossing threshold.
--topdown implies --metric-only.
Various smaller fixes, see individual patches]
[v3: Add --single-thread option and support it with HT off.
Clean up old HT workaround.
Improve documentation.
Various smaller fixes, see individual patches.]
[v4: Rebased on latest tree]
[v5: Rebased on latest tree. Move debug messages to -vv]
[v6: Rebased. Remove .aggr-per-core and --single-thread to not
break old perf binaries. Put SMT enumeration into 
generic topology API.]

This patchkit adds support for TopDown measurements to perf stat.
It applies on top of my earlier metrics patchkit, posted
separately.

TopDown is intended to replace the frontend cycles idle /
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads
due to out-of-order effects.

This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipeline
bottlenecks using standardized formulas. The measurement
can all be done with five counters (one of them a fixed counter).

The result is four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring

that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound show whether the pipeline is limited
by instruction delivery or by execution resources. BadSpeculation is a
higher-level indicator of pipeline slots wasted on mis-speculated work,
while Retiring covers slots that retired useful instructions.

The full top down methodology has many hierarchical metrics.
This implementation only supports level 1, which can be
collected without multiplexing. A full implementation
of top down on top of perf is available in pmu-tools toplev
(http://github.com/andikleen/pmu-tools).

The current version works on Intel Core CPUs starting
with Sandy Bridge, and Atom CPUs starting with Silvermont.
In principle the generic metrics should also be implementable
on other out-of-order CPUs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):

topdown-total-slots   Available slots in the pipeline
topdown-slots-issued  Slots issued into the pipeline
topdown-slots-retired Slots successfully retired
topdown-fetch-bubbles Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
  from misspeculation

These metrics then allow computing four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.

The formulas to compute the metrics are generic; they
only change based on the availability of the abstracted
input values.

The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available metrics.
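The generic Level-1 formulas can be sketched as follows; the breakdown matches the commonly published Top-Down methodology, with BackendBound derived as the residual of the other three metrics, and the sample counter readings below are invented:

```python
# Level-1 Top-Down breakdown from the five abstracted events.
def topdown_level1(total_slots, slots_issued, slots_retired,
                   fetch_bubbles, recovery_bubbles):
    retiring = slots_retired / total_slots
    bad_speculation = (slots_issued - slots_retired
                       + recovery_bubbles) / total_slots
    frontend_bound = fetch_bubbles / total_slots
    # Backend bound is whatever fraction of slots the other three
    # metrics do not account for.
    backend_bound = 1.0 - (retiring + bad_speculation + frontend_bound)
    return retiring, bad_speculation, frontend_bound, backend_bound

# Invented sample readings for one core over one interval:
r, bs, fe, be = topdown_level1(1_000_000, 520_000, 480_000,
                               120_000, 10_000)
print(f"retiring {r:.1%}  bad spec {bs:.1%}  "
      f"frontend {fe:.1%}  backend {be:.1%}")
```

By construction the four metrics sum to 100%, which is why only three of them need dedicated event counts.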


Example output:

$ perf stat --topdown -I 1000 cmd
 1.000735655   frontend bound   retiring   bad speculation   backend bound
 1.000735655 S0-C0  2   47.84%   11.69%    8.37%   32.10%
 1.000735655 S0-C1  2   45.53%   11.39%    8.52%   34.56%
 2.003978563 S0-C0  2   49.47%   12.22%    8.65%   29.66%
 2.003978563 S0-C1  2   47.21%   12.98%    8.77%   31.04%
 3.004901432 S0-C0  2   49.35%   12.26%    8.68%   29.70%
 3.004901432 S0-C1  2   47.23%   12.67%    8.76%   31.35%
 4.005766611 S0-C0  2   48.44%   12.14%    8.59%   30.82%
 4.005766611 S0-C1  2   46.07%   12.41%    8.67%   32.85%
 5.006580592 S0-C0  2   47.91%   12.08%    8.57%   31.44%
 5.006580592 S0-C1  2   45.57%   12.27%    8.63%   33.53%
 6.007545125 S0-C0  2   47.45%   12.02%    8.57%   31.96%
 6.007545125 S0-C1  2   45.13%   12.17%    8.57%   34.14%
 7.008539347 S0-C0  2   47.07%   12.03%    8.61%   32.29%
...


 
For Level 1, Top Down computes metrics per core instead of per logical CPU
on Core CPUs. (On Atom CPUs there is no Hyper-Threading, so Top Down
is per thread.)

In this case perf stat automatically enables --per-core mode and also requires
system-wide mode (-a) and no other filters (no cgroup mode).
One side effect is that this may require root privileges or a
kernel.perf_event_paranoid=-1 setting.
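The per-core aggregation can be illustrated with a small sketch (the readings and topology below are hypothetical): with Hyper-Threading, the topdown events count per physical core, so the readings of the two sibling threads are summed before the ratios are computed, which is why each S*-C* output row also carries an aggregated-CPU count of 2.

```python
# Hypothetical per-thread readings of topdown-total-slots, keyed by
# (socket, core, thread); threads are folded into their physical core.
per_thread = {
    (0, 0, 0): 4_100_000, (0, 0, 1): 3_900_000,
    (0, 1, 0): 4_050_000, (0, 1, 1): 3_950_000,
}
per_core = {}
for (socket, core, _thread), slots in per_thread.items():
    per_core[(socket, core)] = per_core.get((socket, core), 0) + slots
for (socket, core), slots in sorted(per_core.items()):
    # "2" mirrors the aggregated-CPU count column in perf stat --per-core.
    print(f"S{socket}-C{core}  2  total-slots={slots}")
```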

Full tree available in 
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-20




Add top down metrics to perf stat

2016-04-27 Thread Andi Kleen
Note to reviewers: includes both tools and kernel patches.
The kernel patches are at the beginning.

[v2: Address review feedback.
Metrics are now always printed, but colored when crossing threshold.
--topdown implies --metric-only.
Various smaller fixes, see individual patches]
[v3: Add --single-thread option and support it with HT off.
Clean up old HT workaround.
Improve documentation.
Various smaller fixes, see individual patches.]
[v4: Rebased on latest tree]
[v5: Rebased on latest tree. Move debug messages to -vv]

This patchkit adds support for TopDown measurements to perf stat.
It applies on top of my earlier metrics patchkit, posted
separately.

TopDown is intended to replace the frontend cycles idle /
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads
due to out-of-order effects.

This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipeline
bottlenecks using standardized formulas. The measurement
can all be done with five counters (one of them a fixed counter).

The result is four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring

that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound show whether the pipeline is limited
by instruction delivery or by execution resources. BadSpeculation is a
higher-level indicator of pipeline slots wasted on mis-speculated work,
while Retiring covers slots that retired useful instructions.

The full top down methodology has many hierarchical metrics.
This implementation only supports level 1, which can be
collected without multiplexing. A full implementation
of top down on top of perf is available in pmu-tools toplev
(http://github.com/andikleen/pmu-tools).

The current version works on Intel Core CPUs starting
with Sandy Bridge, and Atom CPUs starting with Silvermont.
In principle the generic metrics should also be implementable
on other out-of-order CPUs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):

topdown-total-slots   Available slots in the pipeline
topdown-slots-issued  Slots issued into the pipeline
topdown-slots-retired Slots successfully retired
topdown-fetch-bubbles Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
  from misspeculation

These metrics then allow computing four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.

The formulas to compute the metrics are generic; they
only change based on the availability of the abstracted
input values.

The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available metrics.


Example output:

$ perf stat --topdown -I 1000 cmd
 1.000735655   frontend bound   retiring   bad speculation   backend bound
 1.000735655 S0-C0  2   47.84%   11.69%    8.37%   32.10%
 1.000735655 S0-C1  2   45.53%   11.39%    8.52%   34.56%
 2.003978563 S0-C0  2   49.47%   12.22%    8.65%   29.66%
 2.003978563 S0-C1  2   47.21%   12.98%    8.77%   31.04%
 3.004901432 S0-C0  2   49.35%   12.26%    8.68%   29.70%
 3.004901432 S0-C1  2   47.23%   12.67%    8.76%   31.35%
 4.005766611 S0-C0  2   48.44%   12.14%    8.59%   30.82%
 4.005766611 S0-C1  2   46.07%   12.41%    8.67%   32.85%
 5.006580592 S0-C0  2   47.91%   12.08%    8.57%   31.44%
 5.006580592 S0-C1  2   45.57%   12.27%    8.63%   33.53%
 6.007545125 S0-C0  2   47.45%   12.02%    8.57%   31.96%
 6.007545125 S0-C1  2   45.13%   12.17%    8.57%   34.14%
 7.008539347 S0-C0  2   47.07%   12.03%    8.61%   32.29%
...


 
For Level 1, Top Down computes metrics per core instead of per logical CPU
on Core CPUs. (On Atom CPUs there is no Hyper-Threading, so Top Down
is per thread.)

In this case perf stat automatically enables --per-core mode and also requires
system-wide mode (-a) and no other filters (no cgroup mode).

When Hyper-Threading is off this can be overridden with the --single-thread
option. When Hyper-Threading is on it is enforced; the only way to
not require -a here is to offline the logical CPUs of the second
threads.

One side effect is that this may require root privileges or a
kernel.perf_event_paranoid=-1 setting.

Full tree available in 


Add top down metrics to perf stat

2016-04-04 Thread Andi Kleen
Note to reviewers: includes both tools and kernel patches.
The kernel patches are at the beginning.

[v2: Address review feedback.
Metrics are now always printed, but colored when crossing threshold.
--topdown implies --metric-only.
Various smaller fixes, see individual patches]
[v3: Add --single-thread option and support it with HT off.
Clean up old HT workaround.
Improve documentation.
Various smaller fixes, see individual patches.]
[v4: Rebased on latest tree]

This patchkit adds support for TopDown measurements to perf stat.
It applies on top of my earlier metrics patchkit, posted
separately.

TopDown is intended to replace the frontend cycles idle /
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads
due to out-of-order effects.

This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipeline
bottlenecks using standardized formulas. The measurement
can all be done with five counters (one of them a fixed counter).

The result is four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring

that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound show whether the pipeline is limited
by instruction delivery or by execution resources. BadSpeculation is a
higher-level indicator of pipeline slots wasted on mis-speculated work,
while Retiring covers slots that retired useful instructions.

The full top down methodology has many hierarchical metrics.
This implementation only supports level 1, which can be
collected without multiplexing. A full implementation
of top down on top of perf is available in pmu-tools toplev
(http://github.com/andikleen/pmu-tools).

The current version works on Intel Core CPUs starting
with Sandy Bridge, and Atom CPUs starting with Silvermont.
In principle the generic metrics should also be implementable
on other out-of-order CPUs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):

topdown-total-slots   Available slots in the pipeline
topdown-slots-issued  Slots issued into the pipeline
topdown-slots-retired Slots successfully retired
topdown-fetch-bubbles Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
  from misspeculation

These metrics then allow computing four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.

The formulas to compute the metrics are generic; they
only change based on the availability of the abstracted
input values.

The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available metrics.


Example output:

$ perf stat --topdown -I 1000 cmd
 1.000735655   frontend bound   retiring   bad speculation   backend bound
 1.000735655 S0-C0  2   47.84%   11.69%    8.37%   32.10%
 1.000735655 S0-C1  2   45.53%   11.39%    8.52%   34.56%
 2.003978563 S0-C0  2   49.47%   12.22%    8.65%   29.66%
 2.003978563 S0-C1  2   47.21%   12.98%    8.77%   31.04%
 3.004901432 S0-C0  2   49.35%   12.26%    8.68%   29.70%
 3.004901432 S0-C1  2   47.23%   12.67%    8.76%   31.35%
 4.005766611 S0-C0  2   48.44%   12.14%    8.59%   30.82%
 4.005766611 S0-C1  2   46.07%   12.41%    8.67%   32.85%
 5.006580592 S0-C0  2   47.91%   12.08%    8.57%   31.44%
 5.006580592 S0-C1  2   45.57%   12.27%    8.63%   33.53%
 6.007545125 S0-C0  2   47.45%   12.02%    8.57%   31.96%
 6.007545125 S0-C1  2   45.13%   12.17%    8.57%   34.14%
 7.008539347 S0-C0  2   47.07%   12.03%    8.61%   32.29%
...


 
For Level 1, Top Down computes metrics per core instead of per logical CPU
on Core CPUs. (On Atom CPUs there is no Hyper-Threading, so Top Down
is per thread.)

In this case perf stat automatically enables --per-core mode and also requires
system-wide mode (-a) and no other filters (no cgroup mode).

When Hyper-Threading is off this can be overridden with the --single-thread
option. When Hyper-Threading is on it is enforced; the only way to
not require -a here is to offline the logical CPUs of the second
threads.

One side effect is that this may require root privileges or a
kernel.perf_event_paranoid=-1 setting.

Full tree available in 
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-17





Re: Add top down metrics to perf stat

2016-03-27 Thread Andi Kleen
> can't see this one (-16):
> 
> [jolsa@krava perf]$ git remote update ak
> Fetching ak
> [jolsa@krava perf]$ git branch -r | grep top-down
>   ak/perf/top-down-10
>   ak/perf/top-down-11
>   ak/perf/top-down-13
>   ak/perf/top-down-2

Please try again, I pushed it again.

-Andi



Re: Add top down metrics to perf stat

2016-03-27 Thread Jiri Olsa
On Tue, Mar 22, 2016 at 04:08:46PM -0700, Andi Kleen wrote:

SNIP

> In this case perf stat automatically enables --per-core mode and also requires
> global mode (-a) and avoiding other filters (no cgroup mode)
> 
> When Hyper Threading is off this can be overriden with the --single-thread
> option. When Hyper Threading is on it is enforced, the only way to
> not require -a here is to off line the logical CPUs of the second
> threads.
> 
> One side effect is that this may require root rights or a
> kernel.perf_event_paranoid=-1 setting. 
> 
> Full tree available in 
> git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-16
> 

can't see this one (-16):

[jolsa@krava perf]$ git remote update ak
Fetching ak
[jolsa@krava perf]$ git branch -r | grep top-down
  ak/perf/top-down-10
  ak/perf/top-down-11
  ak/perf/top-down-13
  ak/perf/top-down-2

thanks,
jirka



Add top down metrics to perf stat

2016-03-22 Thread Andi Kleen
[v2: Address review feedback.
Metrics are now always printed, but colored when crossing threshold.
--topdown implies --metric-only.
Various smaller fixes, see individual patches]
[v3: Add --single-thread option and support it with HT off.
Clean up old HT workaround.
Improve documentation.
Various smaller fixes, see individual patches.]

Note to reviewers: includes both tools and kernel patches.
The kernel patches are at the beginning.

This patchkit adds support for TopDown measurements to perf stat.
It applies on top of my earlier metrics patchkit, posted
separately.

TopDown is intended to replace the frontend cycles idle /
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads
due to out-of-order effects.

This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipeline
bottlenecks using standardized formulas. The measurement
can all be done with five counters (one of them a fixed counter).

The result is four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring

that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound show whether the pipeline is limited
by instruction delivery or by execution resources. BadSpeculation is a
higher-level indicator of pipeline slots wasted on mis-speculated work,
while Retiring covers slots that retired useful instructions.

The full top down methodology has many hierarchical metrics.
This implementation only supports level 1, which can be
collected without multiplexing. A full implementation
of top down on top of perf is available in pmu-tools toplev
(http://github.com/andikleen/pmu-tools).

The current version works on Intel Core CPUs starting
with Sandy Bridge, and Atom CPUs starting with Silvermont.
In principle the generic metrics should also be implementable
on other out-of-order CPUs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out-of-order CPU cores (although some
CPUs may not implement all of them):

topdown-total-slots   Available slots in the pipeline
topdown-slots-issued  Slots issued into the pipeline
topdown-slots-retired Slots successfully retired
topdown-fetch-bubbles Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
  from misspeculation

These events then allow computing four useful metrics:
FrontendBound, BackendBound, Retiring, and BadSpeculation.

The formulas to compute the metrics are generic; they
only change based on the availability of the abstracted
input values.
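For reference, the level-1 formulas can be sketched like this (a minimal
sketch based on the abstracted events above; the exact expressions perf stat
uses may differ in detail):

```python
def topdown_level1(total_slots, slots_issued, slots_retired,
                   fetch_bubbles, recovery_bubbles):
    """Return the four level-1 TopDown metrics as fractions of total slots."""
    frontend_bound = fetch_bubbles / total_slots
    bad_speculation = (slots_issued - slots_retired
                       + recovery_bubbles) / total_slots
    retiring = slots_retired / total_slots
    # Everything not accounted for above is attributed to the backend.
    backend_bound = 1.0 - frontend_bound - bad_speculation - retiring
    return frontend_bound, bad_speculation, retiring, backend_bound

# Made-up counts: a 4-wide machine over 1000 cycles = 4000 slots.
fe, bad, ret, be = topdown_level1(4000, 2600, 2400, 400, 100)
print([round(x, 3) for x in (fe, bad, ret, be)])  # [0.1, 0.075, 0.6, 0.225]
```

Note how the four fractions always sum to 1: every slot is attributed to
exactly one category.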

The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available metrics.


Example output:

$ perf stat --topdown -I 1000 cmd
 1.000735655      frontend bound  retiring  bad speculation  backend bound
 1.000735655 S0-C0  2   47.84%  11.69%   8.37%  32.10%
 1.000735655 S0-C1  2   45.53%  11.39%   8.52%  34.56%
 2.003978563 S0-C0  2   49.47%  12.22%   8.65%  29.66%
 2.003978563 S0-C1  2   47.21%  12.98%   8.77%  31.04%
 3.004901432 S0-C0  2   49.35%  12.26%   8.68%  29.70%
 3.004901432 S0-C1  2   47.23%  12.67%   8.76%  31.35%
 4.005766611 S0-C0  2   48.44%  12.14%   8.59%  30.82%
 4.005766611 S0-C1  2   46.07%  12.41%   8.67%  32.85%
 5.006580592 S0-C0  2   47.91%  12.08%   8.57%  31.44%
 5.006580592 S0-C1  2   45.57%  12.27%   8.63%  33.53%
 6.007545125 S0-C0  2   47.45%  12.02%   8.57%  31.96%
 6.007545125 S0-C1  2   45.13%  12.17%   8.57%  34.14%
 7.008539347 S0-C0  2   47.07%  12.03%   8.61%  32.29%
...


 
For Level 1, TopDown computes metrics per core instead of per logical CPU
on Core CPUs. (On Atom CPUs there is no Hyper-Threading, and TopDown
is per thread.)

In this case perf stat automatically enables --per-core mode, and also requires
global mode (-a) and no other filters (no cgroup mode).

When Hyper-Threading is off this can be overridden with the --single-thread
option. When Hyper-Threading is on it is enforced; the only way to
avoid requiring -a here is to offline the logical CPUs of the second
threads.

One side effect is that this may require root privileges or a
kernel.perf_event_paranoid=-1 setting.

Full tree available in 
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-16



Re: Add top down metrics to perf stat v2

2015-12-18 Thread Andi Kleen
On Fri, Dec 18, 2015 at 01:31:18AM -0800, Stephane Eranian wrote:
> >> Why force --per-core when HT is on. I know you you need to aggregate
> >> per core, but
> >> you could still display globally. And then if user requests
> >> --per-core, then display per core.
> >
> > Global TopDown doesn't make much sense. Suppose you have two programs
> > running on different cores, one frontend bound and one backend bound.
> > What would the union of the two mean? And you may well end up
> > with sums of ratios which are >100%.
> >
> How could that be if you consider that the machine is N-wide and not just 
> 4-wide
> anymore?
> 
> How what you are describing here is different when HT is off?

I was talking about cores, not CPU threads.

With global aggregation we would aggregate data from different cores,
which is highly dubious for TopDown.
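A small made-up example shows why: average one frontend-bound core with one
backend-bound core and the "global" result points at no real bottleneck
(all numbers invented for illustration):

```python
# Level-1 fractions per core (each row sums to 1.0) -- invented values.
core0 = {"frontend": 0.70, "bad_spec": 0.05, "retiring": 0.15, "backend": 0.10}
core1 = {"frontend": 0.10, "bad_spec": 0.05, "retiring": 0.15, "backend": 0.70}

# Naive global aggregation: average the two cores.
global_view = {k: round((core0[k] + core1[k]) / 2, 2) for k in core0}
print(global_view)
# {'frontend': 0.4, 'bad_spec': 0.05, 'retiring': 0.15, 'backend': 0.4}
# Neither core's dominant 70% bottleneck is visible anymore.
```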

CPU threads on a core are of course aggregated, that is why the patchkit
forces --per-core with HT on.

> If you force --per-core with HT-on, then you need to force it too when
> HT is off so that  you get a similar per core breakdown. In the HT on
> case, each Sx-Cy represents 2 threads, compared to 1 in the non HT
> case.Right now, you have non-HT reporting global, HT reporting per-core.
> That does not make much sense to me.

Ok. I guess we can force --per-core in this case too. This would simplify
things because we can get rid of the agg-per-core attribute.

> >> but it would be clearer and simpler to interpret to users.
> >
> > Same problem as above.
> >
> >>
> >> One bug I found when testing is that if you do with HT-on:
> >>
> >> $ perf stat -a --topdown -I 1000 --metric-only sleep 100
> >> Then you get data for frontend and backend but nothing for retiring or
> >> bad speculation.
> >
> > You see all the columns, but no data in some?
> >
> yes, and I don't like that. It is confusing especially when you do not
> know the threshold.
> Why are you suppressing the 'retiring' data when it is at 25% (1/4 of
> the maximum possible)
> when I am running a simple noploop? 25% is a sign of underutilization,
> that could be useful too.

It's what the TopDown specification uses and the paper describes. 

The thresholds are needed when you have more than one level because
the lower levels become meaningless if their parents didn't cross the
threshold. Otherwise you may report something that looks like a
bottleneck, but isn't.

There is currently only level 1 in the patchkit, but if we ever
add more levels we absolutely need thresholds. So it's better to have
them from Day 1.
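The gating rule described above can be sketched as follows (threshold
values and the "fetch_latency" child node here are invented for
illustration; the TopDown specification defines its own values):

```python
# Hierarchical threshold gating: a child metric is only meaningful
# if its parent crossed its threshold. Thresholds are made up here.
THRESHOLDS = {"frontend_bound": 0.20, "fetch_latency": 0.10}

def crosses(name, value):
    return value >= THRESHOLDS[name]

def report(name, value, parent_crossed=True):
    """Suppress a metric whose parent did not cross its threshold."""
    return value if parent_crossed and crosses(name, value) else None

# Level 1: frontend_bound = 15% does not cross the 20% threshold ...
fe = report("frontend_bound", 0.15)
# ... so its level-2 child is suppressed even though 30% looks large.
child = report("fetch_latency", 0.30, parent_crossed=fe is not None)
print(fe, child)  # None None
```

This is why reporting a lower-level value whose parent is below threshold
would point at something that looks like a bottleneck but isn't.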

Utilization should be reported separately. TopDown cannot give
utilization because it doesn't know about idle time.

I can report - for empty fields if it helps you.  It's not clear
to me why empty fields in CSV are a problem.

I don't think colors are useful here; this would have the problem
described above.

> 
> > That's intended: the percentage is only printed when it crosses a
> > threshold. That's part of the top down specification.
> >
> I don't like that. I would rather see all the percentages.
> My remark applies to non topdown metrics as well, such as IPC.
> Clearly the IPC is awkward to use. You need to know you need to
> measure cycles, instructions to get ipc with --metric-only. Again,

Well it's the default (perf stat --metric-only), or with -d*, and it works fine
with --transaction too.

If you think there should be more predefined sets of metrics
that's fine for me too, but it would be a separate
patch.


-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Add top down metrics to perf stat v2

2015-12-18 Thread Stephane Eranian
Andi,

On Thu, Dec 17, 2015 at 5:55 PM, Andi Kleen  wrote:
> Thanks for testing.
>
>
> On Thu, Dec 17, 2015 at 03:31:30PM -0800, Stephane Eranian wrote:
>> I would not add a --topdown option but instead a --metric option with 
>> arguments
>> such that other metrics could be added later:
>>
>>$ perf stat --metrics topdown -I 1000 -a sleep 100
>>
>> If you do this, you do not need the --metric-only option
>
> The --metric-only option is useful with other metrics too. For example
> to get concise (and plottable) IPC or TSX abort statistics. See the
> examples in the original commit.
>
> However could make --topdown default to --metric-only and add
> an option to turn it off. Yes that's probably a better default
> for more people, although some people could be annoyed by the
> wide output.
>
>> The double --topdown is confusing.
>
> Ok. I was thinking of changing it and adding an extra argument for
> the "ignore threshold" behavior. That would also make it more extensible
> if we ever add Level 2.
>
I think you should drop that implicit threshold data suppression
feature altogether.
See what I write below.

>>
>> Why force --per-core when HT is on. I know you you need to aggregate
>> per core, but
>> you could still display globally. And then if user requests
>> --per-core, then display per core.
>
> Global TopDown doesn't make much sense. Suppose you have two programs
> running on different cores, one frontend bound and one backend bound.
> What would the union of the two mean? And you may well end up
> with sums of ratios which are >100%.
>
How could that be if you consider that the machine is N-wide and not just 4-wide
anymore?

How is what you are describing here different when HT is off?
If you force --per-core with HT-on, then you need to force it too when
HT is off so that  you get a similar per core breakdown. In the HT on
case, each Sx-Cy represents 2 threads, compared to 1 in the non-HT
case. Right now, you have non-HT reporting global, HT reporting per-core.
That does not make much sense to me.

> The only exception where it's useful is for the single threaded
> case (like the toplev --single-thread) option. However it is something
> ugly and difficult because the user would need to ensure that there is
> nothing active on the sibling thread. So I left it out.
>
>> Same if user specifies --per-socket. I know this requires some more
>> plumbing inside perf
>> but it would be clearer and simpler to interpret to users.
>
> Same problem as above.
>
>>
>> One bug I found when testing is that if you do with HT-on:
>>
>> $ perf stat -a --topdown -I 1000 --metric-only sleep 100
>> Then you get data for frontend and backend but nothing for retiring or
>> bad speculation.
>
> You see all the columns, but no data in some?
>
yes, and I don't like that. It is confusing, especially when you do not
know the threshold. Why are you suppressing the 'retiring' data when it
is at 25% (1/4 of the maximum possible) when I am running a simple
noploop? 25% is a sign of underutilization; that could be useful too.

Furthermore, it makes it harder to parse, including with the -x option,
because some fields may not be there. I would rather see all the values.
In non -x mode, you could use color to indicate high/low thresholds
(similar to perf report).


> That's intended: the percentage is only printed when it crosses a
> threshold. That's part of the top down specification.
>
I don't like that. I would rather see all the percentages.
My remark applies to non topdown metrics as well, such as IPC.
Clearly the IPC is awkward to use. You need to know you need to
measure cycles, instructions to get ipc with --metric-only. Again,
I would rather see:
$ perf stat --metrics ipc 
$ perf stat --metrics topdown

It is more uniform and users do not have to worry about what events
to use to compute a metric. As an example, here is what you
could do (showing side by side metrics ipc and uops/cycles):

# perf stat -a --metric ipc,upc -I 1000 sleep 100
#==
#   |   ipc|upc
#   |   IPC   %Peak|  UPC
#   |^  ^  |   ^
#==
 1.006038929   0.21   5.16%0.60
 2.012169514   0.21   5.31%0.60
 3.018314389   0.20   5.08%0.59
 4.024430081   0.21   5.26%0.60

>> I suspect it is because you expect --metric-only to be used only when
>> you have the
>> double --topdown. That's why I think this double topdown is confusing. If 
>> you do
>> as I suggest, it will be much simpler.
>
> It works fine with single topdown as far as I can tell.
>
>
> -Andi
> --
> a...@linux.intel.com -- Speaking for myself only.


Re: Add top down metrics to perf stat v2

2015-12-17 Thread Andi Kleen
Thanks for testing.


On Thu, Dec 17, 2015 at 03:31:30PM -0800, Stephane Eranian wrote:
> I would not add a --topdown option but instead a --metric option with 
> arguments
> such that other metrics could be added later:
> 
>$ perf stat --metrics topdown -I 1000 -a sleep 100
> 
> If you do this, you do not need the --metric-only option

The --metric-only option is useful with other metrics too. For example 
to get concise (and plottable) IPC or TSX abort statistics. See the
examples in the original commit.

However, we could make --topdown default to --metric-only and add
an option to turn it off. Yes, that's probably a better default
for most people, although some people could be annoyed by the
wide output.

> The double --topdown is confusing.

Ok. I was thinking of changing it and adding an extra argument for
the "ignore threshold" behavior. That would also make it more extensible
if we ever add Level 2.

> 
> Why force --per-core when HT is on. I know you you need to aggregate
> per core, but
> you could still display globally. And then if user requests
> --per-core, then display per core.

Global TopDown doesn't make much sense. Suppose you have two programs
running on different cores, one frontend bound and one backend bound.
What would the union of the two mean? And you may well end up
with sums of ratios which are >100%.

The only exception where it's useful is for the single threaded
case (like the toplev --single-thread) option. However it is something
ugly and difficult because the user would need to ensure that there is
nothing active on the sibling thread. So I left it out.

> Same if user specifies --per-socket. I know this requires some more
> plumbing inside perf
> but it would be clearer and simpler to interpret to users.

Same problem as above.

> 
> One bug I found when testing is that if you do with HT-on:
> 
> $ perf stat -a --topdown -I 1000 --metric-only sleep 100
> Then you get data for frontend and backend but nothing for retiring or
> bad speculation.

You see all the columns, but no data in some? 

That's intended: the percentage is only printed when it crosses a
threshold. That's part of the top down specification.
 
> I suspect it is because you expect --metric-only to be used only when
> you have the
> double --topdown. That's why I think this double topdown is confusing. If you 
> do
> as I suggest, it will be much simpler.

It works fine with single topdown as far as I can tell.


-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.


Re: Add top down metrics to perf stat v2

2015-12-17 Thread Stephane Eranian
On Thu, Dec 17, 2015 at 6:01 AM, Andi Kleen  wrote:
> On Thu, Dec 17, 2015 at 02:27:58AM -0800, Stephane Eranian wrote:
>> > S0-C1   2  4175583320.00  topdown-slots-retired
>> >  (100.00%)
>> > S0-C1   2 1743329246  topdown-recovery-bubbles  #
>> > 22.22% bad speculation (100.00%)
>> > S0-C1   2  6138901193.50  topdown-slots-issued  #
>> > 46.99% backend bound
>> >
>> I don't see how this output could be very useful. What matters is the
>> percentage in the comments
>> and not so much the raw counts because what is the unit? Same remark
>> holds for the percentage.
>> I think you need to explain or show that this is % of issue slots and
>> not cycles.
>
> The events already say slots, not cycles. Except for recovery-bubbles. Could 
> add
> -slots there too if you think it's helpful, although it would make the
> name very long and may not fit into the column anymore.
>
I would drop the default output, it is not useful.

I would not add a --topdown option but instead a --metric option with arguments
such that other metrics could be added later:

   $ perf stat --metrics topdown -I 1000 -a sleep 100

If you do this, you do not need the --metric-only option

The double --topdown is confusing.

Why force --per-core when HT is on? I know you need to aggregate
per core, but you could still display globally. And then if the user
requests --per-core, display per core.
Same if the user specifies --per-socket. I know this requires some more
plumbing inside perf, but it would be clearer and simpler for users to
interpret.

One bug I found when testing is that if you do with HT-on:

$ perf stat -a --topdown -I 1000 --metric-only sleep 100
Then you get data for frontend and backend but nothing for retiring or
bad speculation.

I suspect it is because you expect --metric-only to be used only when
you have the double --topdown. That's why I think this double topdown
is confusing. If you do as I suggest, it will be much simpler.

>>
>> >1.535832673 seconds time elapsed
>> >
>> > $ perf stat --topdown --topdown --metric-only -I 100 ./BC1s
>>
>> When I tried from your git tree the --metric-only option was not recognized.
>
> See below.
>>
>> >  0.100576098 frontend bound   retiring bad 
>> > speculation  backend bound
>> >  0.100576098 8.83%  48.93%  35.24% 
>> >   7.00%
>> >  0.200800845 8.84%  48.49%  35.53% 
>> >   7.13%
>> >  0.300905983 8.73%  48.64%  35.58% 
>> >   7.05%
>> > ...
>> >
>> This kind of output is more meaningful and clearer for end-users based
>> on my experience
>> and you'd like it per-core possibly.
>
> Yes --metric-only is a lot clearer.
>
> per-core is supported and automatically enabled with SMT on.
>
>> > Full tree available in
>> > git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc 
>> > perf/top-down-11
>>
>> That is in the top-down-2 branch instead, I think.
>
> Sorry, typo
>
> The correct branch is perf/top-down-10
>
> I also updated it now with the latest review feedback changes.
>
> top-down-2 is an really old branch that indeed didn't have metric-only.
>
> -Andi
>
> --
> a...@linux.intel.com -- Speaking for myself only.


Re: Add top down metrics to perf stat v2

2015-12-17 Thread Andi Kleen
On Thu, Dec 17, 2015 at 02:27:58AM -0800, Stephane Eranian wrote:
> > S0-C1   2  4175583320.00  topdown-slots-retired 
> > (100.00%)
> > S0-C1   2 1743329246  topdown-recovery-bubbles  #
> > 22.22% bad speculation (100.00%)
> > S0-C1   2  6138901193.50  topdown-slots-issued  #
> > 46.99% backend bound
> >
> I don't see how this output could be very useful. What matters is the
> percentage in the comments
> and not so much the raw counts because what is the unit? Same remark
> holds for the percentage.
> I think you need to explain or show that this is % of issue slots and
> not cycles.

The events already say slots, not cycles, except for recovery-bubbles. We could
add -slots there too if you think it's helpful, although it would make the
name very long and it may not fit into the column anymore.

> 
> >1.535832673 seconds time elapsed
> >
> > $ perf stat --topdown --topdown --metric-only -I 100 ./BC1s
> 
> When I tried from your git tree the --metric-only option was not recognized.

See below.
> 
> >  0.100576098 frontend bound   retiring bad 
> > speculation  backend bound
> >  0.100576098 8.83%  48.93%  35.24%  
> >  7.00%
> >  0.200800845 8.84%  48.49%  35.53%  
> >  7.13%
> >  0.300905983 8.73%  48.64%  35.58%  
> >  7.05%
> > ...
> >
> This kind of output is more meaningful and clearer for end-users based
> on my experience
> and you'd like it per-core possibly.

Yes --metric-only is a lot clearer.

per-core is supported and automatically enabled with SMT on.

> > Full tree available in
> > git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-11
> 
> That is in the top-down-2 branch instead, I think.

Sorry, typo

The correct branch is perf/top-down-10

I also updated it now with the latest review feedback changes.

top-down-2 is a really old branch that indeed didn't have metric-only.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Add top down metrics to perf stat v2

2015-12-17 Thread Stephane Eranian
Andi,

On Tue, Dec 15, 2015 at 4:54 PM, Andi Kleen  wrote:
> Note to reviewers: includes both tools and kernel patches.
> The kernel patches are at the end.
>
> This patchkit adds support for TopDown measurements to perf stat
> It applies on top of my earlier metrics patchkit, posted
> separately, and the --metric-only patchkit (also
> posted separately)
>
> TopDown is intended to replace the frontend cycles idle/
> backend cycles idle metrics in standard perf stat output.
> These metrics are not reliable in many workloads,
> due to out of order effects.
>
> This implements a new --topdown mode in perf stat
> (similar to --transaction) that measures the pipe line
> bottlenecks using standardized formulas. The measurement
> can be all done with 5 counters (one fixed counter)
>
> The result are four metrics:
> FrontendBound, BackendBound, BadSpeculation, Retiring
>
> that describe the CPU pipeline behavior on a high level.
>
> FrontendBound and BackendBound
> BadSpeculation is a higher
>
> The full top down methology has many hierarchical metrics.
> This implementation only supports level 1 which can be
> collected without multiplexing. A full implementation
> of top down on top of perf is available in pmu-tools toplev.
> (http://github.com/andikleen/pmu-tools)
>
> The current version works on Intel Core CPUs starting
> with Sandy Bridge, and Atom CPUs starting with Silvermont.
> In principle the generic metrics should be also implementable
> on other out of order CPUs.
>
> TopDown level 1 uses a set of abstracted metrics which
> are generic to out of order CPU cores (although some
> CPUs may not implement all of them):
>
> topdown-total-slots   Available slots in the pipeline
> topdown-slots-issued  Slots issued into the pipeline
> topdown-slots-retired Slots successfully retired
> topdown-fetch-bubbles Pipeline gaps in the frontend
> topdown-recovery-bubbles  Pipeline gaps during recovery
>   from misspeculation
>
> These metrics then allow to compute four useful metrics:
> FrontendBound, BackendBound, Retiring, BadSpeculation.
>
> The formulas to compute the metrics are generic, they
> only change based on the availability on the abstracted
> input values.
>
> The kernel declares the events supported by the current
> CPU and perf stat then computes the formulas based on the
> available metrics.
>
>
> Example output:
>
> $ ./perf stat --topdown -a ./BC1s
>
>  Performance counter stats for 'system wide':
>
> S0-C0   2   19650790  topdown-total-slots 
>   (100.00%)
> S0-C0   2 4445680.00  topdown-fetch-bubbles #
> 22.62% frontend bound  (100.00%)
> S0-C0   2 1743552.00  topdown-slots-retired   
>   (100.00%)
> S0-C0   2 622954  topdown-recovery-bubbles
>   (100.00%)
> S0-C0   2 2025498.00  topdown-slots-issued  #
> 63.90% backend bound
> S0-C1   216685216540  topdown-total-slots 
>   (100.00%)
> S0-C1   2   962557931.00  topdown-fetch-bubbles   
>   (100.00%)
> S0-C1   2  4175583320.00  topdown-slots-retired   
>   (100.00%)
> S0-C1   2 1743329246  topdown-recovery-bubbles  #
> 22.22% bad speculation (100.00%)
> S0-C1   2  6138901193.50  topdown-slots-issued  #
> 46.99% backend bound
>
I don't see how this output could be very useful. What matters is the
percentage in the comments and not so much the raw counts, because what
is their unit? The same remark holds for the percentages. I think you
need to explain or show that this is a % of issue slots and not cycles.

>1.535832673 seconds time elapsed
>
> $ perf stat --topdown --topdown --metric-only -I 100 ./BC1s

When I tried from your git tree the --metric-only option was not recognized.

>  0.100576098 frontend bound   retiring bad 
> speculation  backend bound
>  0.100576098 8.83%  48.93%  35.24%
>7.00%
>  0.200800845 8.84%  48.49%  35.53%
>7.13%
>  0.300905983 8.73%  48.64%  35.58%
>7.05%
> ...
>
This kind of output is more meaningful and clearer for end-users, based
on my experience, and you'd possibly want it per-core.

>
>
> On Hyper Threaded CPUs Top Down computes metrics per core instead of per 
> logical CPU.
> In this case perf stat automatically enables --per-core mode and also requires
> global mode (-a) and avoiding other filters (no cgroup mode)
>
> One side effect is that this may require root rights or a
> kernel.perf_event_paranoid=-1 


Re: Add top down metrics to perf stat v2

2015-12-17 Thread Stephane Eranian
On Thu, Dec 17, 2015 at 6:01 AM, Andi Kleen  wrote:
> On Thu, Dec 17, 2015 at 02:27:58AM -0800, Stephane Eranian wrote:
>> > S0-C1   2  4175583320.00  topdown-slots-retired
>> >  (100.00%)
>> > S0-C1   2 1743329246  topdown-recovery-bubbles  #
>> > 22.22% bad speculation (100.00%)
>> > S0-C1   2  6138901193.50  topdown-slots-issued  #
>> > 46.99% backend bound
>> >
>> I don't see how this output could be very useful. What matters is the
>> percentage in the comments
>> and not so much the raw counts because what is the unit? Same remark
>> holds for the percentage.
>> I think you need to explain or show that this is % of issue slots and
>> not cycles.
>
> The events already say slots, not cycles. Except for recovery-bubbles. Could 
> add
> -slots there too if you think it's helpful, although it would make the
> name very long and may not fit into the column anymore.
>
I would drop the default output; it is not useful.

I would not add a --topdown option, but instead a --metrics option
taking arguments, so that other metrics could be added later:

   $ perf stat --metrics topdown -I 1000 -a sleep 100

If you do this, you do not need the --metric-only option

The double --topdown is confusing.

Why force --per-core when HT is on? I know you need to aggregate per
core, but you could still display globally, and then display per core
only if the user requests --per-core. Same if the user specifies
--per-socket. I know this requires some more plumbing inside perf, but
it would be clearer and simpler for users to interpret.

One bug I found when testing with HT on:

$ perf stat -a --topdown -I 1000 --metric-only sleep 100

Then you get data for frontend and backend but nothing for retiring or
bad speculation.

I suspect it is because you expect --metric-only to be used only when
you have the double --topdown. That's why I think the double topdown is
confusing. If you do as I suggest, it will be much simpler.

>>
>> >1.535832673 seconds time elapsed
>> >
>> > $ perf stat --topdown --topdown --metric-only -I 100 ./BC1s
>>
>> When I tried from your git tree the --metric-only option was not recognized.
>
> See below.
>>
>> >  0.100576098 frontend bound   retiring bad 
>> > speculation  backend bound
>> >  0.100576098 8.83%  48.93%  35.24% 
>> >   7.00%
>> >  0.200800845 8.84%  48.49%  35.53% 
>> >   7.13%
>> >  0.300905983 8.73%  48.64%  35.58% 
>> >   7.05%
>> > ...
>> >
>> This kind of output is more meaningful and clearer for end-users based
>> on my experience
>> and you'd like it per-core possibly.
>
> Yes --metric-only is a lot clearer.
>
> per-core is supported and automatically enabled with SMT on.
>
>> > Full tree available in
>> > git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc 
>> > perf/top-down-11
>>
>> That is in the top-down-2 branch instead, I think.
>
> Sorry, typo
>
> The correct branch is perf/top-down-10
>
> I also updated it now with the latest review feedback changes.
>
> top-down-2 is an really old branch that indeed didn't have metric-only.
>
> -Andi
>
> --
> a...@linux.intel.com -- Speaking for myself only.


Re: Add top down metrics to perf stat v2

2015-12-17 Thread Andi Kleen
Thanks for testing.


On Thu, Dec 17, 2015 at 03:31:30PM -0800, Stephane Eranian wrote:
> I would not add a --topdown option but instead a --metric option with 
> arguments
> such that other metrics could be added later:
> 
>$ perf stat --metrics topdown -I 1000 -a sleep 100
> 
> If you do this, you do not need the --metric-only option

The --metric-only option is useful with other metrics too, for example
to get concise (and plottable) IPC or TSX abort statistics. See the
examples in the original commit.

However, --topdown could default to --metric-only, with an option to
turn it off. Yes, that's probably a better default for most people,
although some could be annoyed by the wide output.

> The double --topdown is confusing.

Ok. I was thinking of changing it and adding an extra argument for
the "ignore threshold" behavior. That would also make it more extensible
if we ever add Level 2.

> 
> Why force --per-core when HT is on. I know you you need to aggregate
> per core, but
> you could still display globally. And then if user requests
> --per-core, then display per core.

Global TopDown doesn't make much sense. Suppose you have two programs
running on different cores, one frontend bound and one backend bound.
What would the union of the two mean? And you may well end up
with sums of ratios which are >100%.
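The union problem can be illustrated with made-up numbers: averaging the per-core level-1 fractions of a frontend-bound core and a backend-bound core yields a flat profile in which neither bottleneck is visible anymore. This is a sketch with hypothetical values, not anything the patch computes:

```python
# Hypothetical per-core level-1 fractions; each core's row sums to 1.0.
core_a = {"frontend": 0.70, "bad_spec": 0.05, "retiring": 0.15, "backend": 0.10}
core_b = {"frontend": 0.10, "bad_spec": 0.05, "retiring": 0.15, "backend": 0.70}

# A naive "global" view averages the two cores (equal slot counts assumed):
global_view = {k: (core_a[k] + core_b[k]) / 2 for k in core_a}

# The dominant bottlenecks (70% frontend on A, 70% backend on B) blur
# into 40%/40%, which points at neither program's real problem.
print(global_view)
```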

The only exception where it's useful is the single-threaded case (like
the toplev --single-thread option). However it is somewhat ugly and
difficult, because the user would need to ensure that nothing is active
on the sibling thread. So I left it out.

> Same if user specifies --per-socket. I know this requires some more
> plumbing inside perf
> but it would be clearer and simpler to interpret to users.

Same problem as above.

> 
> One bug I found when testing is that if you do with HT-on:
> 
> $ perf stat -a --topdown -I 1000 --metric-only sleep 100
> Then you get data for frontend and backend but nothing for retiring or
> bad speculation.

You see all the columns, but no data in some? 

That's intended: the percentage is only printed when it crosses a
threshold. That's part of the top down specification.
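A rough sketch of that behavior; the 5% cut-off below is an illustrative value, not the threshold the patch actually uses:

```python
def fmt_metric(name, value, threshold=0.05):
    """Format one metric cell; print the percentage only when the value
    crosses the (illustrative) threshold, else leave the cell blank."""
    cell = "%6.2f%%" % (value * 100) if value >= threshold else " " * 7
    return "%-16s %s" % (name, cell)

print(fmt_metric("bad speculation", 0.3524))  # printed: above threshold
print(fmt_metric("retiring", 0.002))          # blank cell: below threshold
```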
 
> I suspect it is because you expect --metric-only to be used only when
> you have the
> double --topdown. That's why I think this double topdown is confusing. If you 
> do
> as I suggest, it will be much simpler.

It works fine with single topdown as far as I can tell.


-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.


Add top down metrics to perf stat v2

2015-12-15 Thread Andi Kleen
Note to reviewers: includes both tools and kernel patches.
The kernel patches are at the end.

This patchkit adds support for TopDown measurements to perf stat.
It applies on top of my earlier metrics patchkit, posted separately,
and the --metric-only patchkit (also posted separately).

TopDown is intended to replace the frontend cycles idle/
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads, 
due to out of order effects.

This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipeline
bottlenecks using standardized formulas. The measurement
can all be done with 5 counters (one fixed counter).

The result is four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring

that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound
BadSpeculation is a higher

The full top down methodology has many hierarchical metrics.
This implementation only supports level 1 which can be
collected without multiplexing. A full implementation
of top down on top of perf is available in pmu-tools toplev.
(http://github.com/andikleen/pmu-tools)

The current version works on Intel Core CPUs starting
with Sandy Bridge, and Atom CPUs starting with Silvermont.
In principle the generic metrics should also be implementable
on other out of order CPUs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):

topdown-total-slots   Available slots in the pipeline
topdown-slots-issued  Slots issued into the pipeline
topdown-slots-retired Slots successfully retired
topdown-fetch-bubbles Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
  from misspeculation

These metrics then allow computing four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.

The formulas to compute the metrics are generic; they
only change based on the availability of the abstracted
input values.
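For reference, the level-1 breakdown can be sketched as follows. This is my paraphrase of the published TopDown method, not code from the patch; plugging in the S0-C1 counts from the example output below reproduces the reported percentages:

```python
def topdown_level1(total_slots, slots_issued, slots_retired,
                   fetch_bubbles, recovery_bubbles):
    """Level-1 TopDown breakdown; each result is a fraction of total
    pipeline slots (a sketch of the method, not the patch's code)."""
    frontend_bound  = fetch_bubbles / total_slots
    bad_speculation = (slots_issued - slots_retired
                       + recovery_bubbles) / total_slots
    retiring        = slots_retired / total_slots
    backend_bound   = 1.0 - frontend_bound - bad_speculation - retiring
    return frontend_bound, bad_speculation, retiring, backend_bound

# S0-C1 counts from the example output in this mail:
fe, bs, ret, be = topdown_level1(16685216540, 6138901193.5,
                                 4175583320.0, 962557931.0, 1743329246)
# Matches the 22.22% bad speculation / 46.99% backend bound shown below.
print("bad speculation %.2f%%, backend bound %.2f%%" % (bs * 100, be * 100))
```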

The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available metrics.


Example output:

$ ./perf stat --topdown -a ./BC1s 

 Performance counter stats for 'system wide':

S0-C0   2   19650790  topdown-total-slots   
(100.00%)
S0-C0   2 4445680.00  topdown-fetch-bubbles #22.62% 
frontend bound  (100.00%)
S0-C0   2 1743552.00  topdown-slots-retired 
(100.00%)
S0-C0   2 622954  topdown-recovery-bubbles  
(100.00%)
S0-C0   2 2025498.00  topdown-slots-issued  #63.90% 
backend bound 
S0-C1   216685216540  topdown-total-slots   
(100.00%)
S0-C1   2   962557931.00  topdown-fetch-bubbles 
(100.00%)
S0-C1   2  4175583320.00  topdown-slots-retired 
(100.00%)
S0-C1   2 1743329246  topdown-recovery-bubbles  #22.22% 
bad speculation (100.00%)
S0-C1   2  6138901193.50  topdown-slots-issued  #46.99% 
backend bound 

   1.535832673 seconds time elapsed

$ perf stat --topdown --topdown --metric-only -I 100 ./BC1s
 0.100576098 frontend bound   retiring bad 
speculation  backend bound
 0.100576098 8.83%  48.93%  35.24%  
 7.00%   
 0.200800845 8.84%  48.49%  35.53%  
 7.13%   
 0.300905983 8.73%  48.64%  35.58%  
 7.05%
...


 
On Hyper-Threaded CPUs Top Down computes metrics per core instead of
per logical CPU. In this case perf stat automatically enables
--per-core mode, and also requires global mode (-a) and no other
filters (no cgroup mode).

One side effect is that this may require root rights or a
kernel.perf_event_paranoid=-1 setting.  

On systems without Hyper Threading it can be used per process.

Full tree available in 
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-11

No changelog against previous version. There were lots of changes.



Add top down metrics to perf stat

2015-08-07 Thread Andi Kleen
This patchkit adds support for TopDown to perf stat.
It applies on top of my earlier metrics patchkit, posted
separately.

TopDown is intended to replace the frontend cycles idle/
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads, 
due to out of order effects.

This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipeline
bottlenecks using standardized formulas. The measurement
can all be done with 5 counters (one fixed counter).

The result is four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring

that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound
BadSpeculation is a higher

The full top down methodology has many hierarchical metrics.
This implementation only supports level 1 which can be
collected without multiplexing. A full implementation
of top down on top of perf is available in pmu-tools toplev.
(http://github.com/andikleen/pmu-tools)

The current version works on Intel Core CPUs starting
with Sandy Bridge, and Atom CPUs starting with Silvermont.
In principle the generic metrics should also be implementable
on other out of order CPUs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):

topdown-total-slots   Available slots in the pipeline
topdown-slots-issued  Slots issued into the pipeline
topdown-slots-retired Slots successfully retired
topdown-fetch-bubbles Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
  from misspeculation

These metrics then allow computing four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.

The formulas to compute the metrics are generic; they
only change based on the availability of the abstracted
input values.

The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available metrics.
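Whether a given kernel exports these events can be checked from userspace. A defensive sketch: the path below is the standard perf sysfs event-alias location, but the exact file names present on any given machine are not guaranteed:

```python
import glob
import os

def kernel_topdown_events(base="/sys/bus/event_source/devices/cpu/events"):
    """List the topdown-* event aliases the running kernel declares.
    Returns an empty list on kernels/CPUs without the events."""
    return sorted(os.path.basename(p)
                  for p in glob.glob(os.path.join(base, "topdown-*")))

events = kernel_topdown_events()
if events:
    print("kernel declares:", ", ".join(events))
else:
    print("no topdown events declared by this kernel/CPU")
```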


Example output:

$ ./perf stat --topdown -a ./BC1s 

 Performance counter stats for 'system wide':

S0-C0   2   19650790  topdown-total-slots   
(100.00%)
S0-C0   2 4445680.00  topdown-fetch-bubbles #22.62% 
frontend bound  (100.00%)
S0-C0   2 1743552.00  topdown-slots-retired 
(100.00%)
S0-C0   2 622954  topdown-recovery-bubbles  
(100.00%)
S0-C0   2 2025498.00  topdown-slots-issued  #63.90% 
backend bound 
S0-C1   216685216540  topdown-total-slots   
(100.00%)
S0-C1   2   962557931.00  topdown-fetch-bubbles 
(100.00%)
S0-C1   2  4175583320.00  topdown-slots-retired 
(100.00%)
S0-C1   2 1743329246  topdown-recovery-bubbles  #22.22% 
bad speculation (100.00%)
S0-C1   2  6138901193.50  topdown-slots-issued  #46.99% 
backend bound 

   1.535832673 seconds time elapsed
 
On Hyper-Threaded CPUs Top Down computes metrics per core instead of
per logical CPU. In this case perf stat automatically enables
--per-core mode, and also requires global mode (-a) and no other
filters (no cgroup mode).

One side effect is that this may require root rights or a
kernel.perf_event_paranoid=-1 setting.  

On systems without Hyper Threading it can be used per process.

Full tree available in 
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-2


