Re: Add top down metrics to perf stat
> [jolsa@ibm-x3650m4-01 perf]$ sudo ./perf stat --topdown -I 1000 -a
> nmi_watchdog enabled with topdown. May give wrong results.
> Disable with echo 0 > /proc/sys/kernel/nmi_watchdog
>  1.002097350        retiring  bad speculation  frontend bound  backend bound
>  1.002097350 S0-C0  2  38.1%  0.0%  59.2%  2.7%
>  1.002097350 S0-C1  2  38.1%  0.1%  59.7%  2.1%

Ah, I see now: this is --metric-only not displaying the headers. --topdown enables --metric-only implicitly. I'll send a separate patch for that, because --metric-only was already merged separately. So it's not really a problem in this patchkit, but in a previous one.

-Andi

--
a...@linux.intel.com -- Speaking for myself only.
Re: Add top down metrics to perf stat
On Fri, May 13, 2016 at 06:44:49PM -0700, Andi Kleen wrote:
> Note to reviewers: includes both tools and kernel patches.
> The kernel patches are at the beginning.
>
> [v2: Address review feedback.
>      Metrics are now always printed, but colored when crossing threshold.
>      --topdown implies --metric-only.
>      Various smaller fixes, see individual patches]
> [v3: Add --single-thread option and support it with HT off.
>      Clean up old HT workaround.
>      Improve documentation.
>      Various smaller fixes, see individual patches.]
> [v4: Rebased on latest tree]
> [v5: Rebased on latest tree. Move debug messages to -vv]
> [v6: Rebased. Remove .aggr-per-core and --single-thread to not
>      break old perf binaries. Put SMT enumeration into
>      generic topology API.]
> [v7: Address review comments. Change patch title headers.]

Other than the missing headers and the unneeded initialization of have_frontend_stalled, I'm OK with the perf tools part.

thanks,
jirka
Re: Add top down metrics to perf stat
On Thu, May 19, 2016 at 04:51:30PM -0700, Andi Kleen wrote:
> On Mon, May 16, 2016 at 02:58:38PM +0200, Jiri Olsa wrote:
> > On Fri, May 13, 2016 at 06:44:49PM -0700, Andi Kleen wrote:
> >
> > SNIP
> >
> > > The formulas to compute the metrics are generic, they
> > > only change based on the availability on the abstracted
> > > input values.
> > >
> > > The kernel declares the events supported by the current
> > > CPU and perf stat then computes the formulas based on the
> > > available metrics.
> > >
> > > Example output:
> > >
> > > $ perf stat --topdown -I 1000 cmd
> > >  1.000735655        frontend bound  retiring  bad speculation  backend bound
> > >  1.000735655 S0-C0  2  47.84%  11.69%   8.37%  32.10%
> > >  1.000735655 S0-C1  2  45.53%  11.39%   8.52%  34.56%
>
> Hi Jiri,
>
> > you've lost first 3 header lines (time/core/cpus):
> >
> > [jolsa@ibm-x3650m4-01 perf]$ sudo ./perf stat --per-core -e cycles -I 1000 -a
> > #           time core         cpus             counts unit events
> >      1.000310344 S0-C0           2      3,764,470,414      cycles
> >      1.000310344 S0-C1           2      3,764,445,293      cycles
> >      1.000310344 S0-C2           2      3,764,428,422      cycles
>
> I can't reproduce that.

I can.. your latest code does not display the headers 'time', 'core', 'cpus', or the initial '#':

[jolsa@ibm-x3650m4-01 perf]$ sudo ./perf stat --topdown -I 1000 -a
nmi_watchdog enabled with topdown. May give wrong results.
Disable with echo 0 > /proc/sys/kernel/nmi_watchdog
 1.002097350        retiring  bad speculation  frontend bound  backend bound
 1.002097350 S0-C0  2  38.1%  0.0%  59.2%  2.7%
 1.002097350 S0-C1  2  38.1%  0.1%  59.7%  2.1%

thanks,
jirka
Add top down metrics to perf stat
Note to reviewers: includes both tools and kernel patches.
The kernel patches are at the beginning.

[v2: Address review feedback.
     Metrics are now always printed, but colored when crossing a threshold.
     --topdown implies --metric-only.
     Various smaller fixes, see individual patches]
[v3: Add --single-thread option and support it with HT off.
     Clean up the old HT workaround.
     Improve documentation.
     Various smaller fixes, see individual patches.]
[v4: Rebased on latest tree]
[v5: Rebased on latest tree. Move debug messages to -vv]
[v6: Rebased. Remove .aggr-per-core and --single-thread to not
     break old perf binaries. Put SMT enumeration into the
     generic topology API.]
[v7: Address review comments. Change patch title headers.]
[v8: Avoid -0.00 output]

This patchkit adds support for TopDown measurements to perf stat.
It applies on top of my earlier metrics patchkit, posted separately.

TopDown is intended to replace the frontend cycles idle /
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads,
due to out-of-order effects.

This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipeline bottlenecks
using standardized formulas. The measurement can all be done with
5 counters (one of them a fixed counter). The result is four metrics,
FrontendBound, BackendBound, BadSpeculation, and Retiring,
that describe the CPU pipeline behavior on a high level.

The full top down methodology has many hierarchical metrics.
This implementation only supports level 1, which can be collected
without multiplexing. A full implementation of top down on top of
perf is available in pmu-tools toplev
(http://github.com/andikleen/pmu-tools).

The current version works on Intel Core CPUs starting with
Sandy Bridge, and Atom CPUs starting with Silvermont. In principle
the generic metrics should also be implementable on other
out-of-order CPUs.
TopDown level 1 uses a set of abstracted metrics which are generic
to out-of-order CPU cores (although some CPUs may not implement
all of them):

topdown-total-slots       Available slots in the pipeline
topdown-slots-issued      Slots issued into the pipeline
topdown-slots-retired     Slots successfully retired
topdown-fetch-bubbles     Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
                          from misspeculation

These events then allow computing four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.

The formulas to compute the metrics are generic; they only change
based on the availability of the abstracted input values.

The kernel declares the events supported by the current CPU and
perf stat then computes the formulas based on the available metrics.

Example output:

$ perf stat --topdown -I 1000 cmd
 1.000735655        frontend bound  retiring  bad speculation  backend bound
 1.000735655 S0-C0  2  47.84%  11.69%   8.37%  32.10%
 1.000735655 S0-C1  2  45.53%  11.39%   8.52%  34.56%
 2.003978563 S0-C0  2  49.47%  12.22%   8.65%  29.66%
 2.003978563 S0-C1  2  47.21%  12.98%   8.77%  31.04%
 3.004901432 S0-C0  2  49.35%  12.26%   8.68%  29.70%
 3.004901432 S0-C1  2  47.23%  12.67%   8.76%  31.35%
 4.005766611 S0-C0  2  48.44%  12.14%   8.59%  30.82%
 4.005766611 S0-C1  2  46.07%  12.41%   8.67%  32.85%
 5.006580592 S0-C0  2  47.91%  12.08%   8.57%  31.44%
 5.006580592 S0-C1  2  45.57%  12.27%   8.63%  33.53%
 6.007545125 S0-C0  2  47.45%  12.02%   8.57%  31.96%
 6.007545125 S0-C1  2  45.13%  12.17%   8.57%  34.14%
 7.008539347 S0-C0  2  47.07%  12.03%   8.61%  32.29%
...

For Level 1, Top Down computes metrics per core instead of per
logical CPU on Core CPUs. (On Atom CPUs there is no Hyper-Threading
and TopDown is per thread.) In this case perf stat automatically
enables --per-core mode; it also requires global mode (-a) and no
other filters (no cgroup mode). One side effect is that this may
require root rights or a kernel.perf_event_paranoid=-1 setting.

Full tree available in
Re: Add top down metrics to perf stat
On Mon, May 16, 2016 at 02:58:38PM +0200, Jiri Olsa wrote:
> On Fri, May 13, 2016 at 06:44:49PM -0700, Andi Kleen wrote:
>
> SNIP
>
> > The formulas to compute the metrics are generic, they
> > only change based on the availability on the abstracted
> > input values.
> >
> > The kernel declares the events supported by the current
> > CPU and perf stat then computes the formulas based on the
> > available metrics.
> >
> > Example output:
> >
> > $ perf stat --topdown -I 1000 cmd
> >  1.000735655        frontend bound  retiring  bad speculation  backend bound
> >  1.000735655 S0-C0  2  47.84%  11.69%   8.37%  32.10%
> >  1.000735655 S0-C1  2  45.53%  11.39%   8.52%  34.56%

Hi Jiri,

> you've lost first 3 header lines (time/core/cpus):
>
> [jolsa@ibm-x3650m4-01 perf]$ sudo ./perf stat --per-core -e cycles -I 1000 -a
> #           time core         cpus             counts unit events
>      1.000310344 S0-C0           2      3,764,470,414      cycles
>      1.000310344 S0-C1           2      3,764,445,293      cycles
>      1.000310344 S0-C2           2      3,764,428,422      cycles

I can't reproduce that. The headers look the same as before.

> also I'm still getting -0% as I mentioned in my previous comment:

Keeping the NMI watchdog enabled can make the formulas inaccurate, because the grouping is disabled and parts of the formulas may be measured at different times, when the execution profile is different.

But even without that it can be caused by small inaccuracies, and then during rounding the value rounds to 0. I can remove the - for this case.

Otherwise the data looks reasonable.

-Andi

--
a...@linux.intel.com -- Speaking for myself only.
Re: Add top down metrics to perf stat
On Fri, May 13, 2016 at 06:44:49PM -0700, Andi Kleen wrote:

SNIP

> The formulas to compute the metrics are generic, they
> only change based on the availability on the abstracted
> input values.
>
> The kernel declares the events supported by the current
> CPU and perf stat then computes the formulas based on the
> available metrics.
>
> Example output:
>
> $ perf stat --topdown -I 1000 cmd
>  1.000735655        frontend bound  retiring  bad speculation  backend bound
>  1.000735655 S0-C0  2  47.84%  11.69%   8.37%  32.10%
>  1.000735655 S0-C1  2  45.53%  11.39%   8.52%  34.56%

you've lost first 3 header lines (time/core/cpus):

[jolsa@ibm-x3650m4-01 perf]$ sudo ./perf stat --per-core -e cycles -I 1000 -a
#           time core         cpus             counts unit events
     1.000310344 S0-C0           2      3,764,470,414      cycles
     1.000310344 S0-C1           2      3,764,445,293      cycles
     1.000310344 S0-C2           2      3,764,428,422      cycles

also I'm still getting -0% as I mentioned in my previous comment:

[jolsa@ibm-x3650m4-01 perf]$ sudo ./perf stat --topdown -I 1000 -a
nmi_watchdog enabled with topdown. May give wrong results.
Disable with echo 0 > /proc/sys/kernel/nmi_watchdog
 1.001615409        retiring  bad speculation  frontend bound  backend bound
 1.001615409 S0-C0  2  38.3%   0.0%  58.4%  3.3%
 1.001615409 S0-C1  2  38.1%  -0.0%  59.3%  2.6%
 1.001615409 S0-C2  2  38.1%   0.0%  58.9%  2.9%
 1.001615409 S0-C3  2  38.1%  -0.0%  58.9%  3.0%
 1.001615409 S0-C4  2  38.0%   0.0%  59.0%  2.9%
 1.001615409 S0-C5  2  38.1%  -0.0%  58.6%  3.3%
 1.001615409 S1-C0  2  49.7%   1.9%  44.7%  3.7%

thanks,
jirka
Add top down metrics to perf stat
Note to reviewers: includes both tools and kernel patches.
The kernel patches are at the beginning.

[v2: Address review feedback.
     Metrics are now always printed, but colored when crossing a threshold.
     --topdown implies --metric-only.
     Various smaller fixes, see individual patches]
[v3: Add --single-thread option and support it with HT off.
     Clean up the old HT workaround.
     Improve documentation.
     Various smaller fixes, see individual patches.]
[v4: Rebased on latest tree]
[v5: Rebased on latest tree. Move debug messages to -vv]
[v6: Rebased. Remove .aggr-per-core and --single-thread to not
     break old perf binaries. Put SMT enumeration into the
     generic topology API.]
[v7: Address review comments. Change patch title headers.]

This patchkit adds support for TopDown measurements to perf stat.
It applies on top of my earlier metrics patchkit, posted separately.

TopDown is intended to replace the frontend cycles idle /
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads,
due to out-of-order effects.

This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipeline bottlenecks
using standardized formulas. The measurement can all be done with
5 counters (one of them a fixed counter). The result is four metrics,
FrontendBound, BackendBound, BadSpeculation, and Retiring,
that describe the CPU pipeline behavior on a high level.

The full top down methodology has many hierarchical metrics.
This implementation only supports level 1, which can be collected
without multiplexing. A full implementation of top down on top of
perf is available in pmu-tools toplev
(http://github.com/andikleen/pmu-tools).

The current version works on Intel Core CPUs starting with
Sandy Bridge, and Atom CPUs starting with Silvermont. In principle
the generic metrics should also be implementable on other
out-of-order CPUs.
TopDown level 1 uses a set of abstracted metrics which are generic
to out-of-order CPU cores (although some CPUs may not implement
all of them):

topdown-total-slots       Available slots in the pipeline
topdown-slots-issued      Slots issued into the pipeline
topdown-slots-retired     Slots successfully retired
topdown-fetch-bubbles     Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
                          from misspeculation

These events then allow computing four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.

The formulas to compute the metrics are generic; they only change
based on the availability of the abstracted input values.

The kernel declares the events supported by the current CPU and
perf stat then computes the formulas based on the available metrics.

Example output:

$ perf stat --topdown -I 1000 cmd
 1.000735655        frontend bound  retiring  bad speculation  backend bound
 1.000735655 S0-C0  2  47.84%  11.69%   8.37%  32.10%
 1.000735655 S0-C1  2  45.53%  11.39%   8.52%  34.56%
 2.003978563 S0-C0  2  49.47%  12.22%   8.65%  29.66%
 2.003978563 S0-C1  2  47.21%  12.98%   8.77%  31.04%
 3.004901432 S0-C0  2  49.35%  12.26%   8.68%  29.70%
 3.004901432 S0-C1  2  47.23%  12.67%   8.76%  31.35%
 4.005766611 S0-C0  2  48.44%  12.14%   8.59%  30.82%
 4.005766611 S0-C1  2  46.07%  12.41%   8.67%  32.85%
 5.006580592 S0-C0  2  47.91%  12.08%   8.57%  31.44%
 5.006580592 S0-C1  2  45.57%  12.27%   8.63%  33.53%
 6.007545125 S0-C0  2  47.45%  12.02%   8.57%  31.96%
 6.007545125 S0-C1  2  45.13%  12.17%   8.57%  34.14%
 7.008539347 S0-C0  2  47.07%  12.03%   8.61%  32.29%
...

For Level 1, Top Down computes metrics per core instead of per
logical CPU on Core CPUs. (On Atom CPUs there is no Hyper-Threading
and TopDown is per thread.) In this case perf stat automatically
enables --per-core mode; it also requires global mode (-a) and no
other filters (no cgroup mode). One side effect is that this may
require root rights or a kernel.perf_event_paranoid=-1 setting.

Full tree available in
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc
Re: Add top down metrics to perf stat
On Thu, May 05, 2016 at 04:03:57PM -0700, Andi Kleen wrote:

SNIP

> The kernel declares the events supported by the current
> CPU and perf stat then computes the formulas based on the
> available metrics.
>
> Example output:
>
> $ perf stat --topdown -I 1000 cmd
>  1.000735655        frontend bound  retiring  bad speculation  backend bound
>  1.000735655 S0-C0  2  47.84%  11.69%   8.37%  32.10%
>  1.000735655 S0-C1  2  45.53%  11.39%   8.52%  34.56%
>  2.003978563 S0-C0  2  49.47%  12.22%   8.65%  29.66%
>  2.003978563 S0-C1  2  47.21%  12.98%   8.77%  31.04%
>  3.004901432 S0-C0  2  49.35%  12.26%   8.68%  29.70%
>  3.004901432 S0-C1  2  47.23%  12.67%   8.76%  31.35%
>  4.005766611 S0-C0  2  48.44%  12.14%   8.59%  30.82%
>  4.005766611 S0-C1  2  46.07%  12.41%   8.67%  32.85%
>  5.006580592 S0-C0  2  47.91%  12.08%   8.57%  31.44%
>  5.006580592 S0-C1  2  45.57%  12.27%   8.63%  33.53%
>  6.007545125 S0-C0  2  47.45%  12.02%   8.57%  31.96%
>  6.007545125 S0-C1  2  45.13%  12.17%   8.57%  34.14%
>  7.008539347 S0-C0  2  47.07%  12.03%   8.61%  32.29%

getting -0% for bad speculation.. I'm on your perf/top-down-20

thanks,
jirka

[root@ibm-x3650m4-01 perf]# ./perf stat --topdown -a -I 1000
nmi_watchdog enabled with topdown. May give wrong results.
Disable with echo 0 > /proc/sys/kernel/nmi_watchdog
 1.002322346        retiring  bad speculation  frontend bound  backend bound
 1.002322346 S0-C0  2  38.3%   0.0%  57.9%  3.8%
 1.002322346 S0-C1  2  38.3%   0.0%  59.1%  2.6%
 1.002322346 S0-C2  2  38.3%   0.0%  59.0%  2.6%
 1.002322346 S0-C3  2  38.3%   0.0%  58.7%  3.0%
 1.002322346 S0-C4  2  38.3%  -0.0%  58.6%  3.1%
 1.002322346 S0-C5  2  38.4%  -0.0%  58.3%  3.3%
 1.002322346 S1-C0  2  38.3%  -0.0%  58.7%  3.0%
 1.002322346 S1-C1  2  38.3%   0.0%  59.7%  2.0%
 1.002322346 S1-C2  2  38.3%  -0.0%  59.3%  2.5%
 1.002322346 S1-C3  2  38.3%  -0.0%  59.1%  2.5%
 1.002322346 S1-C4  2  38.3%   0.0%  59.1%  2.6%
 1.002322346 S1-C5  2  38.3%  -0.0%  59.1%  2.7%
 2.005429451 S0-C0  2  38.3%   0.0%  57.9%  3.8%
...
Add top down metrics to perf stat
Note to reviewers: includes both tools and kernel patches.
The kernel patches are at the beginning.

[v2: Address review feedback.
 Metrics are now always printed, but colored when crossing threshold.
 --topdown implies --metric-only.
 Various smaller fixes, see individual patches]
[v3: Add --single-thread option and support it with HT off.
 Clean up old HT workaround. Improve documentation.
 Various smaller fixes, see individual patches.]
[v4: Rebased on latest tree]
[v5: Rebased on latest tree. Move debug messages to -vv]
[v6: Rebased. Remove .aggr-per-core and --single-thread to not
 break old perf binaries. Put SMT enumeration into
 generic topology API.]

This patchkit adds support for TopDown measurements to perf stat.
It applies on top of my earlier metrics patchkit, posted separately.

TopDown is intended to replace the frontend cycles idle /
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads, due to
out of order effects.

This implements a new --topdown mode in perf stat (similar to
--transaction) that measures the pipeline bottlenecks using
standardized formulas. The measurement can all be done with
5 counters (one fixed counter).

The results are four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring

that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound
BadSpeculation is a higher

The full top down methodology has many hierarchical metrics.
This implementation only supports level 1, which can be collected
without multiplexing. A full implementation of top down on top
of perf is available in pmu-tools toplev
(http://github.com/andikleen/pmu-tools)

The current version works on Intel Core CPUs starting with
Sandy Bridge, and Atom CPUs starting with Silvermont. In principle
the generic metrics should also be implementable on other out of
order CPUs.
TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):

topdown-total-slots       Available slots in the pipeline
topdown-slots-issued      Slots issued into the pipeline
topdown-slots-retired     Slots successfully retired
topdown-fetch-bubbles     Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
                          from misspeculation

These metrics then allow computing four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.

The formulas to compute the metrics are generic; they only change
based on the availability of the abstracted input values.

The kernel declares the events supported by the current CPU and
perf stat then computes the formulas based on the available metrics.

Example output:

$ perf stat --topdown -I 1000 cmd
 1.000735655              frontend bound   retiring   bad speculation   backend bound
 1.000735655 S0-C0   2    47.84%           11.69%      8.37%            32.10%
 1.000735655 S0-C1   2    45.53%           11.39%      8.52%            34.56%
 2.003978563 S0-C0   2    49.47%           12.22%      8.65%            29.66%
 2.003978563 S0-C1   2    47.21%           12.98%      8.77%            31.04%
 3.004901432 S0-C0   2    49.35%           12.26%      8.68%            29.70%
 3.004901432 S0-C1   2    47.23%           12.67%      8.76%            31.35%
 4.005766611 S0-C0   2    48.44%           12.14%      8.59%            30.82%
 4.005766611 S0-C1   2    46.07%           12.41%      8.67%            32.85%
 5.006580592 S0-C0   2    47.91%           12.08%      8.57%            31.44%
 5.006580592 S0-C1   2    45.57%           12.27%      8.63%            33.53%
 6.007545125 S0-C0   2    47.45%           12.02%      8.57%            31.96%
 6.007545125 S0-C1   2    45.13%           12.17%      8.57%            34.14%
 7.008539347 S0-C0   2    47.07%           12.03%      8.61%            32.29%
...

For Level 1 Top Down computes metrics per core instead of per
logical CPU on Core CPUs. (On Atom CPUs there is no Hyper Threading
and TopDown is per thread.)

In this case perf stat automatically enables --per-core mode, and
also requires global mode (-a) and avoiding other filters (no
cgroup mode).

One side effect is that this may require root rights or a
kernel.perf_event_paranoid=-1 setting.
Full tree available in git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-20
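The four level-1 metrics can be derived from the five abstracted events listed above. This sketch follows the published TopDown level-1 formulas; the exact expressions perf stat uses may differ in detail (e.g. when some events are unavailable), and all counter values in the example are invented:

```python
def topdown_level1(total_slots, slots_issued, slots_retired,
                   fetch_bubbles, recovery_bubbles):
    """Compute the four TopDown level-1 metrics as fractions of all
    pipeline slots, per the published TopDown methodology."""
    frontend_bound = fetch_bubbles / total_slots
    bad_speculation = (slots_issued - slots_retired
                       + recovery_bubbles) / total_slots
    retiring = slots_retired / total_slots
    # Backend bound is whatever fraction the other three don't explain.
    backend_bound = 1.0 - frontend_bound - bad_speculation - retiring
    return frontend_bound, bad_speculation, retiring, backend_bound

# Made-up counters for a 4-wide core over 1000 cycles (4000 slots).
fe, bad, ret, be = topdown_level1(
    total_slots=4000, slots_issued=2100, slots_retired=2000,
    fetch_bubbles=1800, recovery_bubbles=60)
print(f"frontend {fe:.1%}  bad spec {bad:.1%}  "
      f"retiring {ret:.1%}  backend {be:.1%}")
# frontend 45.0%  bad spec 4.0%  retiring 50.0%  backend 1.0%
```

Because backend bound is defined as the remainder, the four fractions always sum to 1, which is why each one can be colored against a threshold independently.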
Add top down metrics to perf stat
Note to reviewers: includes both tools and kernel patches.
The kernel patches are at the beginning.

[v2: Address review feedback.
 Metrics are now always printed, but colored when crossing threshold.
 --topdown implies --metric-only.
 Various smaller fixes, see individual patches]
[v3: Add --single-thread option and support it with HT off.
 Clean up old HT workaround. Improve documentation.
 Various smaller fixes, see individual patches.]
[v4: Rebased on latest tree]
[v5: Rebased on latest tree. Move debug messages to -vv]

This patchkit adds support for TopDown measurements to perf stat.
It applies on top of my earlier metrics patchkit, posted separately.

TopDown is intended to replace the frontend cycles idle /
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads, due to
out of order effects.

This implements a new --topdown mode in perf stat (similar to
--transaction) that measures the pipeline bottlenecks using
standardized formulas. The measurement can all be done with
5 counters (one fixed counter).

The results are four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring

that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound
BadSpeculation is a higher

The full top down methodology has many hierarchical metrics.
This implementation only supports level 1, which can be collected
without multiplexing. A full implementation of top down on top
of perf is available in pmu-tools toplev
(http://github.com/andikleen/pmu-tools)

The current version works on Intel Core CPUs starting with
Sandy Bridge, and Atom CPUs starting with Silvermont. In principle
the generic metrics should also be implementable on other out of
order CPUs.
TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):

topdown-total-slots       Available slots in the pipeline
topdown-slots-issued      Slots issued into the pipeline
topdown-slots-retired     Slots successfully retired
topdown-fetch-bubbles     Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
                          from misspeculation

These metrics then allow computing four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.

The formulas to compute the metrics are generic; they only change
based on the availability of the abstracted input values.

The kernel declares the events supported by the current CPU and
perf stat then computes the formulas based on the available metrics.

Example output:

$ perf stat --topdown -I 1000 cmd
 1.000735655              frontend bound   retiring   bad speculation   backend bound
 1.000735655 S0-C0   2    47.84%           11.69%      8.37%            32.10%
 1.000735655 S0-C1   2    45.53%           11.39%      8.52%            34.56%
 2.003978563 S0-C0   2    49.47%           12.22%      8.65%            29.66%
 2.003978563 S0-C1   2    47.21%           12.98%      8.77%            31.04%
 3.004901432 S0-C0   2    49.35%           12.26%      8.68%            29.70%
 3.004901432 S0-C1   2    47.23%           12.67%      8.76%            31.35%
 4.005766611 S0-C0   2    48.44%           12.14%      8.59%            30.82%
 4.005766611 S0-C1   2    46.07%           12.41%      8.67%            32.85%
 5.006580592 S0-C0   2    47.91%           12.08%      8.57%            31.44%
 5.006580592 S0-C1   2    45.57%           12.27%      8.63%            33.53%
 6.007545125 S0-C0   2    47.45%           12.02%      8.57%            31.96%
 6.007545125 S0-C1   2    45.13%           12.17%      8.57%            34.14%
 7.008539347 S0-C0   2    47.07%           12.03%      8.61%            32.29%
...

For Level 1 Top Down computes metrics per core instead of per
logical CPU on Core CPUs. (On Atom CPUs there is no Hyper Threading
and TopDown is per thread.)

In this case perf stat automatically enables --per-core mode, and
also requires global mode (-a) and avoiding other filters (no
cgroup mode).

When Hyper Threading is off this can be overridden with the
--single-thread option. When Hyper Threading is on it is enforced;
the only way to not require -a here is to offline the logical CPUs
of the second threads.

One side effect is that this may require root rights or a
kernel.perf_event_paranoid=-1 setting.

Full tree available in
Add top down metrics to perf stat
Note to reviewers: includes both tools and kernel patches.
The kernel patches are at the beginning.

[v2: Address review feedback.
 Metrics are now always printed, but colored when crossing threshold.
 --topdown implies --metric-only.
 Various smaller fixes, see individual patches]
[v3: Add --single-thread option and support it with HT off.
 Clean up old HT workaround. Improve documentation.
 Various smaller fixes, see individual patches.]
[v4: Rebased on latest tree]

This patchkit adds support for TopDown measurements to perf stat.
It applies on top of my earlier metrics patchkit, posted separately.

TopDown is intended to replace the frontend cycles idle /
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads, due to
out of order effects.

This implements a new --topdown mode in perf stat (similar to
--transaction) that measures the pipeline bottlenecks using
standardized formulas. The measurement can all be done with
5 counters (one fixed counter).

The results are four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring

that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound
BadSpeculation is a higher

The full top down methodology has many hierarchical metrics.
This implementation only supports level 1, which can be collected
without multiplexing. A full implementation of top down on top
of perf is available in pmu-tools toplev
(http://github.com/andikleen/pmu-tools)

The current version works on Intel Core CPUs starting with
Sandy Bridge, and Atom CPUs starting with Silvermont. In principle
the generic metrics should also be implementable on other out of
order CPUs.
TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):

topdown-total-slots       Available slots in the pipeline
topdown-slots-issued      Slots issued into the pipeline
topdown-slots-retired     Slots successfully retired
topdown-fetch-bubbles     Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
                          from misspeculation

These metrics then allow computing four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.

The formulas to compute the metrics are generic; they only change
based on the availability of the abstracted input values.

The kernel declares the events supported by the current CPU and
perf stat then computes the formulas based on the available metrics.

Example output:

$ perf stat --topdown -I 1000 cmd
 1.000735655              frontend bound   retiring   bad speculation   backend bound
 1.000735655 S0-C0   2    47.84%           11.69%      8.37%            32.10%
 1.000735655 S0-C1   2    45.53%           11.39%      8.52%            34.56%
 2.003978563 S0-C0   2    49.47%           12.22%      8.65%            29.66%
 2.003978563 S0-C1   2    47.21%           12.98%      8.77%            31.04%
 3.004901432 S0-C0   2    49.35%           12.26%      8.68%            29.70%
 3.004901432 S0-C1   2    47.23%           12.67%      8.76%            31.35%
 4.005766611 S0-C0   2    48.44%           12.14%      8.59%            30.82%
 4.005766611 S0-C1   2    46.07%           12.41%      8.67%            32.85%
 5.006580592 S0-C0   2    47.91%           12.08%      8.57%            31.44%
 5.006580592 S0-C1   2    45.57%           12.27%      8.63%            33.53%
 6.007545125 S0-C0   2    47.45%           12.02%      8.57%            31.96%
 6.007545125 S0-C1   2    45.13%           12.17%      8.57%            34.14%
 7.008539347 S0-C0   2    47.07%           12.03%      8.61%            32.29%
...

For Level 1 Top Down computes metrics per core instead of per
logical CPU on Core CPUs. (On Atom CPUs there is no Hyper Threading
and TopDown is per thread.)

In this case perf stat automatically enables --per-core mode, and
also requires global mode (-a) and avoiding other filters (no
cgroup mode).

When Hyper Threading is off this can be overridden with the
--single-thread option. When Hyper Threading is on it is enforced;
the only way to not require -a here is to offline the logical CPUs
of the second threads.

One side effect is that this may require root rights or a
kernel.perf_event_paranoid=-1 setting.

Full tree available in
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-17
Re: Add top down metrics to perf stat
> can't see this one (-16):
>
> [jolsa@krava perf]$ git remote update ak
> Fetching ak
> [jolsa@krava perf]$ git branch -r | grep top-down
>   ak/perf/top-down-10
>   ak/perf/top-down-11
>   ak/perf/top-down-13
>   ak/perf/top-down-2

Please try again, I pushed it again.

-Andi
Re: Add top down metrics to perf stat
On Tue, Mar 22, 2016 at 04:08:46PM -0700, Andi Kleen wrote:

SNIP

> In this case perf stat automatically enables --per-core mode and also
> requires global mode (-a) and avoiding other filters (no cgroup mode)
>
> When Hyper Threading is off this can be overridden with the --single-thread
> option. When Hyper Threading is on it is enforced, the only way to
> not require -a here is to offline the logical CPUs of the second
> threads.
>
> One side effect is that this may require root rights or a
> kernel.perf_event_paranoid=-1 setting.
>
> Full tree available in
> git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-16

can't see this one (-16):

[jolsa@krava perf]$ git remote update ak
Fetching ak
[jolsa@krava perf]$ git branch -r | grep top-down
  ak/perf/top-down-10
  ak/perf/top-down-11
  ak/perf/top-down-13
  ak/perf/top-down-2

thanks,
jirka
Add top down metrics to perf stat
[v2: Address review feedback.
 Metrics are now always printed, but colored when crossing threshold.
 --topdown implies --metric-only.
 Various smaller fixes, see individual patches]
[v3: Add --single-thread option and support it with HT off.
 Clean up old HT workaround. Improve documentation.
 Various smaller fixes, see individual patches.]

Note to reviewers: includes both tools and kernel patches.
The kernel patches are at the beginning.

This patchkit adds support for TopDown measurements to perf stat.
It applies on top of my earlier metrics patchkit, posted separately.

TopDown is intended to replace the frontend cycles idle /
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads, due to
out of order effects.

This implements a new --topdown mode in perf stat (similar to
--transaction) that measures the pipeline bottlenecks using
standardized formulas. The measurement can all be done with
5 counters (one fixed counter).

The results are four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring

that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound
BadSpeculation is a higher

The full top down methodology has many hierarchical metrics.
This implementation only supports level 1, which can be collected
without multiplexing. A full implementation of top down on top
of perf is available in pmu-tools toplev
(http://github.com/andikleen/pmu-tools)

The current version works on Intel Core CPUs starting with
Sandy Bridge, and Atom CPUs starting with Silvermont. In principle
the generic metrics should also be implementable on other out of
order CPUs.
TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):

topdown-total-slots       Available slots in the pipeline
topdown-slots-issued      Slots issued into the pipeline
topdown-slots-retired     Slots successfully retired
topdown-fetch-bubbles     Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
                          from misspeculation

These metrics then allow computing four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.

The formulas to compute the metrics are generic; they only change
based on the availability of the abstracted input values.

The kernel declares the events supported by the current CPU and
perf stat then computes the formulas based on the available metrics.

Example output:

$ perf stat --topdown -I 1000 cmd
 1.000735655              frontend bound   retiring   bad speculation   backend bound
 1.000735655 S0-C0   2    47.84%           11.69%      8.37%            32.10%
 1.000735655 S0-C1   2    45.53%           11.39%      8.52%            34.56%
 2.003978563 S0-C0   2    49.47%           12.22%      8.65%            29.66%
 2.003978563 S0-C1   2    47.21%           12.98%      8.77%            31.04%
 3.004901432 S0-C0   2    49.35%           12.26%      8.68%            29.70%
 3.004901432 S0-C1   2    47.23%           12.67%      8.76%            31.35%
 4.005766611 S0-C0   2    48.44%           12.14%      8.59%            30.82%
 4.005766611 S0-C1   2    46.07%           12.41%      8.67%            32.85%
 5.006580592 S0-C0   2    47.91%           12.08%      8.57%            31.44%
 5.006580592 S0-C1   2    45.57%           12.27%      8.63%            33.53%
 6.007545125 S0-C0   2    47.45%           12.02%      8.57%            31.96%
 6.007545125 S0-C1   2    45.13%           12.17%      8.57%            34.14%
 7.008539347 S0-C0   2    47.07%           12.03%      8.61%            32.29%
...

For Level 1 Top Down computes metrics per core instead of per
logical CPU on Core CPUs. (On Atom CPUs there is no Hyper Threading
and TopDown is per thread.)

In this case perf stat automatically enables --per-core mode, and
also requires global mode (-a) and avoiding other filters (no
cgroup mode).

When Hyper Threading is off this can be overridden with the
--single-thread option. When Hyper Threading is on it is enforced;
the only way to not require -a here is to offline the logical CPUs
of the second threads.

One side effect is that this may require root rights or a
kernel.perf_event_paranoid=-1 setting.

Full tree available in
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-16
Add top down metrics to perf stat
[v2: Address review feedback. Metrics are now always printed, but colored when crossing threshold. --topdown implies --metric-only. Various smaller fixes, see individual patches] [v3: Add --single-thread option and support it with HT off. Clean up old HT workaround. Improve documentation. Various smaller fixes, see individual patches.] Note to reviewers: includes both tools and kernel patches. The kernel patches are at the beginning. This patchkit adds support for TopDown measurements to perf stat It applies on top of my earlier metrics patchkit, posted separately. TopDown is intended to replace the frontend cycles idle/ backend cycles idle metrics in standard perf stat output. These metrics are not reliable in many workloads, due to out of order effects. This implements a new --topdown mode in perf stat (similar to --transaction) that measures the pipe line bottlenecks using standardized formulas. The measurement can be all done with 5 counters (one fixed counter) The result are four metrics: FrontendBound, BackendBound, BadSpeculation, Retiring that describe the CPU pipeline behavior on a high level. FrontendBound and BackendBound BadSpeculation is a higher The full top down methology has many hierarchical metrics. This implementation only supports level 1 which can be collected without multiplexing. A full implementation of top down on top of perf is available in pmu-tools toplev. (http://github.com/andikleen/pmu-tools) The current version works on Intel Core CPUs starting with Sandy Bridge, and Atom CPUs starting with Silvermont. In principle the generic metrics should be also implementable on other out of order CPUs. 
TopDown level 1 uses a set of abstracted metrics which are generic to out of order CPU cores (although some CPUs may not implement all of them): topdown-total-slots Available slots in the pipeline topdown-slots-issued Slots issued into the pipeline topdown-slots-retired Slots successfully retired topdown-fetch-bubbles Pipeline gaps in the frontend topdown-recovery-bubbles Pipeline gaps during recovery from misspeculation These metrics then allow to compute four useful metrics: FrontendBound, BackendBound, Retiring, BadSpeculation. The formulas to compute the metrics are generic, they only change based on the availability on the abstracted input values. The kernel declares the events supported by the current CPU and perf stat then computes the formulas based on the available metrics. Example output: $ perf stat --topdown -I 1000 cmd 1.000735655 frontend bound retiring bad speculation backend bound 1.000735655 S0-C0 247.84% 11.69% 8.37% 32.10% 1.000735655 S0-C1 245.53% 11.39% 8.52% 34.56% 2.003978563 S0-C0 249.47% 12.22% 8.65% 29.66% 2.003978563 S0-C1 247.21% 12.98% 8.77% 31.04% 3.004901432 S0-C0 249.35% 12.26% 8.68% 29.70% 3.004901432 S0-C1 247.23% 12.67% 8.76% 31.35% 4.005766611 S0-C0 248.44% 12.14% 8.59% 30.82% 4.005766611 S0-C1 246.07% 12.41% 8.67% 32.85% 5.006580592 S0-C0 247.91% 12.08% 8.57% 31.44% 5.006580592 S0-C1 245.57% 12.27% 8.63% 33.53% 6.007545125 S0-C0 247.45% 12.02% 8.57% 31.96% 6.007545125 S0-C1 245.13% 12.17% 8.57% 34.14% 7.008539347 S0-C0 247.07% 12.03% 8.61% 32.29% ... For Level 1 Top Down computes metrics per core instead of per logical CPU on Core CPUs (On Atom CPUs there is no Hyper Threading and TopDown is per thread) In this case perf stat automatically enables --per-core mode and also requires global mode (-a) and avoiding other filters (no cgroup mode) When Hyper Threading is off this can be overriden with the --single-thread option. 
When Hyper Threading is on it is enforced; the only way to not require -a here is to offline the logical CPUs of the second threads. One side effect is that this may require root rights or a kernel.perf_event_paranoid=-1 setting.

Full tree available in git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-16
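Whether the per-core constraint applies hinges on detecting SMT. A minimal sketch of how that detection can be derived from the sysfs `thread_siblings_list` strings (the helper names here are mine; the patchkit itself puts SMT enumeration into a generic topology API):

```python
def parse_cpu_list(s):
    """Parse a sysfs CPU list such as '0-1' or '0,4' (the format of
    /sys/devices/system/cpu/cpu*/topology/thread_siblings_list)
    into a set of CPU numbers."""
    cpus = set()
    for part in s.strip().split(','):
        if '-' in part:
            lo, hi = part.split('-')
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def smt_on(sibling_lists):
    """SMT is on if any core exposes more than one thread sibling."""
    return any(len(parse_cpu_list(s)) > 1 for s in sibling_lists)
```

With HT on, every core's sibling list names two CPUs, so `smt_on` is true and per-core mode is forced; after offlining the second threads each list collapses to a single CPU.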
Re: Add top down metrics to perf stat v2
On Fri, Dec 18, 2015 at 01:31:18AM -0800, Stephane Eranian wrote:
> >> Why force --per-core when HT is on. I know you need to aggregate per core, but you could still display globally. And then if user requests --per-core, then display per core.
> >
> > Global TopDown doesn't make much sense. Suppose you have two programs running on different cores, one frontend bound and one backend bound. What would the union of the two mean? And you may well end up with sums of ratios which are >100%.
> >
> How could that be if you consider that the machine is N-wide and not just 4-wide anymore?
>
> How is what you are describing here different when HT is off?

I was talking about cores, not CPU threads. With global aggregation we would aggregate data from different cores, which is highly dubious for TopDown. CPU threads on a core are of course aggregated; that is why the patchkit forces --per-core with HT on.

> If you force --per-core with HT-on, then you need to force it too when HT is off so that you get a similar per core breakdown. In the HT on case, each Sx-Cy represents 2 threads, compared to 1 in the non HT case. Right now, you have non-HT reporting global, HT reporting per-core.
> That does not make much sense to me.

Ok. I guess I can force --per-core in this case too. This would simplify things because I can get rid of the agg-per-core attribute.

> >> but it would be clearer and simpler to interpret to users.
> >
> > Same problem as above.
> >
> >> One bug I found when testing is that if you do with HT-on:
> >>
> >> $ perf stat -a --topdown -I 1000 --metric-only sleep 100
> >> Then you get data for frontend and backend but nothing for retiring or bad speculation.
> >
> > You see all the columns, but no data in some?
> >
> yes, and I don't like that. It is confusing especially when you do not know the threshold.
> Why are you suppressing the 'retiring' data when it is at 25% (1/4 of the maximum possible) when I am running a simple noploop? 25% is a sign of underutilization, that could be useful too.

It's what the TopDown specification uses and the paper describes. The thresholds are needed when you have more than one level because the lower levels become meaningless if their parents didn't cross the threshold. Otherwise you may report something that looks like a bottleneck, but isn't. There is currently only level 1 in the patchkit, but if we ever add more levels we absolutely need thresholds. So it's better to have them from Day 1.

Utilization should be reported separately. TopDown cannot give utilization because it doesn't know about idle time.

I can report - for empty fields if it helps you. It's not clear to me why empty fields in CSV are a problem.

I don't think colors are useful here, this would have the problem described above.

> > That's intended: the percentage is only printed when it crosses a threshold. That's part of the top down specification.
> >
> I don't like that. I would rather see all the percentages.
> My remark applies to non topdown metrics as well, such as IPC.
> Clearly the IPC is awkward to use. You need to know you need to measure cycles, instructions to get ipc with --metric-only. Again,

Well it's the default (perf stat --metric-only), or with -d*, and it works fine with --transaction too.

If you think there should be more predefined sets of metrics that's fine for me too, but it would be a separate patch.

-Andi

-- a...@linux.intel.com -- Speaking for myself only.

-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
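The display behavior being debated here (print a percentage only above a threshold, optionally emitting "-" as a CSV placeholder) can be sketched in a few lines; the threshold values in the usage below are arbitrary illustrations, not the ones from the TopDown specification:

```python
def format_metric(value, threshold, suppress=True, placeholder="-"):
    """Render a TopDown fraction as a percentage string.

    Below the threshold either suppress the value entirely (the
    behavior in the patchkit) or print a placeholder, as Andi
    offers for CSV output. Threshold semantics are illustrative.
    """
    if value < threshold and suppress:
        return placeholder
    return "%.1f%%" % (100.0 * value)
```

For example, with an illustrative 20% threshold a 5% reading would be shown as "-" (or hidden), while 35% prints normally; with `suppress=False` every value is printed, which is the behavior Stephane argues for.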
Re: Add top down metrics to perf stat v2
Andi,

On Thu, Dec 17, 2015 at 5:55 PM, Andi Kleen wrote:
> Thanks for testing.
>
> On Thu, Dec 17, 2015 at 03:31:30PM -0800, Stephane Eranian wrote:
>> I would not add a --topdown option but instead a --metric option with arguments such that other metrics could be added later:
>>
>> $ perf stat --metrics topdown -I 1000 -a sleep 100
>>
>> If you do this, you do not need the --metric-only option
>
> The --metric-only option is useful with other metrics too. For example to get concise (and plottable) IPC or TSX abort statistics. See the examples in the original commit.
>
> However could make --topdown default to --metric-only and add an option to turn it off. Yes that's probably a better default for more people, although some people could be annoyed by the wide output.
>
>> The double --topdown is confusing.
>
> Ok. I was thinking of changing it and adding an extra argument for the "ignore threshold" behavior. That would also make it more extensible if we ever add Level 2.
>
I think you should drop that implicit threshold data suppression feature altogether. See what I write below.

>> Why force --per-core when HT is on. I know you need to aggregate per core, but you could still display globally. And then if user requests --per-core, then display per core.
>
> Global TopDown doesn't make much sense. Suppose you have two programs running on different cores, one frontend bound and one backend bound. What would the union of the two mean? And you may well end up with sums of ratios which are >100%.
>
How could that be if you consider that the machine is N-wide and not just 4-wide anymore?

How is what you are describing here different when HT is off?

If you force --per-core with HT-on, then you need to force it too when HT is off so that you get a similar per core breakdown. In the HT on case, each Sx-Cy represents 2 threads, compared to 1 in the non HT case. Right now, you have non-HT reporting global, HT reporting per-core.
That does not make much sense to me.

> The only exception where it's useful is for the single threaded case (like the toplev --single-thread option). However it is something ugly and difficult because the user would need to ensure that there is nothing active on the sibling thread. So I left it out.
>
>> Same if user specifies --per-socket. I know this requires some more plumbing inside perf but it would be clearer and simpler to interpret to users.
>
> Same problem as above.
>
>> One bug I found when testing is that if you do with HT-on:
>>
>> $ perf stat -a --topdown -I 1000 --metric-only sleep 100
>> Then you get data for frontend and backend but nothing for retiring or bad speculation.
>
> You see all the columns, but no data in some?
>
yes, and I don't like that. It is confusing especially when you do not know the threshold. Why are you suppressing the 'retiring' data when it is at 25% (1/4 of the maximum possible) when I am running a simple noploop? 25% is a sign of underutilization, that could be useful too.

Furthermore, it makes it harder to parse, including with the -x option because some fields may not be there. I would rather see all the values. In non -x mode, you could use color to indicate high/low thresholds (similar to perf report).

> That's intended: the percentage is only printed when it crosses a threshold. That's part of the top down specification.
>
I don't like that. I would rather see all the percentages. My remark applies to non topdown metrics as well, such as IPC. Clearly the IPC is awkward to use. You need to know you need to measure cycles, instructions to get ipc with --metric-only. Again, I would rather see:

$ perf stat --metrics ipc
$ perf stat --metrics topdown

It is more uniform and users do not have to worry about what events to use to compute a metric.
As an example, here is what you could do (showing side by side metrics ipc and uops/cycles):

# perf stat -a --metric ipc,upc -I 1000 sleep 100
#=====================================
#             |      ipc       |  upc
#             |  IPC    %Peak  |  UPC
#=====================================
 1.006038929    0.21    5.16%    0.60
 2.012169514    0.21    5.31%    0.60
 3.018314389    0.20    5.08%    0.59
 4.024430081    0.21    5.26%    0.60

>> I suspect it is because you expect --metric-only to be used only when you have the double --topdown. That's why I think this double topdown is confusing. If you do as I suggest, it will be much simpler.
>
> It works fine with single topdown as far as I can tell.
>
> -Andi
> --
> a...@linux.intel.com -- Speaking for myself only.
Re: Add top down metrics to perf stat v2
Thanks for testing.

On Thu, Dec 17, 2015 at 03:31:30PM -0800, Stephane Eranian wrote:
> I would not add a --topdown option but instead a --metric option with arguments such that other metrics could be added later:
>
> $ perf stat --metrics topdown -I 1000 -a sleep 100
>
> If you do this, you do not need the --metric-only option

The --metric-only option is useful with other metrics too. For example to get concise (and plottable) IPC or TSX abort statistics. See the examples in the original commit.

However, I could make --topdown default to --metric-only and add an option to turn it off. Yes, that's probably a better default for more people, although some people could be annoyed by the wide output.

> The double --topdown is confusing.

Ok. I was thinking of changing it and adding an extra argument for the "ignore threshold" behavior. That would also make it more extensible if we ever add Level 2.

> Why force --per-core when HT is on. I know you need to aggregate per core, but you could still display globally. And then if user requests --per-core, then display per core.

Global TopDown doesn't make much sense. Suppose you have two programs running on different cores, one frontend bound and one backend bound. What would the union of the two mean? And you may well end up with sums of ratios which are >100%.

The only exception where it's useful is for the single threaded case (like the toplev --single-thread option). However it is something ugly and difficult because the user would need to ensure that there is nothing active on the sibling thread. So I left it out.

> Same if user specifies --per-socket. I know this requires some more plumbing inside perf but it would be clearer and simpler to interpret to users.

Same problem as above.

> One bug I found when testing is that if you do with HT-on:
>
> $ perf stat -a --topdown -I 1000 --metric-only sleep 100
> Then you get data for frontend and backend but nothing for retiring or bad speculation.
You see all the columns, but no data in some?

That's intended: the percentage is only printed when it crosses a threshold. That's part of the top down specification.

> I suspect it is because you expect --metric-only to be used only when you have the double --topdown. That's why I think this double topdown is confusing. If you do as I suggest, it will be much simpler.

It works fine with single topdown as far as I can tell.

-Andi

-- a...@linux.intel.com -- Speaking for myself only.
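The forced per-core aggregation that keeps coming up in this thread amounts to summing the counts of SMT sibling threads before any ratio is computed. A hedged sketch of the idea (the data layout and names are hypothetical, not perf's internal representation):

```python
from collections import defaultdict

def aggregate_per_core(samples):
    """Sum per-logical-CPU event counts into per-core totals.

    `samples` maps (socket, core, cpu) -> count. SMT siblings share
    the same (socket, core) key and are therefore summed together,
    which is what --per-core mode reports as Sx-Cy lines.
    """
    per_core = defaultdict(int)
    for (socket, core, _cpu), count in samples.items():
        per_core[(socket, core)] += count
    return dict(per_core)
```

Aggregating any further (across cores, as a global number) is what Andi objects to: summing slot fractions from cores with different bottlenecks produces a blend that describes neither core.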
Re: Add top down metrics to perf stat v2
On Thu, Dec 17, 2015 at 6:01 AM, Andi Kleen wrote: > On Thu, Dec 17, 2015 at 02:27:58AM -0800, Stephane Eranian wrote: >> > S0-C1 2 4175583320.00 topdown-slots-retired >> > (100.00%) >> > S0-C1 2 1743329246 topdown-recovery-bubbles # >> > 22.22% bad speculation (100.00%) >> > S0-C1 2 6138901193.50 topdown-slots-issued # >> > 46.99% backend bound >> > >> I don't see how this output could be very useful. What matters is the >> percentage in the comments >> and not so much the raw counts because what is the unit? Same remark >> holds for the percentage. >> I think you need to explain or show that this is % of issue slots and >> not cycles. > > The events already say slots, not cycles. Except for recovery-bubbles. Could > add > -slots there too if you think it's helpful, although it would make the > name very long and may not fit into the column anymore. > I would drop the default output, it is not useful. I would not add a --topdown option but instead a --metric option with arguments such that other metrics could be added later: $ perf stat --metrics topdown -I 1000 -a sleep 100 If you do this, you do not need the --metric-only option The double --topdown is confusing. Why force --per-core when HT is on. I know you you need to aggregate per core, but you could still display globally. And then if user requests --per-core, then display per core. Same if user specifies --per-socket. I know this requires some more plumbing inside perf but it would be clearer and simpler to interpret to users. One bug I found when testing is that if you do with HT-on: $ perf stat -a --topdown -I 1000 --metric-only sleep 100 Then you get data for frontend and backend but nothing for retiring or bad speculation. I suspect it is because you expect --metric-only to be used only when you have the double --topdown. That's why I think this double topdown is confusing. If you do as I suggest, it will be much simpler. 
>> >> >1.535832673 seconds time elapsed >> > >> > $ perf stat --topdown --topdown --metric-only -I 100 ./BC1s >> >> When I tried from your git tree the --metric-only option was not recognized. > > See below. >> >> > 0.100576098 frontend bound retiring bad > > speculation backend bound >> > 0.100576098 8.83% 48.93% 35.24% >> > 7.00% >> > 0.200800845 8.84% 48.49% 35.53% >> > 7.13% >> > 0.300905983 8.73% 48.64% 35.58% >> > 7.05% >> > ... >> > >> This kind of output is more meaningful and clearer for end-users based >> on my experience >> and you'd like it per-core possibly. > > Yes --metric-only is a lot clearer. > > per-core is supported and automatically enabled with SMT on. > >> > Full tree available in >> > git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc >> > perf/top-down-11 >> >> That is in the top-down-2 branch instead, I think. > > Sorry, typo > > The correct branch is perf/top-down-10 > > I also updated it now with the latest review feedback changes. > > top-down-2 is an really old branch that indeed didn't have metric-only. > > -Andi > > -- > a...@linux.intel.com -- Speaking for myself only.
Re: Add top down metrics to perf stat v2
On Thu, Dec 17, 2015 at 02:27:58AM -0800, Stephane Eranian wrote:
> > S0-C1   2  4175583320.00  topdown-slots-retired     (100.00%)
> > S0-C1   2     1743329246  topdown-recovery-bubbles  # 22.22% bad speculation (100.00%)
> > S0-C1   2  6138901193.50  topdown-slots-issued      # 46.99% backend bound
> >
> I don't see how this output could be very useful. What matters is the percentage in the comments and not so much the raw counts because what is the unit? Same remark holds for the percentage. I think you need to explain or show that this is % of issue slots and not cycles.

The events already say slots, not cycles. Except for recovery-bubbles. Could add -slots there too if you think it's helpful, although it would make the name very long and may not fit into the column anymore.

> > 1.535832673 seconds time elapsed
> >
> > $ perf stat --topdown --topdown --metric-only -I 100 ./BC1s
>
> When I tried from your git tree the --metric-only option was not recognized.

See below.

> > 0.100576098   frontend bound  retiring  bad speculation  backend bound
> > 0.100576098   8.83%           48.93%    35.24%           7.00%
> > 0.200800845   8.84%           48.49%    35.53%           7.13%
> > 0.300905983   8.73%           48.64%    35.58%           7.05%
> > ...
>
> This kind of output is more meaningful and clearer for end-users based on my experience and you'd like it per-core possibly.

Yes, --metric-only is a lot clearer.

per-core is supported and automatically enabled with SMT on.

> > Full tree available in
> > git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-11
>
> That is in the top-down-2 branch instead, I think.

Sorry, typo.

The correct branch is perf/top-down-10

I also updated it now with the latest review feedback changes.

top-down-2 is a really old branch that indeed didn't have metric-only.

-Andi

-- a...@linux.intel.com -- Speaking for myself only.
Re: Add top down metrics to perf stat v2
Andi, On Tue, Dec 15, 2015 at 4:54 PM, Andi Kleen wrote: > Note to reviewers: includes both tools and kernel patches. > The kernel patches are at the end. > > This patchkit adds support for TopDown measurements to perf stat > It applies on top of my earlier metrics patchkit, posted > separately, and the --metric-only patchkit (also > posted separately) > > TopDown is intended to replace the frontend cycles idle/ > backend cycles idle metrics in standard perf stat output. > These metrics are not reliable in many workloads, > due to out of order effects. > > This implements a new --topdown mode in perf stat > (similar to --transaction) that measures the pipe line > bottlenecks using standardized formulas. The measurement > can be all done with 5 counters (one fixed counter) > > The result are four metrics: > FrontendBound, BackendBound, BadSpeculation, Retiring > > that describe the CPU pipeline behavior on a high level. > > FrontendBound and BackendBound > BadSpeculation is a higher > > The full top down methology has many hierarchical metrics. > This implementation only supports level 1 which can be > collected without multiplexing. A full implementation > of top down on top of perf is available in pmu-tools toplev. > (http://github.com/andikleen/pmu-tools) > > The current version works on Intel Core CPUs starting > with Sandy Bridge, and Atom CPUs starting with Silvermont. > In principle the generic metrics should be also implementable > on other out of order CPUs. 
> TopDown level 1 uses a set of abstracted metrics which are generic to out of order CPU cores (although some CPUs may not implement all of them):
>
> topdown-total-slots       Available slots in the pipeline
> topdown-slots-issued      Slots issued into the pipeline
> topdown-slots-retired     Slots successfully retired
> topdown-fetch-bubbles     Pipeline gaps in the frontend
> topdown-recovery-bubbles  Pipeline gaps during recovery from misspeculation
>
> These metrics then allow to compute four useful metrics: FrontendBound, BackendBound, Retiring, BadSpeculation.
>
> The formulas to compute the metrics are generic, they only change based on the availability on the abstracted input values.
>
> The kernel declares the events supported by the current CPU and perf stat then computes the formulas based on the available metrics.
>
> Example output:
>
> $ ./perf stat --topdown -a ./BC1s
>
> Performance counter stats for 'system wide':
>
> S0-C0   2        19650790  topdown-total-slots       (100.00%)
> S0-C0   2      4445680.00  topdown-fetch-bubbles     # 22.62% frontend bound (100.00%)
> S0-C0   2      1743552.00  topdown-slots-retired     (100.00%)
> S0-C0   2          622954  topdown-recovery-bubbles  (100.00%)
> S0-C0   2      2025498.00  topdown-slots-issued      # 63.90% backend bound
> S0-C1   2     16685216540  topdown-total-slots       (100.00%)
> S0-C1   2    962557931.00  topdown-fetch-bubbles     (100.00%)
> S0-C1   2   4175583320.00  topdown-slots-retired     (100.00%)
> S0-C1   2      1743329246  topdown-recovery-bubbles  # 22.22% bad speculation (100.00%)
> S0-C1   2   6138901193.50  topdown-slots-issued      # 46.99% backend bound

I don't see how this output could be very useful. What matters is the percentage in the comments and not so much the raw counts because what is the unit? Same remark holds for the percentage. I think you need to explain or show that this is % of issue slots and not cycles.
>1.535832673 seconds time elapsed > > $ perf stat --topdown --topdown --metric-only -I 100 ./BC1s When I tried from your git tree the --metric-only option was not recognized. > 0.100576098 frontend bound retiring bad > speculation backend bound > 0.100576098 8.83% 48.93% 35.24% >7.00% > 0.200800845 8.84% 48.49% 35.53% >7.13% > 0.300905983 8.73% 48.64% 35.58% >7.05% > ... > This kind of output is more meaningful and clearer for end-users based on my experience and you'd like it per-core possibly. > > > On Hyper Threaded CPUs Top Down computes metrics per core instead of per > logical CPU. > In this case perf stat automatically enables --per-core mode and also requires > global mode (-a) and avoiding other filters (no cgroup mode) > > One side effect is that this may require root rights or a > kernel.perf_event_paranoid=-1
Re: Add top down metrics to perf stat v2
Andi,

On Tue, Dec 15, 2015 at 4:54 PM, Andi Kleen wrote:
> Note to reviewers: includes both tools and kernel patches.
> The kernel patches are at the end.
>
> This patchkit adds support for TopDown measurements to perf stat.
> It applies on top of my earlier metrics patchkit, posted
> separately, and the --metric-only patchkit (also posted
> separately).
>
> TopDown is intended to replace the frontend cycles idle /
> backend cycles idle metrics in standard perf stat output.
> These metrics are not reliable in many workloads,
> due to out of order effects.
>
> This implements a new --topdown mode in perf stat
> (similar to --transaction) that measures the pipeline
> bottlenecks using standardized formulas. The measurement
> can all be done with 5 counters (one fixed counter).
>
> The result is four metrics:
> FrontendBound, BackendBound, BadSpeculation, Retiring
> that describe the CPU pipeline behavior on a high level.
>
> FrontendBound and BackendBound
> BadSpeculation is a higher
>
> The full top down methodology has many hierarchical metrics.
> This implementation only supports level 1, which can be
> collected without multiplexing. A full implementation
> of top down on top of perf is available in pmu-tools toplev
> (http://github.com/andikleen/pmu-tools).
>
> The current version works on Intel Core CPUs starting
> with Sandy Bridge, and Atom CPUs starting with Silvermont.
> In principle the generic metrics should also be implementable
> on other out of order CPUs.
>
> TopDown level 1 uses a set of abstracted metrics which
> are generic to out of order CPU cores (although some
> CPUs may not implement all of them):
>
> topdown-total-slots      Available slots in the pipeline
> topdown-slots-issued     Slots issued into the pipeline
> topdown-slots-retired    Slots successfully retired
> topdown-fetch-bubbles    Pipeline gaps in the frontend
> topdown-recovery-bubbles Pipeline gaps during recovery
>                          from misspeculation
>
> These metrics then allow computing four useful metrics:
> FrontendBound, BackendBound, Retiring, BadSpeculation.
>
> The formulas to compute the metrics are generic; they
> only change based on the availability of the abstracted
> input values.
>
> The kernel declares the events supported by the current
> CPU and perf stat then computes the formulas based on the
> available metrics.
>
> Example output:
>
> $ ./perf stat --topdown -a ./BC1s
>
>  Performance counter stats for 'system wide':
>
> S0-C0 2    19650790       topdown-total-slots                           (100.00%)
> S0-C0 2     4445680.00    topdown-fetch-bubbles    # 22.62% frontend bound   (100.00%)
> S0-C0 2     1743552.00    topdown-slots-retired                         (100.00%)
> S0-C0 2      622954       topdown-recovery-bubbles                      (100.00%)
> S0-C0 2     2025498.00    topdown-slots-issued     # 63.90% backend bound
> S0-C1 2 16685216540       topdown-total-slots                           (100.00%)
> S0-C1 2   962557931.00    topdown-fetch-bubbles                         (100.00%)
> S0-C1 2  4175583320.00    topdown-slots-retired                         (100.00%)
> S0-C1 2  1743329246       topdown-recovery-bubbles # 22.22% bad speculation  (100.00%)
> S0-C1 2  6138901193.50    topdown-slots-issued     # 46.99% backend bound

I don't see how this output could be very useful. What matters is the
percentage in the comments and not so much the raw counts, because what
is the unit? The same remark holds for the percentage. I think you need
to explain or show that this is % of issue slots and not cycles.

>  1.535832673 seconds time elapsed
>
> $ perf stat --topdown --topdown --metric-only -I 100 ./BC1s

When I tried from your git tree the --metric-only option was not recognized.

>  0.100576098   frontend bound   retiring   bad speculation   backend bound
>  0.100576098        8.83%         48.93%            35.24%           7.00%
>  0.200800845        8.84%         48.49%            35.53%           7.13%
>  0.300905983        8.73%         48.64%            35.58%           7.05%
> ...

This kind of output is more meaningful and clearer for end-users based
on my experience, and you'd possibly want it per-core.

> On Hyper Threaded CPUs Top Down computes metrics per core instead of
> per logical CPU.
> In this case perf stat automatically enables --per-core mode and also
> requires global mode (-a) and avoiding other filters (no cgroup mode).
>
> One side effect is that this may require root rights or a
> kernel.perf_event_paranoid=-1 setting.
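The four level-1 metrics quoted above are derived from the five slot counters by the standard published top-down formulas (the exact expressions perf uses may differ in detail); a minimal Python sketch, fed with the S0-C0 counts from the example output:

```python
def topdown_level1(total_slots, slots_issued, slots_retired,
                   fetch_bubbles, recovery_bubbles):
    """Compute the four level-1 top-down fractions from the five
    abstracted counters (all measured in units of pipeline slots)."""
    frontend_bound = fetch_bubbles / total_slots
    bad_speculation = (slots_issued - slots_retired
                       + recovery_bubbles) / total_slots
    retiring = slots_retired / total_slots
    # Any slot not accounted for above was lost to backend stalls.
    backend_bound = 1.0 - frontend_bound - bad_speculation - retiring
    return frontend_bound, bad_speculation, retiring, backend_bound

# Counts taken from the S0-C0 rows of the example output above.
fe, bs, ret, be = topdown_level1(19650790, 2025498, 1743552,
                                 4445680, 622954)
print(f"frontend {fe:.2%}  bad spec {bs:.2%}  "
      f"retiring {ret:.2%}  backend {be:.2%}")
```

Run against the sample counts this reproduces the 22.62% frontend bound and 63.90% backend bound annotations shown in the output, which also illustrates Stephane's point: the fractions are of issue slots, not cycles.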
Re: Add top down metrics to perf stat v2
On Thu, Dec 17, 2015 at 02:27:58AM -0800, Stephane Eranian wrote:
> > S0-C1 2  4175583320.00   topdown-slots-retired                        (100.00%)
> > S0-C1 2  1743329246      topdown-recovery-bubbles # 22.22% bad speculation (100.00%)
> > S0-C1 2  6138901193.50   topdown-slots-issued     # 46.99% backend bound
>
> I don't see how this output could be very useful. What matters is the
> percentage in the comments and not so much the raw counts because what
> is the unit? Same remark holds for the percentage.
> I think you need to explain or show that this is % of issue slots and
> not cycles.

The events already say slots, not cycles. Except for recovery-bubbles.
Could add -slots there too if you think it's helpful, although it would
make the name very long and may not fit into the column anymore.

> > 1.535832673 seconds time elapsed
> >
> > $ perf stat --topdown --topdown --metric-only -I 100 ./BC1s
>
> When I tried from your git tree the --metric-only option was not
> recognized.

See below.

> > 0.100576098   frontend bound   retiring   bad speculation   backend bound
> > 0.100576098        8.83%         48.93%            35.24%           7.00%
> > 0.200800845        8.84%         48.49%            35.53%           7.13%
> > 0.300905983        8.73%         48.64%            35.58%           7.05%
> > ...
>
> This kind of output is more meaningful and clearer for end-users based
> on my experience, and you'd possibly want it per-core.

Yes, --metric-only is a lot clearer.

per-core is supported and automatically enabled with SMT on.

> > Full tree available in
> > git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-11
>
> That is in the top-down-2 branch instead, I think.

Sorry, typo. The correct branch is perf/top-down-10

I also updated it now with the latest review feedback changes.

top-down-2 is a really old branch that indeed didn't have metric-only.

-Andi

--
a...@linux.intel.com -- Speaking for myself only.
-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Add top down metrics to perf stat v2
On Thu, Dec 17, 2015 at 6:01 AM, Andi Kleen wrote:
> On Thu, Dec 17, 2015 at 02:27:58AM -0800, Stephane Eranian wrote:
>> I don't see how this output could be very useful. What matters is the
>> percentage in the comments and not so much the raw counts because what
>> is the unit? Same remark holds for the percentage.
>> I think you need to explain or show that this is % of issue slots and
>> not cycles.
>
> The events already say slots, not cycles. Except for recovery-bubbles.
> Could add -slots there too if you think it's helpful, although it would
> make the name very long and may not fit into the column anymore.

I would drop the default output, it is not useful.

I would not add a --topdown option but instead a --metric option with
arguments, such that other metrics could be added later:

   $ perf stat --metrics topdown -I 1000 -a sleep 100

If you do this, you do not need the --metric-only option.

The double --topdown is confusing.

Why force --per-core when HT is on? I know you need to aggregate per
core, but you could still display globally. And then if the user
requests --per-core, display per core. Same if the user specifies
--per-socket. I know this requires some more plumbing inside perf, but
it would be clearer and simpler for users to interpret.

One bug I found when testing is that if you do, with HT on:

   $ perf stat -a --topdown -I 1000 --metric-only sleep 100

then you get data for frontend and backend but nothing for retiring or
bad speculation. I suspect it is because you expect --metric-only to be
used only when you have the double --topdown. That's why I think this
double topdown is confusing. If you do as I suggest, it will be much
simpler.

> Yes, --metric-only is a lot clearer.
>
> per-core is supported and automatically enabled with SMT on.
>
>> > Full tree available in
>> > git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-11
>>
>> That is in the top-down-2 branch instead, I think.
>
> Sorry, typo. The correct branch is perf/top-down-10
>
> I also updated it now with the latest review feedback changes.
>
> top-down-2 is a really old branch that indeed didn't have metric-only.
>
> -Andi
>
> --
> a...@linux.intel.com -- Speaking for myself only.
Re: Add top down metrics to perf stat v2
Thanks for testing.

On Thu, Dec 17, 2015 at 03:31:30PM -0800, Stephane Eranian wrote:
> I would not add a --topdown option but instead a --metric option with
> arguments, such that other metrics could be added later:
>
>    $ perf stat --metrics topdown -I 1000 -a sleep 100
>
> If you do this, you do not need the --metric-only option.

The --metric-only option is useful with other metrics too. For example
to get concise (and plottable) IPC or TSX abort statistics. See the
examples in the original commit.

However, I could make --topdown default to --metric-only and add an
option to turn it off. Yes, that's probably a better default for most
people, although some could be annoyed by the wide output.

> The double --topdown is confusing.

Ok. I was thinking of changing it and adding an extra argument for the
"ignore threshold" behavior. That would also make it more extensible if
we ever add Level 2.

> Why force --per-core when HT is on? I know you need to aggregate per
> core, but you could still display globally. And then if the user
> requests --per-core, display per core.

Global TopDown doesn't make much sense. Suppose you have two programs
running on different cores, one frontend bound and one backend bound.
What would the union of the two mean? And you may well end up with sums
of ratios which are >100%.

The only exception where it's useful is the single threaded case (like
the toplev --single-thread option). However it is somewhat ugly and
difficult because the user would need to ensure that there is nothing
active on the sibling thread. So I left it out.

> Same if the user specifies --per-socket. I know this requires some
> more plumbing inside perf, but it would be clearer and simpler for
> users to interpret.

Same problem as above.

> One bug I found when testing is that if you do, with HT on:
>
>    $ perf stat -a --topdown -I 1000 --metric-only sleep 100
>
> then you get data for frontend and backend but nothing for retiring or
> bad speculation.

You see all the columns, but no data in some? That's intended: the
percentage is only printed when it crosses a threshold. That's part of
the top down specification.

> I suspect it is because you expect --metric-only to be used only when
> you have the double --topdown. That's why I think this double topdown
> is confusing. If you do as I suggest, it will be much simpler.

It works fine with single topdown as far as I can tell.

-Andi

--
a...@linux.intel.com -- Speaking for myself only.
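The threshold-gated display Andi describes — a metric cell is left blank unless the value crosses a per-metric cut-off, with the repeated --topdown acting as an override — can be sketched roughly as follows. The threshold values here are illustrative assumptions, not the ones perf or the TopDown methodology actually defines:

```python
# Illustrative per-metric thresholds (fractions of pipeline slots);
# the real methodology defines its own cut-offs.
THRESHOLDS = {
    "frontend bound": 0.15,
    "bad speculation": 0.10,
    "retiring": 0.25,
    "backend bound": 0.20,
}

def format_cell(name, value, force=False):
    """Render one metric cell: blank below threshold, as in perf
    stat's --topdown --metric-only output. force=True mimics the
    'ignore threshold' behavior of the doubled --topdown option."""
    if force or value >= THRESHOLDS[name]:
        return f"{value:7.2%}"
    return " " * 7  # keep the column width, suppress the number

row = {"frontend bound": 0.0883, "retiring": 0.4893,
       "bad speculation": 0.3524, "backend bound": 0.0700}
print("  ".join(format_cell(n, v) for n, v in row.items()))
```

With these assumed thresholds the first interval from the example prints retiring and bad speculation but suppresses the small frontend and backend fractions, which matches the kind of blanks Stephane saw.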
Add top down metrics to perf stat v2
Note to reviewers: includes both tools and kernel patches.
The kernel patches are at the end.

This patchkit adds support for TopDown measurements to perf stat.
It applies on top of my earlier metrics patchkit, posted
separately, and the --metric-only patchkit (also posted
separately).

TopDown is intended to replace the frontend cycles idle /
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads,
due to out of order effects.

This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipeline
bottlenecks using standardized formulas. The measurement
can all be done with 5 counters (one fixed counter).

The result is four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring
that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound
BadSpeculation is a higher

The full top down methodology has many hierarchical metrics.
This implementation only supports level 1, which can be
collected without multiplexing. A full implementation
of top down on top of perf is available in pmu-tools toplev
(http://github.com/andikleen/pmu-tools).

The current version works on Intel Core CPUs starting
with Sandy Bridge, and Atom CPUs starting with Silvermont.
In principle the generic metrics should also be implementable
on other out of order CPUs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):

topdown-total-slots      Available slots in the pipeline
topdown-slots-issued     Slots issued into the pipeline
topdown-slots-retired    Slots successfully retired
topdown-fetch-bubbles    Pipeline gaps in the frontend
topdown-recovery-bubbles Pipeline gaps during recovery
                         from misspeculation

These metrics then allow computing four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.

The formulas to compute the metrics are generic; they
only change based on the availability of the abstracted
input values.

The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available metrics.

Example output:

$ ./perf stat --topdown -a ./BC1s

 Performance counter stats for 'system wide':

S0-C0 2    19650790       topdown-total-slots                           (100.00%)
S0-C0 2     4445680.00    topdown-fetch-bubbles    # 22.62% frontend bound   (100.00%)
S0-C0 2     1743552.00    topdown-slots-retired                         (100.00%)
S0-C0 2      622954       topdown-recovery-bubbles                      (100.00%)
S0-C0 2     2025498.00    topdown-slots-issued     # 63.90% backend bound
S0-C1 2 16685216540       topdown-total-slots                           (100.00%)
S0-C1 2   962557931.00    topdown-fetch-bubbles                         (100.00%)
S0-C1 2  4175583320.00    topdown-slots-retired                         (100.00%)
S0-C1 2  1743329246       topdown-recovery-bubbles # 22.22% bad speculation  (100.00%)
S0-C1 2  6138901193.50    topdown-slots-issued     # 46.99% backend bound

 1.535832673 seconds time elapsed

$ perf stat --topdown --topdown --metric-only -I 100 ./BC1s

 0.100576098   frontend bound   retiring   bad speculation   backend bound
 0.100576098        8.83%         48.93%            35.24%           7.00%
 0.200800845        8.84%         48.49%            35.53%           7.13%
 0.300905983        8.73%         48.64%            35.58%           7.05%
...

On Hyper Threaded CPUs Top Down computes metrics per core instead of
per logical CPU. In this case perf stat automatically enables
--per-core mode, and also requires global mode (-a) and avoiding other
filters (no cgroup mode).

One side effect is that this may require root rights or a
kernel.perf_event_paranoid=-1 setting.

On systems without Hyper Threading it can be used per process.

Full tree available in
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-11

No changelog against previous version. There were lots of changes.
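The per-core aggregation on SMT systems amounts to summing the slot counters of sibling hyperthreads before applying the formulas, since the pipeline slots are shared by the physical core. A rough sketch of that idea, using hypothetical per-thread counts (these data structures are illustrative, not perf's internal ones):

```python
from collections import defaultdict

# Hypothetical per-logical-CPU counts of one slot event, keyed by
# (socket, core, thread). Sibling threads of a core share a pipeline,
# so their slot counts only make sense summed per core.
per_thread_slots = {
    (0, 0, 0): 9_800_000, (0, 0, 1): 9_850_790,  # siblings of S0-C0
    (0, 1, 0): 8_300_000, (0, 1, 1): 8_385_216,  # siblings of S0-C1
}

per_core_slots = defaultdict(int)
for (socket, core, _thread), count in per_thread_slots.items():
    per_core_slots[(socket, core)] += count  # aggregate siblings

for (socket, core), total in sorted(per_core_slots.items()):
    print(f"S{socket}-C{core}  {total}")
```

This is also why the mode needs -a and no cgroup filtering: dropping events from one sibling thread would leave the per-core sums incomplete.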
Add top down metrics to perf stat
This patchkit adds support for TopDown to perf stat.
It applies on top of my earlier metrics patchkit, posted separately.

TopDown is intended to replace the frontend cycles idle /
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads,
due to out of order effects.

This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipeline
bottlenecks using standardized formulas. The measurement
can all be done with 5 counters (one fixed counter).

The result is four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring
that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound
BadSpeculation is a higher

The full top down methodology has many hierarchical metrics.
This implementation only supports level 1, which can be
collected without multiplexing. A full implementation
of top down on top of perf is available in pmu-tools toplev
(http://github.com/andikleen/pmu-tools).

The current version works on Intel Core CPUs starting
with Sandy Bridge, and Atom CPUs starting with Silvermont.
In principle the generic metrics should also be implementable
on other out of order CPUs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):

topdown-total-slots      Available slots in the pipeline
topdown-slots-issued     Slots issued into the pipeline
topdown-slots-retired    Slots successfully retired
topdown-fetch-bubbles    Pipeline gaps in the frontend
topdown-recovery-bubbles Pipeline gaps during recovery
                         from misspeculation

These metrics then allow computing four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.

The formulas to compute the metrics are generic; they
only change based on the availability of the abstracted
input values.

The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available metrics.

Example output:

$ ./perf stat --topdown -a ./BC1s

 Performance counter stats for 'system wide':

S0-C0 2    19650790       topdown-total-slots                           (100.00%)
S0-C0 2     4445680.00    topdown-fetch-bubbles    # 22.62% frontend bound   (100.00%)
S0-C0 2     1743552.00    topdown-slots-retired                         (100.00%)
S0-C0 2      622954       topdown-recovery-bubbles                      (100.00%)
S0-C0 2     2025498.00    topdown-slots-issued     # 63.90% backend bound
S0-C1 2 16685216540       topdown-total-slots                           (100.00%)
S0-C1 2   962557931.00    topdown-fetch-bubbles                         (100.00%)
S0-C1 2  4175583320.00    topdown-slots-retired                         (100.00%)
S0-C1 2  1743329246       topdown-recovery-bubbles # 22.22% bad speculation  (100.00%)
S0-C1 2  6138901193.50    topdown-slots-issued     # 46.99% backend bound

 1.535832673 seconds time elapsed

On Hyper Threaded CPUs Top Down computes metrics per core instead of
per logical CPU. In this case perf stat automatically enables
--per-core mode, and also requires global mode (-a) and avoiding other
filters (no cgroup mode).

One side effect is that this may require root rights or a
kernel.perf_event_paranoid=-1 setting.

On systems without Hyper Threading it can be used per process.

Full tree available in
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-2