The following commit has been merged into the perf/core branch of tip:
Commit-ID: 0813c40556fce1eeefb996e020cc5339e0b84137
Gitweb: https://git.kernel.org/tip/0813c40556fce1eeefb996e020cc5339e0b84137
Author: Kan Liang
AuthorDate: Fri, 01 May 2020 05:54:42 -07:00
Committer
The following commit has been merged into the perf/core branch of tip:
Commit-ID: f649fc2eefdef7a67698a3c584222c5c8c5a6785
Gitweb: https://git.kernel.org/tip/f649fc2eefdef7a67698a3c584222c5c8c5a6785
Author: Kan Liang
AuthorDate: Thu, 07 May 2020 06:14:18 -07:00
Committer
From: Kan Liang
Enable RAPL support for Intel Ice Lake X and Ice Lake D.
For RAPL support, it is identical to Sky Lake X.
Reported-by: Stephane Eranian
Signed-off-by: Kan Liang
---
arch/x86/events/intel/rapl.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/x86/events/intel
From: Kan Liang
The mask in the extra_regs for Intel Tremont needs to be extended to
allow more defined bits.
"Outstanding Requests" (bit 63) is only available on MSR_OFFCORE_RSP0;
Fixes: 6daeb8737f8a ("perf/x86/intel: Add Tremont core PMU support")
Reported-by: Stephane
From: Kan Liang
A new branch sample type was introduced to require the LBR Top-of-Stack
(TOS) information.
For non-adaptive PEBS and non-PEBS, the TOS information can be directly
retrieved from TOS MSR read in intel_pmu_lbr_read().
For adaptive PEBS, the LBR information stored in PEBS record
From: Kan Liang
To stitch LBR call stack, the max LBR information is required. So the
CPU PMU capabilities information has to be stored in perf header.
Add a new feature HEADER_CPU_PMU_CAPS for CPU PMU capabilities.
Retrieve all CPU PMU capabilities, not just max LBR information.
Add variable
From: Kan Liang
LBR only collects the user call stack. To reconstruct a call stack, both
the kernel call stack and the user call stack are required. The function
resolve_lbr_callchain_sample() mixes the kernel call stack and user
call stack. Now, with the help of TOS, perf tool can reconstruct a more
From: Kan Liang
With the LBR stitching approach, the reconstructed LBR call stack
can break the HW limitation. However, it may reconstruct invalid call
stacks in some cases, e.g. exception handling such as setjmp/longjmp.
Also, it may impact the processing time especially when the number
From: Kan Liang
With the LBR stitching approach, the reconstructed LBR call stack
can break the HW limitation. However, it may reconstruct invalid call
stacks in some cases, e.g. exception handling such as setjmp/longjmp.
Also, it may impact the processing time especially when the number
From: Kan Liang
The perf.data may be generated by a newer version of perf tool,
which supports new input bits in attr, e.g. a new bit for
branch_sample_type.
The perf.data may be parsed by an older version of perf tool later.
The old perf tool may parse the perf.data incorrectly
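The kind of check this snippet motivates can be pictured as a mask test: any bit set outside the set the tool understands suggests the file came from a newer perf. The following is only a minimal sketch of that idea; the function name and the KNOWN_BRANCH_BITS value are hypothetical and are not taken from the perf sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the bits of a flag word that this (older) tool understands. */
#define KNOWN_BRANCH_BITS 0x0000ffffULL

/* Return true if 'value' has any bit set outside 'known_mask'. */
static bool has_unknown_bits(uint64_t value, uint64_t known_mask)
{
	return (value & ~known_mask) != 0;
}

/* Small self-check showing the intended use. */
void unknown_bits_selfcheck(void)
{
	/* Only known bits set: safe to parse. */
	assert(!has_unknown_bits(0x00000005ULL, KNOWN_BRANCH_BITS));
	/* Bit 16 is outside the known mask: warn instead of misparsing. */
	assert(has_unknown_bits(0x00010000ULL, KNOWN_BRANCH_BITS));
}
```

A tool using such a test would typically warn and continue rather than abort, but that policy is a design choice outside this sketch.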
From: Kan Liang
With the LBR stitching approach, the reconstructed LBR call stack
can break the HW limitation. However, it may reconstruct invalid call
stacks in some cases, e.g. exception handling such as setjmp/longjmp.
Also, it may impact the processing time especially when the number
From: Kan Liang
Support new branch sample type for LBR TOS.
Enable LBR_TOS by default in LBR call stack mode.
If the kernel doesn't support the sample type, switch it off.
Add a new branch option "tos" for the new branch sample type.
Set tos to -1ULL if the LBR TOS information is u
From: Kan Liang
With the LBR stitching approach, the reconstructed LBR call stack
can break the HW limitation. However, it may reconstruct invalid call
stacks in some cases, e.g. exception handling such as setjmp/longjmp.
Also, it may impact the processing time especially when the number
From: Kan Liang
In LBR call stack mode, the depth of the reconstructed LBR call stack is
limited to the number of LBR registers.
For example, on skylake, the depth of reconstructed LBR call stack is
always <= 32.
# To display the perf.data header info, please use
# --header/--header-o
From: Kan Liang
In LBR call stack mode, the depth of the reconstructed LBR call stack is
limited to the number of LBR registers. With LBR Top-of-Stack (TOS) information,
perf tool may stitch the stacks of two samples. The reconstructed LBR
call stack can break the HW limitation.
Add a new branch sample
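To illustrate what stitching the stacks of two samples means, here is a deliberately simplified sketch. It is not the algorithm used by perf, which relies on the TOS information; it merely matches overlapping addresses. Assumptions: each stack is an array of addresses ordered newest first, and the function name stitch_stacks is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * If the oldest k entries of 'cur' equal the newest k entries of 'prev',
 * the older entries of 'prev' were presumably pushed out of the LBR
 * registers; append them after 'cur'. Returns the entries written to 'out'.
 */
size_t stitch_stacks(const uint64_t *cur, size_t ncur,
		     const uint64_t *prev, size_t nprev,
		     uint64_t *out, size_t cap)
{
	size_t k = ncur < nprev ? ncur : nprev;
	size_t n = 0, i;

	assert(cur && prev && out);

	/* Find the largest overlap between the tail of cur and the head of prev. */
	for (; k > 0; k--) {
		if (memcmp(cur + (ncur - k), prev, k * sizeof(*cur)) == 0)
			break;
	}

	/* The current sample is always kept as-is. */
	for (i = 0; i < ncur && n < cap; i++)
		out[n++] = cur[i];

	/* No overlap: the two samples cannot be stitched. */
	if (k == 0)
		return n;

	/* Append the part of prev that is older than the overlap. */
	for (i = k; i < nprev && n < cap; i++)
		out[n++] = prev[i];

	return n;
}
```

With a hardware depth of 4, a current stack {6, 5, 4, 3} and a previous stack {4, 3, 2, 1} would stitch into a six-entry stack, which is the sense in which the result can exceed the hardware limit.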
From: Kan Liang
The PMU capabilities information, which is located at
/sys/bus/event_source/devices/<dev>/caps, is required by perf tool.
For example, the max LBR information is required to stitch LBR call
stack.
Add perf_pmu__caps_parse() to parse the PMU capabilities information.
The information
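As a rough picture of what parsing such a caps directory involves, the sketch below reads every file in a directory and stores the file name as the capability name and the first line of its content as the value. It is not the perf_pmu__caps_parse() implementation; struct pmu_cap and read_pmu_caps() are hypothetical names, and the caller supplies the directory path.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one capability: file name and file content. */
struct pmu_cap {
	char name[64];
	char value[64];
};

/* Read up to max_caps "name -> value" pairs; returns the count, or -1 on error. */
int read_pmu_caps(const char *caps_dir, struct pmu_cap *caps, int max_caps)
{
	struct dirent *ent;
	DIR *dir;
	int n = 0;

	assert(caps_dir && caps);

	dir = opendir(caps_dir);
	if (!dir)
		return -1;

	while (n < max_caps && (ent = readdir(dir)) != NULL) {
		char path[512];
		FILE *fp;

		/* Skip ".", ".." and hidden entries. */
		if (ent->d_name[0] == '.')
			continue;

		snprintf(path, sizeof(path), "%s/%s", caps_dir, ent->d_name);
		fp = fopen(path, "r");
		if (!fp)
			continue;

		if (fgets(caps[n].value, sizeof(caps[n].value), fp)) {
			/* Strip the trailing newline, if any. */
			caps[n].value[strcspn(caps[n].value, "\n")] = '\0';
			snprintf(caps[n].name, sizeof(caps[n].name), "%s", ent->d_name);
			n++;
		}
		fclose(fp);
	}

	closedir(dir);
	return n;
}
```

A caller would pass the caps directory of the PMU of interest and look up the entry it needs, for example the one describing the maximum number of LBR entries.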
From: Kan Liang
Perf checks for duplicate entries in a callchain before adding an entry.
However, the check is very slow, especially with deeper call stacks.
Almost 50% of the elapsed time of perf report is spent on the check when the
call stack depth is always 32.
The hist_entry__cmp() is used
From: Kan Liang
Changes since V2
- Move tos into struct perf_branch_stack
Changes since V1
- Add a new branch sample type for LBR TOS. Drop the sample type in V1.
- Add check in perf header to detect unknown input bits in event attr
- Save and use the LBR cursor nodes from previous sample
From: Kan Liang
Support new branch sample type for LBR TOS.
Enable LBR_TOS by default in LBR call stack mode.
If the kernel doesn't support the sample type, switch it off.
Add a new branch option "tos" for the new branch sample type.
Set tos to -1ULL if the LBR TOS information is u
From: Kan Liang
Changes since V1
- Add a new branch sample type for LBR TOS. Drop the sample type in V1.
- Add check in perf header to detect unknown input bits in event attr
- Save and use the LBR cursor nodes from previous sample to avoid
duplicate calculation of cursor nodes.
- Add fast
From: Kan Liang
With the LBR stitching approach, the reconstructed LBR call stack
can break the HW limitation. However, it may reconstruct invalid call
stacks in some cases, e.g. exception handling such as setjmp/longjmp.
Also, it may impact the processing time especially when the number
From: Kan Liang
LBR only collects the user call stack. To reconstruct a call stack, both
the kernel call stack and the user call stack are required. The function
resolve_lbr_callchain_sample() mixes the kernel call stack and user
call stack. Now, with the help of TOS, perf tool can reconstruct a more
From: Kan Liang
With the LBR stitching approach, the reconstructed LBR call stack
can break the HW limitation. However, it may reconstruct invalid call
stacks in some cases, e.g. exception handling such as setjmp/longjmp.
Also, it may impact the processing time especially when the number
From: Kan Liang
The perf.data may be generated by a newer version of perf tool,
which supports new input bits in attr, e.g. a new bit for
branch_sample_type.
The perf.data may be parsed by an older version of perf tool later.
The old perf tool may parse the perf.data incorrectly
From: Kan Liang
With the LBR stitching approach, the reconstructed LBR call stack
can break the HW limitation. However, it may reconstruct invalid call
stacks in some cases, e.g. exception handling such as setjmp/longjmp.
Also, it may impact the processing time especially when the number
From: Kan Liang
The PMU capabilities information, which is located at
/sys/bus/event_source/devices/<dev>/caps, is required by perf tool.
For example, the max LBR information is required to stitch LBR call
stack.
Add perf_pmu__caps_parse() to parse the PMU capabilities information.
The information
From: Kan Liang
With the LBR stitching approach, the reconstructed LBR call stack
can break the HW limitation. However, it may reconstruct invalid call
stacks in some cases, e.g. exception handling such as setjmp/longjmp.
Also, it may impact the processing time especially when the number
From: Kan Liang
In LBR call stack mode, the depth of the reconstructed LBR call stack is
limited to the number of LBR registers.
For example, on skylake, the depth of reconstructed LBR call stack is
always <= 32.
# To display the perf.data header info, please use
# --header/--header-o
From: Kan Liang
To stitch LBR call stack, the max LBR information is required. So the
CPU PMU capabilities information has to be stored in perf header.
Add a new feature HEADER_CPU_PMU_CAPS for CPU PMU capabilities.
Retrieve all CPU PMU capabilities, not just max LBR information.
Add variable
From: Kan Liang
Perf checks for duplicate entries in a callchain before adding an entry.
However, the check is very slow, especially with deeper call stacks.
Almost 50% of the elapsed time of perf report is spent on the check when the
call stack depth is always 32.
The hist_entry__cmp() is used
From: Kan Liang
In LBR call stack mode, the depth of the reconstructed LBR call stack is
limited to the number of LBR registers. With LBR Top-of-Stack (TOS) information,
perf tool may stitch the stacks of two samples. The reconstructed LBR
call stack can break the HW limitation.
Add a new branch sample
From: Kan Liang
A new branch sample type was introduced to require the LBR Top-of-Stack
(TOS) information.
For non-adaptive PEBS and non-PEBS, the TOS information can be directly
retrieved from TOS MSR read in intel_pmu_lbr_read().
For adaptive PEBS, the LBR information stored in PEBS record
The following commit has been merged into the perf/urgent branch of tip:
Commit-ID: 9066288b2aab1804dc1eebec6ff88474363b89cb
Gitweb: https://git.kernel.org/tip/9066288b2aab1804dc1eebec6ff88474363b89cb
Author: Kan Liang
AuthorDate: Tue, 08 Oct 2019 08:50:03 -07:00
Committer
The following commit has been merged into the perf/urgent branch of tip:
Commit-ID: 0917b95079af82c69d8f5bab301faeebcd2cb3cd
Gitweb: https://git.kernel.org/tip/0917b95079af82c69d8f5bab301faeebcd2cb3cd
Author: Kan Liang
AuthorDate: Tue, 08 Oct 2019 08:50:09 -07:00
Committer
The following commit has been merged into the perf/urgent branch of tip:
Commit-ID: 9674b1cc0f94c34f76e58c102623a866836f269e
Gitweb: https://git.kernel.org/tip/9674b1cc0f94c34f76e58c102623a866836f269e
Author: Kan Liang
AuthorDate: Tue, 08 Oct 2019 08:50:04 -07:00
Committer
The following commit has been merged into the perf/urgent branch of tip:
Commit-ID: 23645a76ba816652d6898def2ee69c6a6250c9b1
Gitweb: https://git.kernel.org/tip/23645a76ba816652d6898def2ee69c6a6250c9b1
Author: Kan Liang
AuthorDate: Tue, 08 Oct 2019 08:50:08 -07:00
Committer
The following commit has been merged into the perf/urgent branch of tip:
Commit-ID: 1a5da78d00ce0152994946debd1417513dc35eb3
Gitweb: https://git.kernel.org/tip/1a5da78d00ce0152994946debd1417513dc35eb3
Author: Kan Liang
AuthorDate: Tue, 08 Oct 2019 08:50:06 -07:00
Committer
The following commit has been merged into the perf/urgent branch of tip:
Commit-ID: f1857a2467755e5faa3c727d7146b6db960abee1
Gitweb: https://git.kernel.org/tip/f1857a2467755e5faa3c727d7146b6db960abee1
Author: Kan Liang
AuthorDate: Tue, 08 Oct 2019 08:50:07 -07:00
Committer
The following commit has been merged into the perf/urgent branch of tip:
Commit-ID: 1ffa6c04dae39776a3c222bdf88051e394386c01
Gitweb: https://git.kernel.org/tip/1ffa6c04dae39776a3c222bdf88051e394386c01
Author: Kan Liang
AuthorDate: Tue, 08 Oct 2019 08:50:05 -07:00
Committer
The following commit has been merged into the perf/urgent branch of tip:
Commit-ID: 52e92f409dede388b7dc3ee13491fbf7a80db935
Gitweb: https://git.kernel.org/tip/52e92f409dede388b7dc3ee13491fbf7a80db935
Author: Kan Liang
AuthorDate: Tue, 08 Oct 2019 08:50:10 -07:00
Committer
The following commit has been merged into the perf/urgent branch of tip:
Commit-ID: 1d4d9a6e37ebe8e4ffc3abfcdd24988e7f89df4a
Gitweb: https://git.kernel.org/tip/1d4d9a6e37ebe8e4ffc3abfcdd24988e7f89df4a
Author: Kan Liang
AuthorDate: Tue, 08 Oct 2019 08:50:05 -07:00
Committer
The following commit has been merged into the perf/urgent branch of tip:
Commit-ID: b01a8e2edb924feee1b66f74df1198788fc37cca
Gitweb: https://git.kernel.org/tip/b01a8e2edb924feee1b66f74df1198788fc37cca
Author: Kan Liang
AuthorDate: Tue, 08 Oct 2019 08:50:09 -07:00
Committer
The following commit has been merged into the perf/urgent branch of tip:
Commit-ID: 5e715e1121340369e10aa14c6d498a1928c304bb
Gitweb: https://git.kernel.org/tip/5e715e1121340369e10aa14c6d498a1928c304bb
Author: Kan Liang
AuthorDate: Tue, 08 Oct 2019 08:50:10 -07:00
Committer
The following commit has been merged into the perf/urgent branch of tip:
Commit-ID: 4f0ce17d816a53326947b085bd755d8c1b9b05fb
Gitweb: https://git.kernel.org/tip/4f0ce17d816a53326947b085bd755d8c1b9b05fb
Author: Kan Liang
AuthorDate: Tue, 08 Oct 2019 08:50:06 -07:00
Committer
The following commit has been merged into the perf/urgent branch of tip:
Commit-ID: 0dcbd5393eae6915a85cb0079a90ec3dc89c455f
Gitweb: https://git.kernel.org/tip/0dcbd5393eae6915a85cb0079a90ec3dc89c455f
Author: Kan Liang
AuthorDate: Tue, 08 Oct 2019 08:50:04 -07:00
Committer
The following commit has been merged into the perf/urgent branch of tip:
Commit-ID: 374f26643b3ce2bfab02053e292f16adf6e57aa1
Gitweb: https://git.kernel.org/tip/374f26643b3ce2bfab02053e292f16adf6e57aa1
Author: Kan Liang
AuthorDate: Tue, 08 Oct 2019 08:50:07 -07:00
Committer
The following commit has been merged into the perf/urgent branch of tip:
Commit-ID: a53ad0305c1f25edf63db8ae2a9a0289af8d73d4
Gitweb: https://git.kernel.org/tip/a53ad0305c1f25edf63db8ae2a9a0289af8d73d4
Author: Kan Liang
AuthorDate: Tue, 08 Oct 2019 08:50:03 -07:00
Committer
The following commit has been merged into the perf/urgent branch of tip:
Commit-ID: 3fefafb17502e2483abe190d11b1778a1f202d70
Gitweb: https://git.kernel.org/tip/3fefafb17502e2483abe190d11b1778a1f202d70
Author: Kan Liang
AuthorDate: Tue, 08 Oct 2019 08:50:08 -07:00
Committer
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 8d7c6ac3b2371eb1cbc9925a88f4d10efff374de
Gitweb: https://git.kernel.org/tip/8d7c6ac3b2371eb1cbc9925a88f4d10efff374de
Author: Kan Liang
AuthorDate: Tue, 08 Oct 2019 08:50:02 -07:00
Committer
From: Kan Liang
There is no Core C3 C-State counter for Ice Lake.
Package C8/C9/C10 C-State counters are added for Ice Lake.
Introduce a new event list, icl_cstates, for Ice Lake.
Update the comments accordingly.
Fixes: f08c47d1f86c ("perf/x86/intel/cstate: Add Icelake support")
From: Kan Liang
Tiger Lake is the follow-on to Ice Lake. From the perspective of Intel
cstate residency counters, there is nothing changed compared with
Ice Lake.
Share icl_cstates with Ice Lake.
Update the comments for Tiger Lake.
The External Design Specification (EDS) is not published yet
From: Kan Liang
Tiger Lake is the follow-on to Ice Lake. From the perspective of Intel
core PMU, there are few changes compared with Ice Lake, e.g. small
changes in the event list, but they don't affect core PMU functionality.
Share the perf code with Ice Lake. The event list patch
From: Kan Liang
Comet Lake is the new 10th Gen Intel processor.
Add CPU model number for Comet Lake to the Intel family list.
The CPU model number is not published in SDM yet. It comes
from an authoritative internal source.
Signed-off-by: Kan Liang
Reviewed-by: Tony Luck
Cc: Peter Zijlstra
From: Kan Liang
PPERF and SMI_COUNT MSRs are also supported by Ice Lake desktop and
server.
Signed-off-by: Kan Liang
---
arch/x86/events/msr.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/x86/events/msr.c b/arch/x86/events/msr.c
index c177bbe..8515512 100644
--- a/arch/x86
From: Kan Liang
Comet Lake is the new 10th Gen Intel processor. From the perspective
of Intel PMU, there is nothing changed compared with Sky Lake.
Share the perf code with Sky Lake.
The patch has been tested on real hardware.
Signed-off-by: Kan Liang
---
arch/x86/events/intel/core.c | 2
From: Kan Liang
Comet Lake is the new 10th Gen Intel processor. PPERF and SMI_COUNT MSRs
are also supported.
The External Design Specification (EDS) is not published yet. It comes
from an authoritative internal source.
The patch has been tested on real hardware.
Signed-off-by: Kan Liang
From: Kan Liang
Comet Lake is the new 10th Gen Intel processor. From the perspective of
Intel cstate residency counters, there is nothing changed compared with
Kaby Lake.
Share hswult_cstates with Kaby Lake.
Update the comments for Comet Lake.
Kaby Lake is missed in the comments for some
From: Kan Liang
Tiger Lake is the follow-on to Ice Lake. PPERF and SMI_COUNT MSRs are
also supported.
The External Design Specification (EDS) is not published yet. It comes
from an authoritative internal source.
The patch has been tested on real hardware.
Signed-off-by: Kan Liang
---
arch
From: Kan Liang
Comet Lake is the new 10th Gen Intel processor. Add Comet Lake to Intel family.
From the perspective of Intel core PMU, there is nothing changed compared with
Sky Lake. Share the perf code with Sky Lake.
Add support for perf msr and cstate driver as well.
Tiger L
From: Kan Liang
Support new sample type PERF_SAMPLE_LBR_TOS.
Enable LBR_TOS by default in LBR call stack mode.
If the kernel doesn't support the sample type, switch it off.
Reviewed-by: Andi Kleen
Signed-off-by: Kan Liang
---
tools/include/uapi/linux/perf_event.h | 4 +++-
tools/perf
From: Kan Liang
The PMU capabilities information, which is located at
/sys/bus/event_source/devices/<dev>/caps, is required by perf tool.
For example, the max LBR information is required to stitch LBR call
stack.
Add perf_pmu__caps_parse() to parse the PMU capabilities information.
The information
From: Kan Liang
LBR only collects the user call stack. To reconstruct a call stack, both
the kernel call stack and the user call stack are required. The function
resolve_lbr_callchain_sample() mixes the kernel call stack and user
call stack. Now, with the help of TOS, perf tool can reconstruct a more
From: Kan Liang
With the LBR stitching approach, the reconstructed LBR call stack
can break the HW limitation. However, it may reconstruct invalid call
stacks in some cases, e.g. exception handling such as setjmp/longjmp.
Also, it may impact the processing time especially when the number
From: Kan Liang
With the LBR stitching approach, the reconstructed LBR call stack
can break the HW limitation. However, it may reconstruct invalid call
stacks in some cases, e.g. exception handling such as setjmp/longjmp.
Also, it may impact the processing time especially when the number
From: Kan Liang
With the LBR stitching approach, the reconstructed LBR call stack
can break the HW limitation. However, it may reconstruct invalid call
stacks in some cases, e.g. exception handling such as setjmp/longjmp.
Also, it may impact the processing time especially when the number
From: Kan Liang
With the LBR stitching approach, the reconstructed LBR call stack
can break the HW limitation. However, it may reconstruct invalid call
stacks in some cases, e.g. exception handling such as setjmp/longjmp.
Also, it may impact the processing time especially when the number
From: Kan Liang
In LBR call stack mode, the depth of the reconstructed LBR call stack is
limited to the number of LBR registers. With LBR Top-of-Stack (TOS) information,
perf tool may stitch the stacks of two samples. The reconstructed LBR
call stack can break the HW limitation.
Add a new sample type
From: Kan Liang
Starting from Haswell, Linux perf can utilize the existing Last Branch
Record (LBR) facility to record call stack. However, the depth of the
reconstructed LBR call stack is limited to the number of LBR registers.
E.g. on skylake, the depth of reconstructed LBR call stack is <= 32
Tha
From: Kan Liang
In LBR call stack mode, the depth of the reconstructed LBR call stack is
limited to the number of LBR registers.
For example, on skylake, the depth of reconstructed LBR call stack is
always <= 32.
# To display the perf.data header info, please use
# --header/--header-o
From: Kan Liang
To stitch LBR call stack, the max LBR information is required. So the
CPU PMU capabilities information has to be stored in perf header.
Add a new feature HEADER_CPU_PMU_CAPS for CPU PMU capabilities.
Retrieve all CPU PMU capabilities, not just max LBR information.
Add variable
From: Andi Kleen
Add some documentation on how to use the topdown metrics in ring 3.
Signed-off-by: Andi Kleen
Signed-off-by: Kan Liang
---
No changes since V3
tools/perf/Documentation/topdown.txt | 223 +++
1 file changed, 223 insertions(+)
create mode 100644 tools
--topdown to handle these new metrics and
print them in the same way as the previous TopDown metrics.
The restriction of only being able to report information per core is
gone.
Signed-off-by: Andi Kleen
Signed-off-by: Kan Liang
---
No changes since V3
tools/perf/Documentation/perf-sta
From: Kan Liang
Intro
=====
Icelake has support for measuring the four top level TopDown metrics
directly in hardware. This is implemented by an additional "metrics"
register, and a new Fixed Counter 3 that measures pipeline "slots".
Events
======
We export four metric eve
From: Kan Liang
TOPDOWN.SLOTS(0x0400) is not a generic event. It is only available on
fixed counter3.
Don't extend its mask to generic counters.
Signed-off-by: Kan Liang
---
Changes since V3:
- Separate fixed counter3 definition patch
arch/x86/events/intel/core.c | 6 --
1 file changed
From: Kan Liang
The bit index number of global status is used directly in the current NMI
handler. Use a meaningful name in place of the number to improve the
readability of the code.
Signed-off-by: Kan Liang
---
New patch for V4
arch/x86/events/intel/core.c | 6 +++---
arch/x86/include/asm
From: Kan Liang
Export new TopDown metrics events for perf that map to the sub metrics
in the metrics register, and another for the new slots fixed counter.
This makes the new fixed counters in Icelake visible to the perf
user tools.
Originally-by: Andi Kleen
Signed-off-by: Kan Liang
From: Kan Liang
With Icelake CPUs, the TopDown metrics are directly available as fixed
counters and do not require generic counters, which makes it possible to
measure TopDown per thread/process instead of only per core.
The metrics and slots values have to be saved/restored during context
From: Kan Liang
The slots event supports sampling. Users may do a sampling read of slots and
metrics events, e.g. perf record -e '{slots, topdown-retiring}:S'.
But the metrics event will reset fixed counter 3, which will affect
the sampling of the slots event.
Add specific validate_group() support
From: "Peter Zijlstra (Intel)"
On x86_64 we can do a u64 * u64 -> u128 widening multiply followed by
a u128 / u64 -> u64 division to implement a sane version of
mul_u64_u32_div().
Signed-off-by: Peter Zijlstra (Intel)
---
New patch for V4
arch/x86/include/asm/div64.h | 13 +
1
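The widening multiply followed by a narrowing divide described above can be sketched in portable-ish C with the unsigned __int128 extension of GCC and Clang on 64-bit targets. This is only an illustration of the arithmetic, not the kernel helper: the name mul_div_u64_sketch is hypothetical, and a result that does not fit in 64 bits is silently truncated here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute (a * mul) / div without losing the high bits of the product.
 * The intermediate product is held in 128 bits, so a * mul cannot overflow.
 */
static inline uint64_t mul_div_u64_sketch(uint64_t a, uint64_t mul, uint64_t div)
{
	unsigned __int128 prod;

	assert(div != 0);

	prod = (unsigned __int128)a * mul;	/* u64 * u64 -> u128 */
	return (uint64_t)(prod / div);		/* u128 / u64 -> u64 (truncating) */
}
```

The point of the 128-bit intermediate is visible with a = 2^63 and mul = 4: the product 2^65 does not fit in 64 bits, yet dividing it by 8 gives the exact result 2^62.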
From: Kan Liang
The bit 48 in the PERF_GLOBAL_STATUS is used to indicate the overflow
status of PERF_METRICS counters now.
Move BTS index to 47.
Signed-off-by: Kan Liang
---
New patch for V4
arch/x86/include/asm/perf_event.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff
From: Kan Liang
Icelake has support for measuring the level 1 TopDown metrics
directly in hardware. This is implemented by an additional METRICS
register, and a new Fixed Counter 3 that measures pipeline SLOTS.
For the Ice Lake implementation of performance metrics, software
should start both
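To make the relationship between the METRICS register and the SLOTS counter concrete, here is a small sketch. It assumes, which the snippet above does not state, that the METRICS value packs four 8-bit fields and that each field expresses its metric as a fraction of 0xff of the slots count. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: metric idx occupies bits [8*idx+7 : 8*idx] of the METRICS value. */
static inline uint8_t topdown_field(uint64_t metrics, unsigned int idx)
{
	assert(idx < 4);
	return (uint8_t)((metrics >> (idx * 8)) & 0xff);
}

/* Scale the slots count by the fraction field/0xff, using a 128-bit product. */
static inline uint64_t topdown_metric_slots(uint64_t slots, uint64_t metrics,
					    unsigned int idx)
{
	unsigned __int128 scaled;

	scaled = (unsigned __int128)slots * topdown_field(metrics, idx);
	return (uint64_t)(scaled / 0xff);
}
```

Under these assumptions, a field value of 0x80 with 255 slots corresponds to 128 slots attributed to that metric, and a field of 0xff attributes all slots to it.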
From: Kan Liang
The RDPMC index is always re-calculated in RDPMC userspace support,
especially for fixed counters.
The RDPMC index value is stored in variable event_base_rdpmc for kernel
usage, which can be used for RDPMC userspace support as well. Only the
metrics event has to be specially
From: Kan Liang
The fourth fixed counter, TOPDOWN.SLOTS, is introduced in Ice Lake.
Add MSR address and macros for the new fixed counter, which will be used
in the following patch.
Signed-off-by: Kan Liang
---
New patch for V4
arch/x86/include/asm/perf_event.h | 9 +++--
1 file changed
From: Kan Liang
Bit 15 of the PERF_CAPABILITIES MSR indicates that this architecture provides
built-in support for perf METRICS. The perf METRICS is not a PEBS
feature.
Rename pebs_metrics_available to perf_metrics.
Nothing uses the bit in the current code. The following patch will use it.
Signed-off-by: Kan
From: Kan Liang
Metrics counters (hardware counters containing multiple metrics)
are modeled as separate registers for each TopDown metric event,
with an extra reg being used for coordinating access to the
underlying register in the scheduler.
Adds the basic infrastructure to separate
From: Kan Liang
With Icelake CPUs, the TopDown metrics are directly available as fixed
counters and do not require generic counters, which makes it possible to
measure TopDown per thread/process instead of only per core.
The metrics and slots values have to be saved/restored during context
From: Kan Liang
Export new TopDown metrics events for perf that map to the sub metrics
in the metrics register, and another for the new slots fixed counter.
This makes the new fixed counters in Icelake visible to the perf
user tools.
Originally-by: Andi Kleen
Signed-off-by: Kan Liang
From: Kan Liang
The slots event supports sampling. Users may do a sampling read of slots and
metrics events, e.g. perf record -e '{slots, topdown-retiring}:S'.
But the metrics event will reset fixed counter 3, which will affect
the sampling of the slots event.
Add specific validate_group() support
From: Kan Liang
Metrics counters (hardware counters containing multiple metrics)
are modeled as separate registers for each TopDown metric event,
with an extra reg being used for coordinating access to the
underlying register in the scheduler.
Adds the basic infrastructure to separate
From: Andi Kleen
Add some documentation on how to use the topdown metrics in ring 3.
Signed-off-by: Andi Kleen
Signed-off-by: Kan Liang
---
tools/perf/Documentation/topdown.txt | 223 +++
1 file changed, 223 insertions(+)
create mode 100644 tools/perf/Documentation
From: Kan Liang
Icelake has support for measuring the level 1 TopDown metrics
directly in hardware. This is implemented by an additional METRICS
register, and a new Fixed Counter 3 that measures pipeline SLOTS.
Four TopDown metric events as separate perf events, which map to
internal METRICS
--topdown to handle these new metrics and
print them in the same way as the previous TopDown metrics.
The restriction of only being able to report information per core is
gone.
Signed-off-by: Andi Kleen
Signed-off-by: Kan Liang
---
tools/perf/Documentation/perf-stat.txt | 9 ++-
tools/perf/buil
From: Kan Liang
Intro
=====
Icelake has support for measuring the four top level TopDown metrics
directly in hardware. This is implemented by an additional "metrics"
register, and a new Fixed Counter 3 that measures pipeline "slots".
Events
======
We export four metric eve
From: Kan Liang
TOPDOWN.SLOTS(0x0400) is not a generic event. It is only available on
fixed counter3.
Don't extend its mask to generic counters.
Signed-off-by: Kan Liang
---
arch/x86/events/intel/core.c | 6 --
arch/x86/include/asm/perf_event.h | 5 +
2 files changed, 9
From: Kan Liang
perf stat -M metrics relies on weak groups to reject unschedulable
groups and run them as non-groups.
This uses the group validation code in the kernel. Unfortunately
that code doesn't take pinned events, such as the NMI watchdog, into
account. So some groups can pass validation
From: Kan Liang
perf stat -M metrics relies on weak groups to reject unschedulable
groups and run them as non-groups.
This uses the group validation code in the kernel. Unfortunately
that code doesn't take pinned events, such as the NMI watchdog, into
account. So some groups can pass validation
From: Kan Liang
When counting system-wide events and cgroup events simultaneously, the
system-wide events are always scheduled during cgroup switch, which is
wrong and brings extra overhead.
Both system-wide and cgroup are per-cpu. They share the same cpuctx
groups, cpuctx->flexible_gro
Commit-ID: 3d0c3953601d250175c7684ec0d9df612061dae5
Gitweb: https://git.kernel.org/tip/3d0c3953601d250175c7684ec0d9df612061dae5
Author: Kan Liang
AuthorDate: Tue, 23 Jul 2019 13:04:29 -0700
Committer: Ingo Molnar
CommitDate: Thu, 25 Jul 2019 15:41:29 +0200
perf/x86/intel: Fix SLOTS
From: Kan Liang
Icelake has support for measuring the level 1 TopDown metrics
directly in hardware. This is implemented by an additional METRICS
register, and a new Fixed Counter 3 that measures pipeline SLOTS.
Four TopDown metric events as separate perf events, which map to
internal METRICS