[tip: perf/core] perf/x86/intel: Add more available bits for OFFCORE_RESPONSE of Intel Tremont

2020-05-19 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip: Commit-ID: 0813c40556fce1eeefb996e020cc5339e0b84137 Gitweb: https://git.kernel.org/tip/0813c40556fce1eeefb996e020cc5339e0b84137 Author:Kan Liang AuthorDate:Fri, 01 May 2020 05:54:42 -07:00 Committer

[tip: perf/core] perf/x86/rapl: Add Ice Lake RAPL support

2020-05-19 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip: Commit-ID: f649fc2eefdef7a67698a3c584222c5c8c5a6785 Gitweb: https://git.kernel.org/tip/f649fc2eefdef7a67698a3c584222c5c8c5a6785 Author:Kan Liang AuthorDate:Thu, 07 May 2020 06:14:18 -07:00 Committer

[PATCH] perf/x86/rapl: Add Ice Lake RAPL support

2020-05-07 Thread kan . liang
From: Kan Liang Enable RAPL support for Intel Ice Lake X and Ice Lake D. Their RAPL support is identical to that of Sky Lake X. Reported-by: Stephane Eranian Signed-off-by: Kan Liang --- arch/x86/events/intel/rapl.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/events/intel

[RESEND PATCH] perf/x86/intel: Add more available bits for OFFCORE_RESPONSE of Intel Tremont

2020-05-01 Thread kan . liang
From: Kan Liang The mask in the extra_regs for Intel Tremont needs to be extended to allow more defined bits. "Outstanding Requests" (bit 63) is only available on MSR_OFFCORE_RSP0; Fixes: 6daeb8737f8a ("perf/x86/intel: Add Tremont core PMU support") Reported-by: Stephane
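
A minimal sketch of how a per-MSR valid-bit mask can reject unsupported OFFCORE_RESPONSE bits. The mask value below is hypothetical (the real Tremont masks live in the kernel's extra_regs table and are not reproduced here); only the rule that bit 63 is valid on MSR_OFFCORE_RSP0 alone comes from the patch description.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical masks for illustration only. */
#define OFFCORE_RSP_OUTSTANDING_REQ (1ULL << 63)           /* bit 63: RSP0 only */
#define OFFCORE_RSP_COMMON_MASK     0x000ff0ffffff9fffULL  /* assumed valid bits */

/* Return true if every bit set in config is allowed for the given MSR
 * (rsp_idx 0 = MSR_OFFCORE_RSP_0, 1 = MSR_OFFCORE_RSP_1). */
static bool offcore_rsp_config_valid(int rsp_idx, uint64_t config)
{
	uint64_t valid = OFFCORE_RSP_COMMON_MASK;

	if (rsp_idx == 0)
		valid |= OFFCORE_RSP_OUTSTANDING_REQ;
	return (config & ~valid) == 0;
}
```

Keeping one mask per MSR means the "RSP0 only" rule is a single extra OR instead of a special case scattered through the validation code.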

[PATCH V3 02/13] perf/x86/intel: Output LBR TOS information

2019-10-22 Thread kan . liang
From: Kan Liang A new branch sample type was introduced to request the LBR Top-of-Stack (TOS) information. For non-adaptive PEBS and non-PEBS, the TOS information can be directly retrieved from the TOS MSR read in intel_pmu_lbr_read(). For adaptive PEBS, the LBR information stored in PEBS record
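
For illustration, a sketch of how a TOS index can be used to walk the LBR registers as a ring, most recent entry first. It assumes the number of LBR entries is a power of two; the function name is hypothetical.

```c
#include <stdint.h>

/* Copy an LBR ring of nr entries (nr a power of two) into out[], most
 * recent first. tos is the Top-of-Stack index, i.e. the slot written last.
 * Unsigned wrap-around of (tos - i) is harmless because of the mask. */
static void lbr_read_ring(const uint64_t *ring, unsigned int nr,
			  unsigned int tos, uint64_t *out)
{
	unsigned int mask = nr - 1;

	for (unsigned int i = 0; i < nr; i++)
		out[i] = ring[(tos - i) & mask];
}
```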

[PATCH V3 06/13] perf header: Support CPU PMU capabilities

2019-10-22 Thread kan . liang
From: Kan Liang To stitch the LBR call stack, the max LBR information is required, so the CPU PMU capabilities information has to be stored in the perf header. Add a new feature HEADER_CPU_PMU_CAPS for CPU PMU capabilities. Retrieve all CPU PMU capabilities, not just max LBR information. Add variable

[PATCH V3 07/13] perf machine: Refine the function for LBR call stack reconstruction

2019-10-22 Thread kan . liang
From: Kan Liang LBR only collects the user call stack. To reconstruct a call stack, both the kernel call stack and the user call stack are required. The function resolve_lbr_callchain_sample() mixes the kernel call stack and the user call stack. Now, with the help of TOS, perf tool can reconstruct a more
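
A hypothetical sketch of the merge step described above, in callee-first order: kernel frames from the sampled callchain first, then the user frames recovered from LBR. Plain address arrays stand in for perf's callchain cursor; this is not resolve_lbr_callchain_sample() itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build one callchain from kernel frames followed by
 * user frames. Returns the number of entries written, never more than cap. */
static size_t merge_callchain(const uint64_t *kernel, size_t nk,
			      const uint64_t *user, size_t nu,
			      uint64_t *out, size_t cap)
{
	size_t n = 0;

	for (size_t i = 0; i < nk && n < cap; i++)
		out[n++] = kernel[i];
	for (size_t i = 0; i < nu && n < cap; i++)
		out[n++] = user[i];
	return n;
}
```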

[PATCH V3 09/13] perf report: Add option to enable the LBR stitching approach

2019-10-22 Thread kan . liang
From: Kan Liang With the LBR stitching approach, the reconstructed LBR call stack can break the HW limitation. However, it may reconstruct invalid call stacks in some cases, e.g. exception handling such as setjmp/longjmp. Also, it may impact the processing time especially when the number

[PATCH V3 10/13] perf script: Add option to enable the LBR stitching approach

2019-10-22 Thread kan . liang
From: Kan Liang With the LBR stitching approach, the reconstructed LBR call stack can break the HW limitation. However, it may reconstruct invalid call stacks in some cases, e.g. exception handling such as setjmp/longjmp. Also, it may impact the processing time especially when the number

[PATCH V3 04/13] perf header: Add check for event attr

2019-10-22 Thread kan . liang
From: Kan Liang The perf.data may be generated by a newer version of perf tool, which supports new input bits in attr, e.g. a new bit for branch_sample_type. The perf.data may later be parsed by an older version of perf tool. The old perf tool may parse the perf.data incorrectly
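
A sketch of the kind of check an older tool could apply, assuming it only knows the bits below some maximum: any bit at or above that maximum is unknown. The kernel's perf_copy_attr() uses the same ~(MAX - 1) mask pattern with PERF_SAMPLE_BRANCH_MAX; the constant here is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: first branch_sample_type bit this (older) tool does not know. */
#define KNOWN_BRANCH_SAMPLE_MAX (1ULL << 17)

/* Return true if attr_bits contains bits at or above known_max. */
static bool has_unknown_bits(uint64_t attr_bits, uint64_t known_max)
{
	return (attr_bits & ~(known_max - 1)) != 0;
}
```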

[PATCH V3 11/13] perf top: Add option to enable the LBR stitching approach

2019-10-22 Thread kan . liang
From: Kan Liang With the LBR stitching approach, the reconstructed LBR call stack can break the HW limitation. However, it may reconstruct invalid call stacks in some cases, e.g. exception handling such as setjmp/longjmp. Also, it may impact the processing time especially when the number

[PATCH V3 03/13] perf tools: Support new branch sample type for LBR TOS

2019-10-22 Thread kan . liang
From: Kan Liang Support new branch sample type for LBR TOS. Enable LBR_TOS by default in LBR call stack mode. If the kernel doesn't support the sample type, switch it off. Add a new branch option "tos" for the new branch sample type. Set tos to -1ULL if the LBR TOS information is u

[PATCH V3 12/13] perf c2c: Add option to enable the LBR stitching approach

2019-10-22 Thread kan . liang
From: Kan Liang With the LBR stitching approach, the reconstructed LBR call stack can break the HW limitation. However, it may reconstruct invalid call stacks in some cases, e.g. exception handling such as setjmp/longjmp. Also, it may impact the processing time especially when the number

[PATCH V3 08/13] perf tools: Stitch LBR call stack

2019-10-22 Thread kan . liang
From: Kan Liang In LBR call stack mode, the depth of the reconstructed LBR call stack is limited by the number of LBR registers. For example, on Skylake, the depth of the reconstructed LBR call stack is always <= 32. # To display the perf.data header info, please use # --header/--header-o
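
A much-simplified illustration of the stitching idea, not perf's implementation. When a sample fills every LBR slot, its outermost frame is looked up in the previous sample's (possibly already stitched) stack, and the older frames below it are appended. Matching on a bare address is an assumption made for brevity; as the patches note, setjmp/longjmp can make such a match wrong.

```c
#include <stddef.h>
#include <stdint.h>

/* Stacks are stored innermost-first. cur holds n entries and has room for
 * cap; lbr_max is the hardware LBR depth. Returns the new length of cur. */
static size_t stitch_lbr(uint64_t *cur, size_t n, size_t lbr_max, size_t cap,
			 const uint64_t *prev, size_t m)
{
	if (n == 0 || n < lbr_max)
		return n; /* not truncated by hardware: nothing to stitch */

	for (size_t j = 0; j < m; j++) {
		if (prev[j] != cur[n - 1])
			continue;
		/* Append the frames that were older than the match. */
		for (size_t k = j + 1; k < m && n < cap; k++)
			cur[n++] = prev[k];
		break;
	}
	return n;
}
```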

[PATCH V3 01/13] perf/core: Add new branch sample type for LBR TOS

2019-10-22 Thread kan . liang
From: Kan Liang In LBR call stack mode, the depth of the reconstructed LBR call stack is limited by the number of LBR registers. With LBR Top-of-Stack (TOS) information, perf tool may stitch the stacks of two samples. The reconstructed LBR call stack can break the HW limitation. Add a new branch sample

[PATCH V3 05/13] perf pmu: Add support for PMU capabilities

2019-10-22 Thread kan . liang
From: Kan Liang The PMU capabilities information, which is located at /sys/bus/event_source/devices//caps, is required by perf tool. For example, the max LBR information is required to stitch LBR call stack. Add perf_pmu__caps_parse() to parse the PMU capabilities information. The information

[RFC PATCH V3 13/13] perf hist: Add fast path for duplicate entries check

2019-10-22 Thread kan . liang
From: Kan Liang Perf checks the duplicate entries in a callchain before adding an entry. However, the check is very slow, especially with deeper call stacks. Almost 50% of the elapsed time of perf report is spent on the check when the call stack depth is always 32. The hist_entry__cmp() is used
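
The fast-path idea can be illustrated with a cheap equality test that short-circuits before the expensive comparison. The struct, the map pointer, and the string compare standing in for hist_entry__cmp() are all hypothetical; this is not the patch's actual check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callchain entry. */
struct cc_entry {
	uint64_t ip;
	const void *map; /* identifies the mapping the ip belongs to */
	const char *sym; /* stands in for the data the slow compare examines */
};

static int slow_cmp_calls; /* counts how often the slow path runs */

static bool entries_equal(const struct cc_entry *a, const struct cc_entry *b)
{
	/* Fast path: the same address in the same mapping resolves to the
	 * same symbol, so no further comparison is needed. */
	if (a->ip == b->ip && a->map == b->map)
		return true;
	slow_cmp_calls++;
	return strcmp(a->sym, b->sym) == 0;
}
```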

[PATCH V3 00/13] Stitch LBR call stack

2019-10-22 Thread kan . liang
From: Kan Liang Changes since V2 - Move tos into struct perf_branch_stack Changes since V1 - Add a new branch sample type for LBR TOS. Drop the sample type in V1. - Add check in perf header to detect unknown input bits in event attr - Save and use the LBR cursor nodes from previous sample

[PATCH V2 03/13] perf tools: Support new branch sample type for LBR TOS

2019-10-21 Thread kan . liang
From: Kan Liang Support new branch sample type for LBR TOS. Enable LBR_TOS by default in LBR call stack mode. If the kernel doesn't support the sample type, switch it off. Add a new branch option "tos" for the new branch sample type. Set tos to -1ULL if the LBR TOS information is u

[PATCH V2 00/13] Stitch LBR call stack

2019-10-21 Thread kan . liang
From: Kan Liang Changes since V1 - Add a new branch sample type for LBR TOS. Drop the sample type in V1. - Add check in perf header to detect unknown input bits in event attr - Save and use the LBR cursor nodes from previous sample to avoid duplicate calculation of cursor nodes. - Add fast

[PATCH V2 11/13] perf top: Add option to enable the LBR stitching approach

2019-10-21 Thread kan . liang
From: Kan Liang With the LBR stitching approach, the reconstructed LBR call stack can break the HW limitation. However, it may reconstruct invalid call stacks in some cases, e.g. exception handling such as setjmp/longjmp. Also, it may impact the processing time especially when the number

[PATCH V2 07/13] perf machine: Refine the function for LBR call stack reconstruction

2019-10-21 Thread kan . liang
From: Kan Liang LBR only collects the user call stack. To reconstruct a call stack, both the kernel call stack and the user call stack are required. The function resolve_lbr_callchain_sample() mixes the kernel call stack and the user call stack. Now, with the help of TOS, perf tool can reconstruct a more

[PATCH V2 10/13] perf script: Add option to enable the LBR stitching approach

2019-10-21 Thread kan . liang
From: Kan Liang With the LBR stitching approach, the reconstructed LBR call stack can break the HW limitation. However, it may reconstruct invalid call stacks in some cases, e.g. exception handling such as setjmp/longjmp. Also, it may impact the processing time especially when the number

[PATCH V2 04/13] perf header: Add check for event attr

2019-10-21 Thread kan . liang
From: Kan Liang The perf.data may be generated by a newer version of perf tool, which supports new input bits in attr, e.g. a new bit for branch_sample_type. The perf.data may later be parsed by an older version of perf tool. The old perf tool may parse the perf.data incorrectly

[PATCH V2 09/13] perf report: Add option to enable the LBR stitching approach

2019-10-21 Thread kan . liang
From: Kan Liang With the LBR stitching approach, the reconstructed LBR call stack can break the HW limitation. However, it may reconstruct invalid call stacks in some cases, e.g. exception handling such as setjmp/longjmp. Also, it may impact the processing time especially when the number

[PATCH V2 05/13] perf pmu: Add support for PMU capabilities

2019-10-21 Thread kan . liang
From: Kan Liang The PMU capabilities information, which is located at /sys/bus/event_source/devices//caps, is required by perf tool. For example, the max LBR information is required to stitch LBR call stack. Add perf_pmu__caps_parse() to parse the PMU capabilities information. The information

[PATCH V2 12/13] perf c2c: Add option to enable the LBR stitching approach

2019-10-21 Thread kan . liang
From: Kan Liang With the LBR stitching approach, the reconstructed LBR call stack can break the HW limitation. However, it may reconstruct invalid call stacks in some cases, e.g. exception handling such as setjmp/longjmp. Also, it may impact the processing time especially when the number

[PATCH V2 08/13] perf tools: Stitch LBR call stack

2019-10-21 Thread kan . liang
From: Kan Liang In LBR call stack mode, the depth of the reconstructed LBR call stack is limited by the number of LBR registers. For example, on Skylake, the depth of the reconstructed LBR call stack is always <= 32. # To display the perf.data header info, please use # --header/--header-o

[PATCH V2 06/13] perf header: Support CPU PMU capabilities

2019-10-21 Thread kan . liang
From: Kan Liang To stitch the LBR call stack, the max LBR information is required, so the CPU PMU capabilities information has to be stored in the perf header. Add a new feature HEADER_CPU_PMU_CAPS for CPU PMU capabilities. Retrieve all CPU PMU capabilities, not just max LBR information. Add variable

[RFC PATCH V2 13/13] perf hist: Add fast path for duplicate entries check

2019-10-21 Thread kan . liang
From: Kan Liang Perf checks the duplicate entries in a callchain before adding an entry. However, the check is very slow, especially with deeper call stacks. Almost 50% of the elapsed time of perf report is spent on the check when the call stack depth is always 32. The hist_entry__cmp() is used

[PATCH V2 01/13] perf/core: Add new branch sample type for LBR TOS

2019-10-21 Thread kan . liang
From: Kan Liang In LBR call stack mode, the depth of the reconstructed LBR call stack is limited by the number of LBR registers. With LBR Top-of-Stack (TOS) information, perf tool may stitch the stacks of two samples. The reconstructed LBR call stack can break the HW limitation. Add a new branch sample

[PATCH V2 02/13] perf/x86/intel: Output LBR TOS information

2019-10-21 Thread kan . liang
From: Kan Liang A new branch sample type was introduced to request the LBR Top-of-Stack (TOS) information. For non-adaptive PEBS and non-PEBS, the TOS information can be directly retrieved from the TOS MSR read in intel_pmu_lbr_read(). For adaptive PEBS, the LBR information stored in PEBS record

[tip: perf/urgent] perf/x86/intel: Add Comet Lake CPU support

2019-10-12 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/urgent branch of tip: Commit-ID: 9066288b2aab1804dc1eebec6ff88474363b89cb Gitweb: https://git.kernel.org/tip/9066288b2aab1804dc1eebec6ff88474363b89cb Author:Kan Liang AuthorDate:Tue, 08 Oct 2019 08:50:03 -07:00 Committer

[tip: perf/urgent] perf/x86/msr: Add Tiger Lake CPU support

2019-10-12 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/urgent branch of tip: Commit-ID: 0917b95079af82c69d8f5bab301faeebcd2cb3cd Gitweb: https://git.kernel.org/tip/0917b95079af82c69d8f5bab301faeebcd2cb3cd Author:Kan Liang AuthorDate:Tue, 08 Oct 2019 08:50:09 -07:00 Committer

[tip: perf/urgent] perf/x86/msr: Add Comet Lake CPU support

2019-10-12 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/urgent branch of tip: Commit-ID: 9674b1cc0f94c34f76e58c102623a866836f269e Gitweb: https://git.kernel.org/tip/9674b1cc0f94c34f76e58c102623a866836f269e Author:Kan Liang AuthorDate:Tue, 08 Oct 2019 08:50:04 -07:00 Committer

[tip: perf/urgent] perf/x86/intel: Add Tiger Lake CPU support

2019-10-12 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/urgent branch of tip: Commit-ID: 23645a76ba816652d6898def2ee69c6a6250c9b1 Gitweb: https://git.kernel.org/tip/23645a76ba816652d6898def2ee69c6a6250c9b1 Author:Kan Liang AuthorDate:Tue, 08 Oct 2019 08:50:08 -07:00 Committer

[tip: perf/urgent] perf/x86/msr: Add new CPU model numbers for Ice Lake

2019-10-12 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/urgent branch of tip: Commit-ID: 1a5da78d00ce0152994946debd1417513dc35eb3 Gitweb: https://git.kernel.org/tip/1a5da78d00ce0152994946debd1417513dc35eb3 Author:Kan Liang AuthorDate:Tue, 08 Oct 2019 08:50:06 -07:00 Committer

[tip: perf/urgent] perf/x86/cstate: Update C-state counters for Ice Lake

2019-10-12 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/urgent branch of tip: Commit-ID: f1857a2467755e5faa3c727d7146b6db960abee1 Gitweb: https://git.kernel.org/tip/f1857a2467755e5faa3c727d7146b6db960abee1 Author:Kan Liang AuthorDate:Tue, 08 Oct 2019 08:50:07 -07:00 Committer

[tip: perf/urgent] perf/x86/cstate: Add Comet Lake CPU support

2019-10-12 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/urgent branch of tip: Commit-ID: 1ffa6c04dae39776a3c222bdf88051e394386c01 Gitweb: https://git.kernel.org/tip/1ffa6c04dae39776a3c222bdf88051e394386c01 Author:Kan Liang AuthorDate:Tue, 08 Oct 2019 08:50:05 -07:00 Committer

[tip: perf/urgent] perf/x86/cstate: Add Tiger Lake CPU support

2019-10-12 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/urgent branch of tip: Commit-ID: 52e92f409dede388b7dc3ee13491fbf7a80db935 Gitweb: https://git.kernel.org/tip/52e92f409dede388b7dc3ee13491fbf7a80db935 Author:Kan Liang AuthorDate:Tue, 08 Oct 2019 08:50:10 -07:00 Committer

[tip: perf/urgent] perf/x86/cstate: Add Comet Lake CPU support

2019-10-09 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/urgent branch of tip: Commit-ID: 1d4d9a6e37ebe8e4ffc3abfcdd24988e7f89df4a Gitweb: https://git.kernel.org/tip/1d4d9a6e37ebe8e4ffc3abfcdd24988e7f89df4a Author:Kan Liang AuthorDate:Tue, 08 Oct 2019 08:50:05 -07:00 Committer

[tip: perf/urgent] perf/x86/msr: Add Tiger Lake CPU support

2019-10-09 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/urgent branch of tip: Commit-ID: b01a8e2edb924feee1b66f74df1198788fc37cca Gitweb: https://git.kernel.org/tip/b01a8e2edb924feee1b66f74df1198788fc37cca Author:Kan Liang AuthorDate:Tue, 08 Oct 2019 08:50:09 -07:00 Committer

[tip: perf/urgent] perf/x86/cstate: Add Tiger Lake CPU support

2019-10-09 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/urgent branch of tip: Commit-ID: 5e715e1121340369e10aa14c6d498a1928c304bb Gitweb: https://git.kernel.org/tip/5e715e1121340369e10aa14c6d498a1928c304bb Author:Kan Liang AuthorDate:Tue, 08 Oct 2019 08:50:10 -07:00 Committer

[tip: perf/urgent] perf/x86/msr: Add new CPU model numbers for Ice Lake

2019-10-09 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/urgent branch of tip: Commit-ID: 4f0ce17d816a53326947b085bd755d8c1b9b05fb Gitweb: https://git.kernel.org/tip/4f0ce17d816a53326947b085bd755d8c1b9b05fb Author:Kan Liang AuthorDate:Tue, 08 Oct 2019 08:50:06 -07:00 Committer

[tip: perf/urgent] perf/x86/msr: Add Comet Lake CPU support

2019-10-09 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/urgent branch of tip: Commit-ID: 0dcbd5393eae6915a85cb0079a90ec3dc89c455f Gitweb: https://git.kernel.org/tip/0dcbd5393eae6915a85cb0079a90ec3dc89c455f Author:Kan Liang AuthorDate:Tue, 08 Oct 2019 08:50:04 -07:00 Committer

[tip: perf/urgent] perf/x86/cstate: Update C-state counters for Ice Lake

2019-10-09 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/urgent branch of tip: Commit-ID: 374f26643b3ce2bfab02053e292f16adf6e57aa1 Gitweb: https://git.kernel.org/tip/374f26643b3ce2bfab02053e292f16adf6e57aa1 Author:Kan Liang AuthorDate:Tue, 08 Oct 2019 08:50:07 -07:00 Committer

[tip: perf/urgent] perf/x86/intel: Add Comet Lake CPU support

2019-10-09 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/urgent branch of tip: Commit-ID: a53ad0305c1f25edf63db8ae2a9a0289af8d73d4 Gitweb: https://git.kernel.org/tip/a53ad0305c1f25edf63db8ae2a9a0289af8d73d4 Author:Kan Liang AuthorDate:Tue, 08 Oct 2019 08:50:03 -07:00 Committer

[tip: perf/urgent] perf/x86/intel: Add Tiger Lake CPU support

2019-10-09 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/urgent branch of tip: Commit-ID: 3fefafb17502e2483abe190d11b1778a1f202d70 Gitweb: https://git.kernel.org/tip/3fefafb17502e2483abe190d11b1778a1f202d70 Author:Kan Liang AuthorDate:Tue, 08 Oct 2019 08:50:08 -07:00 Committer

[tip: x86/urgent] x86/cpu: Add Comet Lake to the Intel CPU models header

2019-10-08 Thread tip-bot2 for Kan Liang
The following commit has been merged into the x86/urgent branch of tip: Commit-ID: 8d7c6ac3b2371eb1cbc9925a88f4d10efff374de Gitweb: https://git.kernel.org/tip/8d7c6ac3b2371eb1cbc9925a88f4d10efff374de Author:Kan Liang AuthorDate:Tue, 08 Oct 2019 08:50:02 -07:00 Committer

[PATCH 6/9] perf/x86/cstate: Update C-state counters for Ice Lake

2019-10-08 Thread kan . liang
From: Kan Liang There is no Core C3 C-State counter for Ice Lake. Package C8/C9/C10 C-State counters are added for Ice Lake. Introduce a new event list, icl_cstates, for Ice Lake. Update the comments accordingly. Fixes: f08c47d1f86c ("perf/x86/intel/cstate: Add Icelake support")

[PATCH 9/9] perf/x86/cstate: Add Tiger Lake CPU support

2019-10-08 Thread kan . liang
From: Kan Liang Tiger Lake is the follow-on to Ice Lake. From the perspective of Intel cstate residency counters, nothing has changed compared with Ice Lake. Share icl_cstates with Ice Lake. Update the comments for Tiger Lake. The External Design Specification (EDS) is not published yet
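
A sketch of the sharing pattern described in the Ice Lake and Tiger Lake cstate patches: one event list per PMU generation, referenced from several model entries. The enum values, the name-based lookup table, and the exact package C-state set are assumptions; only the absence of Core C3 and the addition of Package C8/C9/C10 come from the patch text.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit positions for C-state residency counters. */
enum { CORE_C3, CORE_C6, CORE_C7 };
enum { PKG_C2, PKG_C3, PKG_C6, PKG_C7, PKG_C8, PKG_C9, PKG_C10 };

struct cstate_model {
	uint32_t core_events;
	uint32_t pkg_events;
};

/* Ice Lake: no Core C3 counter; Package C8/C9/C10 added. */
static const struct cstate_model icl_cstates = {
	.core_events = (1u << CORE_C6) | (1u << CORE_C7),
	.pkg_events  = (1u << PKG_C2) | (1u << PKG_C3) | (1u << PKG_C6) |
		       (1u << PKG_C7) | (1u << PKG_C8) | (1u << PKG_C9) |
		       (1u << PKG_C10),
};

/* Tiger Lake points at the Ice Lake list instead of defining its own. */
static const struct {
	const char *name;
	const struct cstate_model *model;
} cstate_table[] = {
	{ "icelake", &icl_cstates },
	{ "tigerlake", &icl_cstates },
};

static const struct cstate_model *lookup_cstates(const char *name)
{
	for (size_t i = 0; i < sizeof(cstate_table) / sizeof(cstate_table[0]); i++)
		if (strcmp(cstate_table[i].name, name) == 0)
			return cstate_table[i].model;
	return NULL;
}
```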

[PATCH 7/9] perf/x86/intel: Add Tiger Lake CPU support

2019-10-08 Thread kan . liang
From: Kan Liang Tiger Lake is the follow-on to Ice Lake. From the perspective of Intel core PMU, there are few changes compared with Ice Lake, e.g. small changes in the event list, but they don't impact core PMU functionality. Share the perf code with Ice Lake. The event list patch

[PATCH 1/9] x86/cpu: Add Comet Lake to Intel family

2019-10-08 Thread kan . liang
From: Kan Liang Comet Lake is the new 10th Gen Intel processor. Add CPU model number for Comet Lake to the Intel family list. The CPU model number is not published in SDM yet. It comes from an authoritative internal source. Signed-off-by: Kan Liang Reviewed-by: Tony Luck Cc: Peter Zijlstra

[PATCH 5/9] perf/x86/msr: Add more CPU model number for Ice Lake

2019-10-08 Thread kan . liang
From: Kan Liang PPERF and SMI_COUNT MSRs are also supported by Ice Lake desktop and server. Signed-off-by: Kan Liang --- arch/x86/events/msr.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/events/msr.c b/arch/x86/events/msr.c index c177bbe..8515512 100644 --- a/arch/x86

[PATCH 2/9] perf/x86/intel: Add Comet Lake CPU support

2019-10-08 Thread kan . liang
From: Kan Liang Comet Lake is the new 10th Gen Intel processor. From the perspective of Intel PMU, nothing has changed compared with Sky Lake. Share the perf code with Sky Lake. The patch has been tested on real hardware. Signed-off-by: Kan Liang --- arch/x86/events/intel/core.c | 2

[PATCH 3/9] perf/x86/msr: Add Comet Lake CPU support

2019-10-08 Thread kan . liang
From: Kan Liang Comet Lake is the new 10th Gen Intel processor. PPERF and SMI_COUNT MSRs are also supported. The External Design Specification (EDS) is not published yet. It comes from an authoritative internal source. The patch has been tested on real hardware. Signed-off-by: Kan Liang

[PATCH 4/9] perf/x86/cstate: Add Comet Lake CPU support

2019-10-08 Thread kan . liang
From: Kan Liang Comet Lake is the new 10th Gen Intel processor. From the perspective of Intel cstate residency counters, nothing has changed compared with Kaby Lake. Share hswult_cstates with Kaby Lake. Update the comments for Comet Lake. Kaby Lake is missing from the comments for some

[PATCH 8/9] perf/x86/msr: Add Tiger Lake CPU support

2019-10-08 Thread kan . liang
From: Kan Liang Tiger Lake is the follow-on to Ice Lake. PPERF and SMI_COUNT MSRs are also supported. The External Design Specification (EDS) is not published yet. It comes from an authoritative internal source. The patch has been tested on real hardware. Signed-off-by: Kan Liang --- arch

[PATCH 0/9] perf: Several update for Comet Lake, Ice Lake and Tiger Lake

2019-10-08 Thread kan . liang
From: Kan Liang Comet Lake is the new 10th Gen Intel processor. Add Comet Lake to Intel family. From the perspective of Intel core PMU, nothing has changed compared with Sky Lake. Share the perf code with Sky Lake. Add support for perf msr and cstate driver as well. Tiger L

[PATCH 02/10] perf tools: Support PERF_SAMPLE_LBR_TOS

2019-10-07 Thread kan . liang
From: Kan Liang Support new sample type PERF_SAMPLE_LBR_TOS. Enable LBR_TOS by default in LBR call stack mode. If the kernel doesn't support the sample type, switch it off. Reviewed-by: Andi Kleen Signed-off-by: Kan Liang --- tools/include/uapi/linux/perf_event.h | 4 +++- tools/perf

[PATCH 03/10] perf pmu: Add support for PMU capabilities

2019-10-07 Thread kan . liang
From: Kan Liang The PMU capabilities information, which is located at /sys/bus/event_source/devices//caps, is required by perf tool. For example, the max LBR information is required to stitch LBR call stack. Add perf_pmu__caps_parse() to parse the PMU capabilities information. The information

[PATCH 05/10] perf machine: Refine the function for LBR call stack reconstruction

2019-10-07 Thread kan . liang
From: Kan Liang LBR only collects the user call stack. To reconstruct a call stack, both the kernel call stack and the user call stack are required. The function resolve_lbr_callchain_sample() mixes the kernel call stack and the user call stack. Now, with the help of TOS, perf tool can reconstruct a more

[PATCH 10/10] perf c2c: Add option to enable the LBR stitching approach

2019-10-07 Thread kan . liang
From: Kan Liang With the LBR stitching approach, the reconstructed LBR call stack can break the HW limitation. However, it may reconstruct invalid call stacks in some cases, e.g. exception handling such as setjmp/longjmp. Also, it may impact the processing time especially when the number

[PATCH 09/10] perf top: Add option to enable the LBR stitching approach

2019-10-07 Thread kan . liang
From: Kan Liang With the LBR stitching approach, the reconstructed LBR call stack can break the HW limitation. However, it may reconstruct invalid call stacks in some cases, e.g. exception handling such as setjmp/longjmp. Also, it may impact the processing time especially when the number

[PATCH 07/10] perf report: Add option to enable the LBR stitching approach

2019-10-07 Thread kan . liang
From: Kan Liang With the LBR stitching approach, the reconstructed LBR call stack can break the HW limitation. However, it may reconstruct invalid call stacks in some cases, e.g. exception handling such as setjmp/longjmp. Also, it may impact the processing time especially when the number

[PATCH 08/10] perf script: Add option to enable the LBR stitching approach

2019-10-07 Thread kan . liang
From: Kan Liang With the LBR stitching approach, the reconstructed LBR call stack can break the HW limitation. However, it may reconstruct invalid call stacks in some cases, e.g. exception handling such as setjmp/longjmp. Also, it may impact the processing time especially when the number

[PATCH 01/10] perf/core, x86: Add PERF_SAMPLE_LBR_TOS

2019-10-07 Thread kan . liang
From: Kan Liang In LBR call stack mode, the depth of the reconstructed LBR call stack is limited by the number of LBR registers. With LBR Top-of-Stack (TOS) information, perf tool may stitch the stacks of two samples. The reconstructed LBR call stack can break the HW limitation. Add a new sample type

[PATCH 00/10] Stitch LBR call stack

2019-10-07 Thread kan . liang
From: Kan Liang Starting from Haswell, Linux perf can utilize the existing Last Branch Record (LBR) facility to record call stacks. However, the depth of the reconstructed LBR call stack is limited by the number of LBR registers. E.g. on Skylake, the depth of the reconstructed LBR call stack is <= 32 Tha

[PATCH 06/10] perf tools: Stitch LBR call stack

2019-10-07 Thread kan . liang
From: Kan Liang In LBR call stack mode, the depth of the reconstructed LBR call stack is limited by the number of LBR registers. For example, on Skylake, the depth of the reconstructed LBR call stack is always <= 32. # To display the perf.data header info, please use # --header/--header-o

[PATCH 04/10] perf header: Support CPU PMU capabilities

2019-10-07 Thread kan . liang
From: Kan Liang To stitch the LBR call stack, the max LBR information is required, so the CPU PMU capabilities information has to be stored in the perf header. Add a new feature HEADER_CPU_PMU_CAPS for CPU PMU capabilities. Retrieve all CPU PMU capabilities, not just max LBR information. Add variable

[PATCH V4 14/14] perf, tools: Add documentation for topdown metrics

2019-09-16 Thread kan . liang
From: Andi Kleen Add some documentation on how to use the topdown metrics in ring 3. Signed-off-by: Andi Kleen Signed-off-by: Kan Liang --- No changes since V3 tools/perf/Documentation/topdown.txt | 223 +++ 1 file changed, 223 insertions(+) create mode 100644 tools

[PATCH V4 13/14] perf, tools, stat: Support new per thread TopDown metrics

2019-09-16 Thread kan . liang
--topdown to handle these new metrics and print them in the same way as the previous TopDown metrics. The restriction of only being able to report information per core is gone. Signed-off-by: Andi Kleen Signed-off-by: Kan Liang --- No changes since V3 tools/perf/Documentation/perf-sta

[PATCH V4 07/14] perf/x86/intel: Support hardware TopDown metrics

2019-09-16 Thread kan . liang
From: Kan Liang Intro = Icelake has support for measuring the four top level TopDown metrics directly in hardware. This is implemented by an additional "metrics" register, and a new Fixed Counter 3 that measures pipeline "slots". Events == We export four metric eve
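
A sketch of how a level-1 TopDown metric can be converted into a slot count, assuming the PERF_METRICS layout described in the Intel SDM: one 8-bit fraction of SLOTS per metric, scaled to 0xff. The enum and function names are hypothetical; the 128-bit intermediate relies on the GCC/Clang unsigned __int128 extension.

```c
#include <stdint.h>

/* Assumed byte layout: bits 7:0 Retiring, 15:8 Bad Speculation,
 * 23:16 Frontend Bound, 31:24 Backend Bound. */
enum topdown_metric { TD_RETIRING, TD_BAD_SPEC, TD_FE_BOUND, TD_BE_BOUND };

/* Scale the slots count by the metric's fraction (frac / 0xff). */
static uint64_t topdown_slots(uint64_t slots, uint64_t metrics,
			      enum topdown_metric m)
{
	uint32_t frac = (metrics >> (8 * m)) & 0xff;

	/* Widen before multiplying so large slot counts cannot overflow. */
	return (uint64_t)((unsigned __int128)slots * frac / 0xff);
}
```

For example, a Backend Bound byte of 0x33 (51/255 = 20%) over 1000 slots yields 200.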

[PATCH V4 02/14] perf/x86/intel: Set correct mask for TOPDOWN.SLOTS

2019-09-16 Thread kan . liang
From: Kan Liang TOPDOWN.SLOTS(0x0400) is not a generic event. It is only available on fixed counter3. Don't extend its mask to generic counters. Signed-off-by: Kan Liang --- Changes since V3: - Separate fixed counter3 definition patch arch/x86/events/intel/core.c | 6 -- 1 file changed

[PATCH V4 11/14] perf/x86/intel: Name global status bit in NMI handler

2019-09-16 Thread kan . liang
From: Kan Liang The bit index number of global status is directly used in the current NMI handler. Use a meaningful name instead of the number to improve the readability of the code. Signed-off-by: Kan Liang --- New patch for V4 arch/x86/events/intel/core.c | 6 +++--- arch/x86/include/asm

[PATCH V4 09/14] perf/x86/intel: Export TopDown events for Icelake

2019-09-16 Thread kan . liang
From: Kan Liang Export new TopDown metrics events for perf that map to the sub metrics in the metrics register, and another for the new slots fixed counter. This makes the new fixed counters in Icelake visible to the perf user tools. Originally-by: Andi Kleen Signed-off-by: Kan Liang

[PATCH V4 08/14] perf/x86/intel: Support per thread RDPMC TopDown metrics

2019-09-16 Thread kan . liang
From: Kan Liang With Icelake CPUs, the TopDown metrics are directly available as fixed counters and do not require generic counters, which makes it possible to measure TopDown per thread/process instead of only per core. The metrics and slots values have to be saved/restored during context

[PATCH V4 10/14] perf/x86/intel: Disable sampling read slots and topdown

2019-09-16 Thread kan . liang
From: Kan Liang The slots event supports sampling. Users may sample-read slots and metrics events, e.g. perf record -e '{slots, topdown-retiring}:S'. However, the metrics event resets fixed counter 3, which impacts the sampling of the slots event. Add specific validate_group() support
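
A minimal sketch of the rejection rule described above, using a hypothetical event descriptor rather than struct perf_event: a group is refused when it samples the slots event and also contains a metric event.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event descriptor for illustration only. */
struct td_event {
	bool is_slots;  /* TOPDOWN.SLOTS on fixed counter 3 */
	bool is_metric; /* one of the topdown-* metric events */
	bool sampling;  /* has a sample period */
};

/* Reject a group that samples slots while also containing metric events,
 * because reading the metrics resets fixed counter 3. Returns 0 or -EINVAL. */
static int td_validate_group(const struct td_event *ev, size_t n)
{
	bool sampling_slots = false, has_metric = false;

	for (size_t i = 0; i < n; i++) {
		if (ev[i].is_slots && ev[i].sampling)
			sampling_slots = true;
		if (ev[i].is_metric)
			has_metric = true;
	}
	return (sampling_slots && has_metric) ? -EINVAL : 0;
}
```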

[PATCH V4 06/14] x86/math64: Provide a sane mul_u64_u32_div() implementation for x86_64

2019-09-16 Thread kan . liang
From: "Peter Zijlstra (Intel)" On x86_64 we can do a u64 * u64 -> u128 widening multiply followed by a u128 / u64 -> u64 division to implement a sane version of mul_u64_u32_div(). Signed-off-by: Peter Zijlstra (Intel) --- New patch for V4 arch/x86/include/asm/div64.h | 13 + 1
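
A portable C sketch of the same widening idea, relying on the GCC/Clang unsigned __int128 extension rather than the x86_64 inline assembly the patch presumably uses. The caller must guarantee div != 0 and that the quotient fits in 64 bits.

```c
#include <stdint.h>

/* Compute a * mul / div with a 128-bit intermediate so the product
 * cannot overflow before the division. */
static uint64_t mul_u64_u32_div(uint64_t a, uint32_t mul, uint32_t div)
{
	return (uint64_t)((unsigned __int128)a * mul / div);
}
```

With a plain 64-bit multiply, UINT64_MAX * 2 would wrap before the division; the widened product keeps the exact result.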

[PATCH V4 03/14] perf/x86/intel: Move BTS index to 47

2019-09-16 Thread kan . liang
From: Kan Liang Bit 48 in PERF_GLOBAL_STATUS is now used to indicate the overflow status of PERF_METRICS counters. Move BTS index to 47. Signed-off-by: Kan Liang --- New patch for V4 arch/x86/include/asm/perf_event.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff
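
A sketch of naming IA32_PERF_GLOBAL_STATUS bits instead of using raw indices. The macro names are hypothetical; the bit positions follow the Intel SDM and the patch text (bit 48 for PERF_METRICS overflow, BTS index moved to 47).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for IA32_PERF_GLOBAL_STATUS bits. */
#define GSTATUS_PERF_METRICS_OVF (1ULL << 48) /* Ice Lake PERF_METRICS overflow */
#define GSTATUS_TRACE_TOPA_PMI   (1ULL << 55)
#define GSTATUS_BUFFER_OVF       (1ULL << 62) /* DS buffer overflow */
#define GSTATUS_COND_CHG         (1ULL << 63)

/* Index of the BTS pseudo-counter, moved from 48 to 47 so that it no
 * longer collides with the PERF_METRICS overflow bit. */
#define PMC_IDX_FIXED_BTS 47

static bool metrics_overflowed(uint64_t status)
{
	return (status & GSTATUS_PERF_METRICS_OVF) != 0;
}
```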

[PATCH V4 00/14] TopDown metrics support for Icelake

2019-09-16 Thread kan . liang
From: Kan Liang Icelake has support for measuring the level 1 TopDown metrics directly in hardware. This is implemented by an additional METRICS register, and a new Fixed Counter 3 that measures pipeline SLOTS. For the Ice Lake implementation of performance metrics, software should start both

[PATCH V4 12/14] perf/x86: Use event_base_rdpmc for RDPMC userspace support

2019-09-16 Thread kan . liang
From: Kan Liang The RDPMC index is always re-calculated in RDPMC userspace support, especially for fixed counters. The RDPMC index value is stored in variable event_base_rdpmc for kernel usage, which can be used for RDPMC userspace support as well. Only the metrics event has to be specially
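
A sketch of computing the RDPMC selector once per counter type, in the spirit of caching it in event_base_rdpmc. Bit 30 for fixed counters is the architectural encoding; bit 29 for the PERF_METRICS register is an assumption here, and the helper name is hypothetical.

```c
#include <stdint.h>

#define RDPMC_FIXED_BASE (1u << 30) /* fixed counter i: ECX = bit 30 | i */
#define RDPMC_METRICS    (1u << 29) /* assumed selector for PERF_METRICS */

enum ctr_kind { CTR_GP, CTR_FIXED, CTR_METRICS };

/* Hypothetical helper returning the ECX value to pass to RDPMC. */
static uint32_t rdpmc_selector(enum ctr_kind kind, uint32_t idx)
{
	switch (kind) {
	case CTR_FIXED:
		return RDPMC_FIXED_BASE | idx;
	case CTR_METRICS:
		return RDPMC_METRICS;
	case CTR_GP:
	default:
		return idx; /* general-purpose counter i: ECX = i */
	}
}
```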

[PATCH V4 01/14] perf/x86/intel: Introduce the fourth fixed counter

2019-09-16 Thread kan . liang
From: Kan Liang The fourth fixed counter, TOPDOWN.SLOTS, is introduced in Ice Lake. Add MSR address and macros for the new fixed counter, which will be used in the following patch. Signed-off-by: Kan Liang --- New patch for V4 arch/x86/include/asm/perf_event.h | 9 +++-- 1 file changed
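
A sketch of where the fourth fixed counter lands, assuming the architectural layout in which fixed-counter MSRs are consecutive from 0x309 and IA32_FIXED_CTR_CTRL (0x38d) holds a 4-bit control field per counter. Helper names are hypothetical.

```c
#include <stdint.h>

#define MSR_FIXED_CTR0     0x309u
#define MSR_FIXED_CTR_CTRL 0x38du /* holds the per-counter control fields */

/* MSR address of fixed counter idx; idx 3 (TOPDOWN.SLOTS) gives 0x30c. */
static uint32_t fixed_ctr_msr(unsigned int idx)
{
	return MSR_FIXED_CTR0 + idx;
}

/* Control bits for fixed counter idx within MSR_FIXED_CTR_CTRL:
 * bit 0 counts in ring 0, bit 1 in ring 3, bit 3 raises a PMI on overflow. */
static uint64_t fixed_ctrl_bits(unsigned int idx, int os, int usr, int pmi)
{
	uint64_t field = (os ? 0x1 : 0) | (usr ? 0x2 : 0) | (pmi ? 0x8 : 0);

	return field << (4 * idx);
}
```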

[PATCH V4 05/14] perf/x86/intel: Fix the name of perf capabilities for perf METRICS

2019-09-16 Thread kan . liang
From: Kan Liang Bit 15 of the PERF_CAPABILITIES MSR indicates that this architecture provides built-in support for perf METRICS. The perf METRICS is not a PEBS feature. Rename pebs_metrics_available to perf_metrics. No one uses the bit in the current code. The following patch will use it. Signed-off-by: Kan
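Testing the capability bit described above is a single shift and mask on the raw PERF_CAPABILITIES value. The kernel models this MSR with a bitfield union (the field the patch renames); the sketch uses an explicit shift instead, because C bitfield layout is implementation-defined. `perf_cap_has_metrics()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PERF_CAP_METRICS_BIT 15   /* "perf METRICS available" per the excerpt */

/* True if the raw PERF_CAPABILITIES value advertises perf METRICS support. */
static inline bool perf_cap_has_metrics(uint64_t perf_capabilities)
{
	return (perf_capabilities >> SKETCH_PERF_CAP_METRICS_BIT) & 1;
}
```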

[PATCH V4 04/14] perf/x86/intel: Basic support for metrics counters

2019-09-16 Thread kan . liang
From: Kan Liang Metrics counters (hardware counters containing multiple metrics) are modeled as separate registers for each TopDown metric event, with an extra reg being used for coordinating access to the underlying register in the scheduler. Add the basic infrastructure to separate

[RESEND PATCH V3 4/8] perf/x86/intel: Support per thread RDPMC TopDown metrics

2019-08-26 Thread kan . liang
From: Kan Liang With Icelake CPUs, the TopDown metrics are directly available as fixed counters and do not require generic counters, which makes it possible to measure TopDown per thread/process instead of only per core. The metrics and slots values have to be saved/restored during context

[RESEND PATCH V3 5/8] perf/x86/intel: Export TopDown events for Icelake

2019-08-26 Thread kan . liang
From: Kan Liang Export new TopDown metrics events for perf that map to the sub metrics in the metrics register, and another for the new slots fixed counter. This makes the new fixed counters in Icelake visible to the perf user tools. Originally-by: Andi Kleen Signed-off-by: Kan Liang

[RESEND PATCH V3 6/8] perf/x86/intel: Disable sampling read slots and topdown

2019-08-26 Thread kan . liang
From: Kan Liang The slots event supports sampling. Users may sample-read slots and metrics events, e.g. perf record -e '{slots, topdown-retiring}:S'. But the metrics event will reset fixed counter 3, which will impact the sampling of the slots event. Add specific validate_group() support
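One plausible reading of the check described here, as a sketch rather than the patch's actual code: if the group leader samples with PERF_SAMPLE_READ (what the `:S` modifier requests) and the group mixes slots with a metrics event, reject it. `struct sketch_event` and `sketch_validate_group()` are hypothetical and much simpler than the kernel's `struct perf_event` and `validate_group()`.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified event description; not the kernel's struct perf_event. */
struct sketch_event {
	bool is_slots;      /* TOPDOWN.SLOTS on fixed counter 3 */
	bool is_metric;     /* one of the topdown-* metric events */
	bool sample_read;   /* leader samples with PERF_SAMPLE_READ (the :S modifier) */
};

/*
 * Reject a sample-read group containing both slots and a metrics event:
 * reading the metrics resets fixed counter 3 and would disturb slots sampling.
 */
static int sketch_validate_group(const struct sketch_event *leader,
				 const struct sketch_event *members, size_t n)
{
	if (!leader->sample_read)
		return 0;

	bool has_slots = leader->is_slots;
	bool has_metric = leader->is_metric;

	for (size_t i = 0; i < n; i++) {
		has_slots |= members[i].is_slots;
		has_metric |= members[i].is_metric;
	}
	return (has_slots && has_metric) ? -EINVAL : 0;
}
```

With this rule, '{slots, topdown-retiring}:S' is rejected, while the same group without :S still passes.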

[RESEND PATCH V3 2/8] perf/x86/intel: Basic support for metrics counters

2019-08-26 Thread kan . liang
From: Kan Liang Metrics counters (hardware counters containing multiple metrics) are modeled as separate registers for each TopDown metric event, with an extra reg being used for coordinating access to the underlying register in the scheduler. Add the basic infrastructure to separate

[RESEND PATCH V3 8/8] perf, tools: Add documentation for topdown metrics

2019-08-26 Thread kan . liang
From: Andi Kleen Add some documentation on how to use the topdown metrics in ring 3. Signed-off-by: Andi Kleen Signed-off-by: Kan Liang --- tools/perf/Documentation/topdown.txt | 223 +++ 1 file changed, 223 insertions(+) create mode 100644 tools/perf/Documentation

[RESEND PATCH V3 0/8] TopDown metrics support for Icelake

2019-08-26 Thread kan . liang
From: Kan Liang Icelake has support for measuring the level 1 TopDown metrics directly in hardware. This is implemented by an additional METRICS register, and a new Fixed Counter 3 that measures pipeline SLOTS. Four TopDown metric events as separate perf events, which map to internal METRICS

[RESEND PATCH V3 7/8] perf, tools, stat: Support new per thread TopDown metrics

2019-08-26 Thread kan . liang
--topdown to handle these new metrics and print them in the same way as the previous TopDown metrics. The restriction of only being able to report information per core is gone. Signed-off-by: Andi Kleen Signed-off-by: Kan Liang --- tools/perf/Documentation/perf-stat.txt | 9 ++- tools/perf/buil

[RESEND PATCH V3 3/8] perf/x86/intel: Support hardware TopDown metrics

2019-08-26 Thread kan . liang
From: Kan Liang Intro: Icelake has support for measuring the four top-level TopDown metrics directly in hardware. This is implemented by an additional "metrics" register and a new Fixed Counter 3 that measures pipeline "slots". Events: We export four metric eve
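To illustrate how a "metrics" register combines with the "slots" count: each metric can be stored as an 8-bit fraction of slots (0..255), so a metric's slot count is slots * field / 0xff, the kind of scaling mul_u64_u32_div() is suited for. The byte order below (retiring 7:0, bad speculation 15:8, frontend bound 23:16, backend bound 31:24) is an assumption based on my understanding of the upstream layout, not something stated in this excerpt. All names are hypothetical.

```c
#include <stdint.h>

/* Assumed byte index of each level-1 metric in the metrics register. */
enum td_metric { TD_RETIRING = 0, TD_BAD_SPEC = 1, TD_FE_BOUND = 2, TD_BE_BOUND = 3 };

/* Convert one 8-bit metric fraction into a slot count. */
static uint64_t td_metric_slots(uint64_t metrics, uint64_t slots, enum td_metric m)
{
	/* Fraction of total slots, scaled to 0..255. */
	uint32_t frac = (metrics >> (8 * m)) & 0xff;

	/* slots * frac / 255 with a 128-bit intermediate so large slot counts cannot overflow. */
	return (uint64_t)(((unsigned __int128)slots * frac) / 0xff);
}
```

For example, a retiring field of 0x80 with 1000 slots yields 501 retiring slots (128000 / 255, truncated).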

[RESEND PATCH V3 1/8] perf/x86/intel: Set correct mask for TOPDOWN.SLOTS

2019-08-26 Thread kan . liang
From: Kan Liang TOPDOWN.SLOTS (0x0400) is not a generic event. It is only available on fixed counter 3. Don't extend its mask to generic counters. Signed-off-by: Kan Liang --- arch/x86/events/intel/core.c | 6 -- arch/x86/include/asm/perf_event.h | 5 + 2 files changed, 9
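A sketch of what "don't extend its mask" means, under the simplifying assumption that an event constraint is a bitmask of counter indices (GP counters at bits 0..n-1, fixed counter i at bit 32 + i). Events such as INST_RETIRED.ANY (pseudo-encoding 0x00c0, fixed counter 0) may also run on GP counters, so their mask is widened. The encoding 0x0400 (event select 0x00, umask 0x04) is left on fixed counter 3 only. `fixed_event_mask()` and the `num_gp` parameter are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PMC_IDX_FIXED   32
#define TOPDOWN_SLOTS_ENCODING 0x0400ULL   /* event select 0x00, umask 0x04 */

/* Event select (bits 7:0) and unit mask (bits 15:8) of a raw config. */
static inline unsigned int ev_select(uint64_t config) { return config & 0xff; }
static inline unsigned int ev_umask(uint64_t config)  { return (config >> 8) & 0xff; }

/* Allowed-counter mask for an event tied to fixed counter fixed_idx. */
static uint64_t fixed_event_mask(uint64_t config, unsigned int fixed_idx, unsigned int num_gp)
{
	uint64_t mask = 1ULL << (SKETCH_PMC_IDX_FIXED + fixed_idx);

	/* Other fixed events may fall back to any GP counter; SLOTS may not. */
	if (config != TOPDOWN_SLOTS_ENCODING)
		mask |= (1ULL << num_gp) - 1;
	return mask;
}
```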

[PATCH V2] perf/x86: Consider pinned events for group validation

2019-08-22 Thread kan . liang
From: Kan Liang perf stat -M metrics relies on weak groups to reject unschedulable groups and run them as non-groups. This uses the group validation code in the kernel. Unfortunately that code doesn't take pinned events, such as the NMI watchdog, into account. So some groups can pass validation
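A toy model of the problem described here: if validation ignores counters already held by pinned events (e.g. the NMI watchdog), a group can pass validation yet never fit at run time. The sketch marks pinned counters busy before placing the group's events. It uses a greedy lowest-free-counter assignment, which is simpler than (and can be less complete than) the kernel's constraint scheduler. `sketch_group_fits()` and its inputs are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * event_masks[i]: counters event i may use (bit k = counter k).
 * pinned_used:    counters already occupied by pinned events.
 */
static bool sketch_group_fits(const uint64_t *event_masks, size_t n, uint64_t pinned_used)
{
	uint64_t used = pinned_used;   /* start with the pinned events' counters taken */

	for (size_t i = 0; i < n; i++) {
		uint64_t candidates = event_masks[i] & ~used;

		if (!candidates)
			return false;              /* no free counter left for this event */
		used |= candidates & -candidates; /* take the lowest free allowed counter */
	}
	return true;
}
```

Four events restricted to four counters fit on an idle PMU, but not once one of those counters is pinned.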

[PATCH] perf/x86: Consider pinned events for group validation

2019-08-16 Thread kan . liang
From: Kan Liang perf stat -M metrics relies on weak groups to reject unschedulable groups and run them as non-groups. This uses the group validation code in the kernel. Unfortunately that code doesn't take pinned events, such as the NMI watchdog, into account. So some groups can pass validation

[PATCH V3] perf/cgroup: Do not switch system-wide events in cgroup switch

2019-08-07 Thread kan . liang
From: Kan Liang When counting system-wide events and cgroup events simultaneously, the system-wide events are always scheduled during cgroup switch, which is wrong and brings extra overhead. Both system-wide and cgroup events are per-CPU. They share the same cpuctx groups, cpuctx->flexible_gro
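The intended behavior can be sketched as a filter: on a cgroup switch, touch only events that belong to a cgroup and leave system-wide events scheduled. This is a simplified model, not the patch. `struct sketch_cpu_event`, the convention that `cgrp_id == 0` means system-wide, and the exact-match test (real perf also matches ancestor cgroups) are all assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU event; cgrp_id 0 means a system-wide event. */
struct sketch_cpu_event {
	int cgrp_id;
	bool active;
};

/* Reschedule only cgroup events for next_cgrp; returns how many were touched. */
static size_t sketch_cgroup_switch(struct sketch_cpu_event *ev, size_t n, int next_cgrp)
{
	size_t touched = 0;

	for (size_t i = 0; i < n; i++) {
		if (ev[i].cgrp_id == 0)
			continue;   /* system-wide: stays scheduled, no extra overhead */
		ev[i].active = (ev[i].cgrp_id == next_cgrp);
		touched++;
	}
	return touched;
}
```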

[tip:perf/urgent] perf/x86/intel: Fix SLOTS PEBS event constraint

2019-07-25 Thread tip-bot for Kan Liang
Commit-ID: 3d0c3953601d250175c7684ec0d9df612061dae5 Gitweb: https://git.kernel.org/tip/3d0c3953601d250175c7684ec0d9df612061dae5 Author: Kan Liang AuthorDate: Tue, 23 Jul 2019 13:04:29 -0700 Committer: Ingo Molnar CommitDate: Thu, 25 Jul 2019 15:41:29 +0200 perf/x86/intel: Fix SLOTS

[PATCH V3 0/8] TopDown metrics support for Icelake

2019-07-24 Thread kan . liang
From: Kan Liang Icelake has support for measuring the level 1 TopDown metrics directly in hardware. This is implemented by an additional METRICS register, and a new Fixed Counter 3 that measures pipeline SLOTS. Four TopDown metric events as separate perf events, which map to internal METRICS
