Re: [RFC PATCH 0/3] perf/core: extend perf_reg and perf_sample_regs_intr
Hi,

On Wed, Nov 4, 2015 at 9:46 PM, Madhavan Srinivasan
<ma...@linux.vnet.ibm.com> wrote:
>
> This patchset extends the perf sample regs infrastructure to include
> architecture specific regs. The current perf_sample_regs_intr exports
> only the registers in pt_regs to perf.data, using the
> PERF_SAMPLE_REGS_INTR sample type. But sometimes we end up looking
> for or prefer raw register values at the interrupt during debug.
>
I don't quite understand this explanation. What do you mean by raw
register values? The content of pt_regs is also raw register values at
interrupt. The API does not say that the content of
perf_sample_regs_intr can ONLY contain names of registers coming from
pt_regs. The meaning of each bit is arch specific and you are free to
map it to whatever is relevant for your arch. All the API says is that
the values captured in the sampling buffer for these registers are
taken at PMU interrupt.

> This patchset extends perf_regs to include an arch specific struct,
> and makes perf_sample_regs_intr() in kernel/events/core.c a __weak
> function. This way, an arch specific implementation of
> perf_sample_regs_intr() can add the arch specific data to the
> perf_regs.
>
> The first patch defines a new structure arch_misc_reg and adds it to
> struct perf_regs. The patch also modifies perf_reg_value() and
> perf_output_sample_regs() to take a perf_regs parameter instead of
> pt_regs.
>
> The second patch updates struct arch_misc_reg for arch/powerpc with
> PMU registers and adds an offsetof macro for the same. It extends
> perf_reg_value() to use the register index to decide which struct to
> return the value from.
>
> The third patch adds an arch specific perf_sample_regs_intr() in
> arch/powerpc to hook the arch_misc_regs into perf_regs.
>
> This patchset depends on the recent posting by Anju T on
> linuxppc-dev@lists.ozlabs.org to enable PERF_SAMPLE_REGS_INTR
> support in arch/powerpc.
>
> https://patchwork.ozlabs.org/patch/539242/
> https://patchwork.ozlabs.org/patch/539243/
> https://patchwork.ozlabs.org/patch/539244/
>
> Would appreciate comments and feedback.
>
> Signed-off-by: Madhavan Srinivasan <ma...@linux.vnet.ibm.com>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: Ingo Molnar <mi...@kernel.org>
> Cc: Peter Zijlstra <pet...@infradead.org>
> Cc: Jiri Olsa <jo...@kernel.org>
> Cc: Arnaldo Carvalho de Melo <a...@kernel.org>
> Cc: Stephane Eranian <eran...@gmail.com>
> Cc: Russell King <li...@arm.linux.org.uk>
> Cc: Catalin Marinas <catalin.mari...@arm.com>
> Cc: Will Deacon <will.dea...@arm.com>
> Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
> Cc: Michael Ellerman <m...@ellerman.id.au>
> Cc: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
>
> Madhavan Srinivasan (3):
>   perf/core: extend perf_regs to include arch specific regs
>   perf/powerpc: update macros and add regs to arch_misc_reg struct
>   perf/powerpc: Functions to update arch_misc_regs
>
>  arch/arm/include/asm/ptrace.h               |  2 ++
>  arch/arm/kernel/perf_regs.c                 |  4 +++-
>  arch/arm64/include/asm/ptrace.h             |  2 ++
>  arch/arm64/kernel/perf_regs.c               |  4 +++-
>  arch/powerpc/include/uapi/asm/perf_regs.h   | 10 ++
>  arch/powerpc/include/uapi/asm/ptrace.h      | 11 +++
>  arch/powerpc/perf/core-book3s.c             | 29 +
>  arch/powerpc/perf/perf_regs.c               | 28 ++--
>  arch/x86/include/asm/ptrace.h               |  2 ++
>  arch/x86/kernel/perf_regs.c                 |  4 +++-
>  include/linux/perf_regs.h                   |  5 +++--
>  kernel/events/core.c                        |  8
>  tools/perf/arch/powerpc/include/perf_regs.h | 16
>  13 files changed, 114 insertions(+), 11 deletions(-)
>
> --
> 1.9.1
>
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [V6 00/11] perf: New conditional branch filter
On Wed, May 28, 2014 at 10:04 AM, Anshuman Khandual
<khand...@linux.vnet.ibm.com> wrote:
> On 05/27/2014 05:39 PM, Stephane Eranian wrote:
>> I have been looking at those patches and ran some tests. And I found
>> a few issues so far. I am running:
>>
>>   $ perf record -j any_ret -e cycles:u test_program
>>   $ perf report -D
>>
>> Most entries are okay and match the filter, however some do not make
>> sense:
>>
>> 3642586996762 0x15d0 [0x108]: PERF_RECORD_SAMPLE(IP, 2): 17921/17921:
>> 0x10001170 period: 613678 addr: 0
>> branch stack: nr:9
>> ..  0: 100011cc -> 1e38
>> ..  1: 10001150 -> 100011bc
>> ..  2: 10001208 -> 1e38
>> ..  3: 10001160 -> 100011f8
>> ..  4: 100011cc -> 1e38
>> ..  5: 10001150 -> 100011bc
>> ..  6: 10001208 -> 1e38
>> ..  7: 10001160 -> 100011f8
>> ..  8: 0 -> 10001160
>>         ^^
>> Entry 8 does not make sense, unless 0x0 is a valid return branch
>> instruction address. If an address is invalid, the whole entry needs
>> to be eliminated. It is okay to have less than the max number of
>> entries supported by HW.
>
> Hey Stephane,
>
> Okay. The same behaviour is also reflected in the test results I
> shared in the patchset. Here is that section.
>
> (3) perf record -j any_ret -e branch-misses:u ./cprog
>
> # Overhead  Command  Source Shared Object  Source Symbol      Target Shared Object  Target Symbol
> # ........  .......  ....................  .................  ....................  .............
>
>   15.61%    cprog    [unknown]             [.]                cprog    [.] sw_3_1
>    6.28%    cprog    cprog                 [.] symbol2        cprog    [.] hw_1_2
>    6.28%    cprog    cprog                 [.] ctr_addr       cprog    [.] sw_4_1
>    6.26%    cprog    cprog                 [.] success_3_1_3  cprog    [.] sw_3_1
>    6.24%    cprog    cprog                 [.] symbol1        cprog    [.] hw_1_1
>    6.24%    cprog    cprog                 [.] sw_4_2         cprog    [.] callme
>    6.21%    cprog    [unknown]             [.]                cprog    [.] callme
>    6.19%    cprog    cprog                 [.] lr_addr        cprog    [.] sw_4_2
>    3.16%    cprog    cprog                 [.] hw_1_2         cprog    [.] callme
>    3.15%    cprog    cprog                 [.] success_3_1_1  cprog    [.] sw_3_1
>    3.15%    cprog    cprog                 [.] sw_4_1         cprog    [.] callme
>    3.14%    cprog    cprog                 [.] callme         cprog    [.] main
>    3.13%    cprog    cprog                 [.] hw_1_1         cprog    [.] callme

So a lot of samples above have 0x0 as the from address.
> This originates from the code section here inside the function
> power_pmu_bhrb_read, where we hit two back to back

Could you explain the back-to-back case a bit more here?
Back-to-back returns to me means something like:

  int foo() { ... return bar(); }
  int bar() { return 0; }

Not counting the leaf optimization here, bar returns to foo, which
immediately returns: 2 back-to-back returns. Is that the case you're
talking about here?

> target addresses. So we zero out the from address for the first
> target address and re-read the second address over again. So that's
> how we get zero as the from address. This is how the HW captures the
> samples. I was reluctant to drop these samples but I agree that these
> kinds of samples can be dropped if we need to.

I think we need to make it as simple as possible for tools, i.e.,
avoid having to decode the disassembly to figure out what happened.
Here address 0 is not exploitable.

>	if (val & BHRB_TARGET) {
>		/* Shouldn't have two targets in a row.. Reset index and try again */
>		r_index--;
>		addr = 0;
>	}
Re: [V6 00/11] perf: New conditional branch filter
On Mon, Jun 2, 2014 at 6:04 PM, Anshuman Khandual
<khand...@linux.vnet.ibm.com> wrote:
> On 06/02/2014 06:29 PM, Stephane Eranian wrote:
>> On Wed, May 28, 2014 at 10:04 AM, Anshuman Khandual
>> <khand...@linux.vnet.ibm.com> wrote:
>>> [earlier exchange elided: the any_ret test output with the 0x0 from
>>> address in entry 8 of the branch stack, and the cprog report table,
>>> both quoted verbatim earlier in this thread]
>>
>> So a lot of samples above have 0x0 as the from address.
>>
>>> This originates from the code section here inside the function
>>> power_pmu_bhrb_read, where we hit two back to back
>>
>> Could you explain the back-to-back case a bit more here?
>> Back-to-back returns to me means something like:
>>
>>   int foo() { ... return bar(); }
>>   int bar() { return 0; }
>>
>> Not counting the leaf optimization here, bar returns to foo, which
>> immediately returns: 2 back-to-back returns. Is that the case you're
>> talking about here?
>
> No. Filtering of return branches has been implemented in SW only. So
> the PMU as such does not capture return-only branches. It captures all
> the branches it encounters. During the capture process the PMU might
> *record* two back to back target addresses (without capturing the from
> address for the first one), for which we are unable to figure out the
> from address. This leaves us with one branch record where we have the
> target address but not the from address, and so we make it zero. With
> the current logic, all branch records with a from address of zero get
> filtered through and become part of the final set. I was not too sure
> how to deal with these cases.

So PPC8 captures all branches, no HW filter. But then in SW you filter
out non-return branches. Given your description, I have to believe that
sometimes the HW does not even capture the from address. If so, then in
that case, I think it is best to drop the sample, because the target
address may be the target of an indirect branch for which there is no
way to find the source. In other words, the record cannot be exploited.
But why does the HW not capture some from addresses? I am worried this
might create some bias in the samples.

>>> target addresses. So we zero out the from address for the first
>>> target address and re-read the second address over again. So that's
>>> how we get zero as the from address. This is how the HW capture
Re: [V6 00/11] perf: New conditional branch filter
Hi,

On Mon, May 5, 2014 at 11:09 AM, Anshuman Khandual
<khand...@linux.vnet.ibm.com> wrote:
> This patchset is the re-spin of the original branch stack sampling
> patchset which introduced the new PERF_SAMPLE_BRANCH_COND branch
> filter. This patchset also enables SW based branch filtering support
> for book3s powerpc platforms which have PMU HW backed branch stack
> sampling support.
>
> Summary of code changes in this patchset:
>
> (1) Introduces a new PERF_SAMPLE_BRANCH_COND branch filter
> (2) Adds the cond branch filter option in the perf record tool
> (3) Enables PERF_SAMPLE_BRANCH_COND on X86 platforms
> (4) Enables PERF_SAMPLE_BRANCH_COND on the POWER8 platform
> (5) Updates the documentation for the perf record tool
> (6) Adds some new powerpc instruction analysis functions to the
>     code-patching library
> (7) Enables SW based branch filter support for powerpc book3s
> (8) Changes the BHRB configuration in POWER8 to accommodate SW
>     branch filters
>
I have been looking at those patches and ran some tests. And I found a
few issues so far. I am running:

  $ perf record -j any_ret -e cycles:u test_program
  $ perf report -D

Most entries are okay and match the filter, however some do not make
sense:

3642586996762 0x15d0 [0x108]: PERF_RECORD_SAMPLE(IP, 2): 17921/17921:
0x10001170 period: 613678 addr: 0
branch stack: nr:9
..  0: 100011cc -> 1e38
..  1: 10001150 -> 100011bc
..  2: 10001208 -> 1e38
..  3: 10001160 -> 100011f8
..  4: 100011cc -> 1e38
..  5: 10001150 -> 100011bc
..  6: 10001208 -> 1e38
..  7: 10001160 -> 100011f8
..  8: 0 -> 10001160
        ^^
Entry 8 does not make sense, unless 0x0 is a valid return branch
instruction address. If an address is invalid, the whole entry needs to
be eliminated. It is okay to have less than the max number of entries
supported by HW.

I also had cases where, monitoring only at the user level, I got branch
addresses in the 0xc000.. range. My test program is linked statically.
When eliminating the bogus entries, my tests yielded only return branch
instruction addresses, which is good. Will run more tests.
> With this new SW enablement, the branch filter support for book3s
> platforms has been extended to include all the combinations discussed
> below, with a sample test application program (included here).
>
> Changes in V2
> =============
> (1) Enabled PPC64 SW branch filtering support
> (2) Incorporated changes required for all previous comments
>
> Changes in V3
> =============
> (1) Split the SW branch filter enablement into multiple patches
> (2) Added PMU neutral SW branch filtering code, PMU specific HW
>     branch filtering code
> (3) Added new instruction analysis functionality into the powerpc
>     code-patching library
> (4) Changed names for some of the functions
> (5) Fixed a couple of spelling mistakes
> (6) Changed code documentation in multiple places
>
> Changes in V4
> =============
> (1) Changed the commit message for patch (01/10)
> (2) Changed patch (02/10) to accommodate review comments from
>     Michael Ellerman
> (3) Rebased the patchset against the latest Linus tree
>
> Changes in V5
> =============
> (1) Added a precursor patch to clean up the indentation problem in
>     power_pmu_bhrb_read
> (2) Added a precursor patch to re-arrange the P8 PMU BHRB filter
>     config, which improved the clarity
> (3) Merged the previous 10th patch into the 8th patch
> (4) Moved the SW based branch analysis code from core perf into the
>     code-patching library as suggested by Michael
> (5) Simplified the logic in the branch analysis library
> (6) Fixed some ambiguities in documentation at various places
> (7) Added some more in-code documentation blocks at various places
> (8) Renamed some local variables and function names
> (9) Fixed some indentation and white space errors in the code
> (10) Implemented almost all the review comments and suggestions made
>      by Michael Ellerman on the V4 patchset
> (11) Enabled privilege mode SW branch filter
> (12) Simplified and generalized the SW implemented conditional
>      branch filter
> (13) PERF_SAMPLE_BRANCH_COND filter is now supported only through
>      the SW implementation
> (14) Adjusted other patches to deal with the above changes
>
> Changes in V6
> =============
> (1) Rebased the patchset against the master
> (2) Added Reviewed-by: Andi Kleen in the first four patches of the
>     series, which change the generic or X86 perf code
>     [https://lkml.org/lkml/2014/4/7/130]
>
> HW implemented branch filters
> =============================
>
> (1) perf record -j any_call -e branch-misses:u ./cprog
>
> # Overhead  Command  Source Shared Object  Source Symbol  Target Shared Object  Target Symbol
> # ........  .......  ....................  .............  ....................  .............
>
>    7.85%    cprog    cprog                 [.] sw_3_1     cprog    [.]
Re: [PATCH 1/5] perf: New conditional branch filter criteria in branch stack sampling
On Wed, May 22, 2013 at 8:22 AM, Anshuman Khandual
<khand...@linux.vnet.ibm.com> wrote:
> POWER8 PMU based BHRB supports filtering for conditional branches.
> This patch introduces a new branch filter PERF_SAMPLE_BRANCH_COND
> which will extend the existing perf ABI. Other architectures can
> provide this functionality with either HW filtering support (if
> present) or with SW filtering of instructions.
>
> Reviewed-by: Stephane Eranian <eran...@google.com>
> Signed-off-by: Anshuman Khandual <khand...@linux.vnet.ibm.com>
> ---
>  include/uapi/linux/perf_event.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index fb104e5..cb0de86 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -157,8 +157,9 @@ enum perf_branch_sample_type {
>         PERF_SAMPLE_BRANCH_ANY_CALL     = 1U << 4, /* any call branch */
>         PERF_SAMPLE_BRANCH_ANY_RETURN   = 1U << 5, /* any return branch */
>         PERF_SAMPLE_BRANCH_IND_CALL     = 1U << 6, /* indirect calls */
> +       PERF_SAMPLE_BRANCH_COND         = 1U << 7, /* conditional branches */
>
> -       PERF_SAMPLE_BRANCH_MAX          = 1U << 7, /* non-ABI */
> +       PERF_SAMPLE_BRANCH_MAX          = 1U << 8, /* non-ABI */
>  };
>
>  #define PERF_SAMPLE_BRANCH_PLM_ALL \
> --
> 1.7.11.7
Re: [PATCH 4/5] x86, perf: Add conditional branch filtering support
On Wed, May 22, 2013 at 8:22 AM, Anshuman Khandual
<khand...@linux.vnet.ibm.com> wrote:
> From: Peter Zijlstra <a.p.zijls...@chello.nl>
>
> This patch adds conditional branch filtering support, enabling it for
> PERF_SAMPLE_BRANCH_COND in the perf branch stack sampling framework
> by utilizing an available software filter X86_BR_JCC.
>
> Reviewed-by: Stephane Eranian <eran...@google.com>
> Signed-off-by: Peter Zijlstra <a.p.zijls...@chello.nl>
> Signed-off-by: Anshuman Khandual <khand...@linux.vnet.ibm.com>
> ---
>  arch/x86/kernel/cpu/perf_event_intel_lbr.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
> index d978353..a0d6387 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
> @@ -337,6 +337,10 @@ static int intel_pmu_setup_sw_lbr_filter(struct perf_event *event)
>
>         if (br_type & PERF_SAMPLE_BRANCH_IND_CALL)
>                 mask |= X86_BR_IND_CALL;
> +
> +       if (br_type & PERF_SAMPLE_BRANCH_COND)
> +               mask |= X86_BR_JCC;
> +
>         /*
>          * stash actual user request into reg, it may
>          * be used by fixup code for some CPU
> @@ -626,6 +630,7 @@ static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
>          * NHM/WSM erratum: must include IND_JMP to capture IND_CALL
>          */
>         [PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL | LBR_IND_JMP,
> +       [PERF_SAMPLE_BRANCH_COND]     = LBR_JCC,
>  };
>
>  static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
> @@ -637,6 +642,7 @@ static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
>         [PERF_SAMPLE_BRANCH_ANY_CALL]   = LBR_REL_CALL | LBR_IND_CALL | LBR_FAR,
>         [PERF_SAMPLE_BRANCH_IND_CALL]   = LBR_IND_CALL,
> +       [PERF_SAMPLE_BRANCH_COND]       = LBR_JCC,
>  };
>
>  /* core */
> --
> 1.7.11.7
Re: [PATCH 5/5] perf, documentation: Description for conditional branch filter
On Wed, May 22, 2013 at 8:22 AM, Anshuman Khandual
<khand...@linux.vnet.ibm.com> wrote:
> Adding documentation support for the conditional branch filter.
>
> Reviewed-by: Stephane Eranian <eran...@google.com>
> Signed-off-by: Anshuman Khandual <khand...@linux.vnet.ibm.com>
> ---
>  tools/perf/Documentation/perf-record.txt | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
> index d4da111..8b5e1ed 100644
> --- a/tools/perf/Documentation/perf-record.txt
> +++ b/tools/perf/Documentation/perf-record.txt
> @@ -169,12 +169,13 @@ following filters are defined:
>         - any_call: any function call or system call
>         - any_ret: any function return or system call return
>         - ind_call: any indirect branch
> +       - cond: conditional branches
>         - u: only when the branch target is at the user level
>         - k: only when the branch target is in the kernel
>         - hv: only when the target is at the hypervisor level
>
> -The option requires at least one branch type among any, any_call, any_ret, ind_call.
> +The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
>  The privilege levels may be omitted, in which case, the privilege levels of the associated
>  event are applied to the branch filter. Both kernel (k) and hypervisor (hv) privilege
>  levels are subject to permissions. When sampling on multiple events, branch stack sampling
> --
> 1.7.11.7
Re: [PATCH 3/3] perf, x86, lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL
Hi,

On Wed, May 22, 2013 at 8:43 AM, Anshuman Khandual
<khand...@linux.vnet.ibm.com> wrote:
> On 05/21/2013 07:25 PM, Stephane Eranian wrote:
>> [earlier review exchange elided: the PERF_SAMPLE_BRANCH_KERNEL
>> privilege patch, Mikey's PERF_SAMPLE_BRANCH_CONDITIONAL proposal, and
>> Stephane's review comments, all quoted verbatim elsewhere in this
>> thread]
>
> Hey Stephane,
>
> I have incorporated all the review comments into the patch series
> https://lkml.org/lkml/2013/5/22/51.

I don't see how you can compile Patch 3/5:

+	BRANCH_OPT("cond", PERF_SAMPLE_BRANCH_CONDITIONAL),

Needs to be PERF_SAMPLE_BRANCH_COND.
Re: [PATCH 3/3] perf, x86, lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL
On Tue, May 21, 2013 at 10:50 AM, Peter Zijlstra <pet...@infradead.org> wrote:
> On Tue, May 21, 2013 at 03:41:35PM +1000, Michael Neuling wrote:
>> Can we add your Signed-off-by on this? We are cleaning up our series
>> for conditional branches and would like to add this as part of the
>> post.
>
> Sure, but it's completely untested.. I was hoping Stephane would say
> something about it since he wrote all that magic ;-)

Let me take a look at it. But yeah, feel free to add my SoB.
Re: [PATCH 3/3] perf, x86, lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL
On Thu, May 16, 2013 at 12:15 PM, Michael Neuling <mi...@neuling.org> wrote:
> Peter Zijlstra <pet...@infradead.org> wrote:
>> On Wed, May 15, 2013 at 03:37:22PM +0200, Stephane Eranian wrote:
>>> On Fri, May 3, 2013 at 2:11 PM, Peter Zijlstra <a.p.zijls...@chello.nl> wrote:
>>>> We should always have proper privileges when requesting kernel data.
>>>>
>>>> Cc: Andi Kleen <a...@linux.intel.com>
>>>> Cc: eran...@google.com
>>>> Signed-off-by: Peter Zijlstra <a.p.zijls...@chello.nl>
>>>> Link: http://lkml.kernel.org/n/tip-v0x9ky3ahzr6nm3c6ilwr...@git.kernel.org
>>>> ---
>>>>  arch/x86/kernel/cpu/perf_event_intel_lbr.c | 5 ++++-
>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> --- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
>>>> +++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
>>>> @@ -318,8 +318,11 @@ static void intel_pmu_setup_sw_lbr_filte
>>>>         if (br_type & PERF_SAMPLE_BRANCH_USER)
>>>>                 mask |= X86_BR_USER;
>>>>
>>>> -       if (br_type & PERF_SAMPLE_BRANCH_KERNEL)
>>>> +       if (br_type & PERF_SAMPLE_BRANCH_KERNEL) {
>>>> +               if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
>>>> +                       return -EACCES;
>>>>                 mask |= X86_BR_KERNEL;
>>>> +       }
>>>
>>> This will prevent regular users from capturing kernel -> kernel
>>> branches. But it won't prevent users from getting kernel -> user
>>> branches. Thus some kernel addresses will still be captured. I guess
>>> they could be eliminated by the sw_filter. When using LBR priv level
>>> filtering, the filter applies to the branch target only.
>>
>> How about something like the below? It also adds the branch flags
>> Mikey wanted for PowerPC.
>
> Peter,
>
> BTW PowerPC also has the ability to filter on conditional branches.
> Any chance we could add something like the following to perf also?
>
> Mikey
>
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index fb104e5..891c769 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -157,8 +157,9 @@ enum perf_branch_sample_type {
>         PERF_SAMPLE_BRANCH_ANY_CALL     = 1U << 4, /* any call branch */
>         PERF_SAMPLE_BRANCH_ANY_RETURN   = 1U << 5, /* any return branch */
>         PERF_SAMPLE_BRANCH_IND_CALL     = 1U << 6, /* indirect calls */
> +       PERF_SAMPLE_BRANCH_CONDITIONAL  = 1U << 7, /* conditional branches */

I would use PERF_SAMPLE_BRANCH_COND here.

> -       PERF_SAMPLE_BRANCH_MAX          = 1U << 7, /* non-ABI */
> +       PERF_SAMPLE_BRANCH_MAX          = 1U << 8, /* non-ABI */
>  };
>
>  #define PERF_SAMPLE_BRANCH_PLM_ALL \
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index cdf58ec..5b0b89d 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -676,6 +676,7 @@ static const struct branch_mode branch_modes[] = {
>         BRANCH_OPT("any_call", PERF_SAMPLE_BRANCH_ANY_CALL),
>         BRANCH_OPT("any_ret", PERF_SAMPLE_BRANCH_ANY_RETURN),
>         BRANCH_OPT("ind_call", PERF_SAMPLE_BRANCH_IND_CALL),
> +       BRANCH_OPT("cnd", PERF_SAMPLE_BRANCH_CONDITIONAL),

use "cond"

>         BRANCH_END
>  };

And if you do this, you also need to update the x86
perf_event_intel_lbr.c mapping tables to fill out the entries for
PERF_SAMPLE_BRANCH_COND:

        [PERF_SAMPLE_BRANCH_COND] = LBR_JCC,

And you also need to update intel_pmu_setup_sw_lbr_filter() to handle
the conversion to x86 instructions:

        if (br_type & PERF_SAMPLE_BRANCH_COND)
                mask |= X86_BR_JCC;

You also need to update the perf-record.txt documentation to list cond
as a possible branch filter.
Re: [PATCH 3/3] perf, x86, lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL
On Fri, May 17, 2013 at 1:39 PM, Peter Zijlstra <pet...@infradead.org> wrote:
> On Fri, May 17, 2013 at 09:32:08PM +1000, Michael Neuling wrote:
>> Peter Zijlstra <pet...@infradead.org> wrote:
>>> Wouldn't it be mostly conditional branches that are the primary
>>> control flow and can get predicted wrong? I mean, I'm sure someone
>>> will miss-predict an unconditional branch but its not like we care
>>> about people with such afflictions do we?
>>
>> You could mispredict the target address of a computed goto. You'd
>> know it was taken but not know the target address until later in the
>> pipeline.
>
> Oh right, computed targets could indeed be mispredicted. I was more
> thinking about jumps with immediate values.
>
>> On this, the POWER8 branch history buffer tells us two things about
>> the prediction status.
>>
>>  1) if the branch was predicted taken/not taken correctly
>>  2) if the target address was predicted correctly or not (for
>>     computed gotos only)
>>
>> So we'd actually like more prediction bits too :-D
>
> So if I understand this right, 1) maps to the predicted flags we have;
> 2) would be new stuff? We don't really have anything like that on x86,
> but I suppose if you make the thing optional and present a 'useful'
> use-case implemented in userspace code we could take it :-)
>
> Anyway, since PPC people thought it worth baking into hardware,
> presumably they have a compelling use case. Mikey, could you see if
> you can retrieve that from someone in the know? It might be
> interesting.
>
>> I don't think we can mispredict a non-conditional non-computed
>> branch, but I'll have to check with the HW folks.
>
> I was mostly wondering about the use-case for the conditional filter.
> Stephane didn't think it useful, clearly your hardware guys thought
> different :-)

From my experience talking with compiler people, they care about ALL
the branches and not the conditionals so much. They use LBR to do
basic block profiling.
Re: [PATCH 3/3] perf, x86, lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL
On Sat, May 18, 2013 at 12:14 AM, Michael Neuling <mi...@neuling.org> wrote:
> Stephane Eranian <eran...@google.com> wrote:
>> On Fri, May 17, 2013 at 1:39 PM, Peter Zijlstra <pet...@infradead.org> wrote:
>>> [earlier exchange on mispredicted computed gotos and the POWER8
>>> prediction bits elided; quoted verbatim above]
>>
>> From my experience talking with compiler people, they care about ALL
>> the branches and not the conditionals so much. They use LBR to do
>> basic block profiling.
>
> OK.
>
> I don't have a good handle on what's useful for compilers or JITs
> right now. I'm just plumbing through what's possible.

I understand. It is okay to extend the interface.
Re: [PATCH 3/3] perf, x86, lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL
On Thu, May 16, 2013 at 1:16 PM, Peter Zijlstra <pet...@infradead.org> wrote:
> On Thu, May 16, 2013 at 08:15:17PM +1000, Michael Neuling wrote:
>> Peter,
>>
>> BTW, PowerPC also has the ability to filter on conditional branches.
>> Any chance we could add something like the following to perf also?
>
> I don't see an immediate problem with that, except that we on x86 need
> to implement it in the software filter. Stephane, do you see any
> fundamental issue with that?

On x86, the LBR cannot filter on conditional branches in HW. Thus, as
Peter said, it would have to be done in SW. I did not add that because I
think those branches are not necessarily useful for tools.

>> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
>> index fb104e5..891c769 100644
>> --- a/include/uapi/linux/perf_event.h
>> +++ b/include/uapi/linux/perf_event.h
>> @@ -157,8 +157,9 @@ enum perf_branch_sample_type {
>>  	PERF_SAMPLE_BRANCH_ANY_CALL	= 1U << 4, /* any call branch */
>>  	PERF_SAMPLE_BRANCH_ANY_RETURN	= 1U << 5, /* any return branch */
>>  	PERF_SAMPLE_BRANCH_IND_CALL	= 1U << 6, /* indirect calls */
>> +	PERF_SAMPLE_BRANCH_CONDITIONAL	= 1U << 7, /* conditional branches */
>>
>> -	PERF_SAMPLE_BRANCH_MAX		= 1U << 7, /* non-ABI */
>> +	PERF_SAMPLE_BRANCH_MAX		= 1U << 8, /* non-ABI */
>>  };
>>
>>  #define PERF_SAMPLE_BRANCH_PLM_ALL \
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index cdf58ec..5b0b89d 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -676,6 +676,7 @@ static const struct branch_mode branch_modes[] = {
>>  	BRANCH_OPT("any_call", PERF_SAMPLE_BRANCH_ANY_CALL),
>>  	BRANCH_OPT("any_ret", PERF_SAMPLE_BRANCH_ANY_RETURN),
>>  	BRANCH_OPT("ind_call", PERF_SAMPLE_BRANCH_IND_CALL),
>> +	BRANCH_OPT("cnd", PERF_SAMPLE_BRANCH_CONDITIONAL),
>>  	BRANCH_END
>>  };
Re: Invalid perf_branch_entry.to entries question
On Wed, May 8, 2013 at 5:59 PM, Peter Zijlstra <pet...@infradead.org> wrote:
> On Tue, May 07, 2013 at 11:35:28AM +1000, Michael Neuling wrote:
>> Peter & Stephane,
>>
>> We are plumbing the POWER8 Branch History Rolling Buffer (BHRB) into
>> struct perf_branch_entry. Sometimes on POWER8 we may not be able to
>> fill out the "to" address.
>
> Just because I'm curious.. how does that ever happen? Surely the CPU
> knows where next to fetch instructions?
>
>> We initially thought of just making this 0, but it's feasible that
>> this could be a valid address to branch to.
>
> Right; while highly unlikely, x86 actually has some cases where a 0
> address is valid *shudder*..
>
>> The other logical value to indicate an invalid entry would be all 1s,
>> which is not possible (on POWER at least). Do you guys have a
>> preference as to what we should use as an invalid entry? This would
>> have some consequences for the userspace tool also.
>>
>> The alternative would be to add a flag alongside mispred/predicted to
>> indicate the validity of the "to" address.
>
> Either would work for me, I suppose.. Stephane, do you have any
> preference?

But if the 'to' is bogus, why not just drop the sample? That happens on
x86 if the HW captured branches which do not correspond to the user's
filter settings (due to a bug).
Re: perf: POWER-event translation questions
On Thu, Nov 8, 2012 at 2:10 AM, Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com> wrote:
> Looking for feedback on this prototype for making POWER-specific event
> translations available in sysfs. It is based on the patchset:
>
>     https://lkml.org/lkml/2012/11/7/402
>
> which makes the translations for _generic_ events on POWER available in
> sysfs. Since this is in POWER7-specific code, I am assigning the names
> given in the POWER7 CPU spec for now. I had earlier tried mapping these
> events to generic names outside sysfs:
>
>     Power7 name       Generic name
>     cmpl-stall-fxu    stalled-cycles-fixed-point
>     cmpl-stall-lsu    stalled-cycles-load-store
>     cmpl-stall-ifu    stalled-cycles-instruction-fetch
>     cmpl-stall-bru    stalled-cycles-branch-unit
>
> But as Stephane Eranian pointed out, mapping such events across
> architectures can be confusing. Another challenge I suspect we will
> have is the extremely long generic names we could end up with as the
> events get more specific.
>
> 1. Can we have more than one name for an event, i.e. two sysfs entries,
>    e.g. 'cmpl-stall-fxu' and 'stalled-cycles-fixed-point', for one
>    event?

Yes, you can. What is really used is the content of the file, and two
files can have the same content.

> 2. Can we allow hyphens in the {name} token (please see my change to
>    util/parse-events.l below)?

The current code does not support this, but Andi fixed that in his HSW
patch and I use it for the PEBS-LL patch series as well.

> With this change, I can run:
>
>     perf stat -e cpu/cmplu-stall-bru/ /tmp/nop
>
> without any changes to the user-level tool (parse-events.l). I have
> tested some common cases; not sure if it will break something :-)
>
> If we are going to create generic or arch-specific sysfs entries in
> /sys/bus/event_source/devices/cpu/events, do we need to add a
> corresponding entry in tools/perf/util/parse-events.l?

Shouldn't be necessary. perf should grab those events automatically from
sysfs. As per Jiri, the hardcoded tables are only used to support
backward compatibility for kernels without sysfs event entries.
> Sukadev
>
> ---
>  arch/powerpc/perf/power7-pmu.c |   13 +++++++++++++
>  tools/perf/util/parse-events.l |    2 +-
>  2 files changed, 14 insertions(+), 1 deletions(-)
>
> diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
> index aa9f588..9f46abc 100644
> --- a/arch/powerpc/perf/power7-pmu.c
> +++ b/arch/powerpc/perf/power7-pmu.c
> @@ -303,6 +303,10 @@ static void power7_disable_pmc(unsigned int pmc, unsigned long mmcr[])
>  #define PM_LD_MISS_L1	0x400f0
>  #define PM_BRU_FIN	0x10068
>  #define PM_BRU_MPRED	0x400f6
> +#define PM_CMPLU_STALL_FXU	0x20014
> +#define PM_CMPLU_STALL_LSU	0x20012
> +#define PM_CMPLU_STALL_IFU	0x4004c
> +#define PM_CMPLU_STALL_BRU	0x4004e
>
>  static int power7_generic_events[] = {
>  	[PERF_COUNT_HW_CPU_CYCLES] = PM_CYC,
> @@ -369,6 +373,11 @@ EVENT_ATTR(cache-misses, LD_MISS_L1);
>  EVENT_ATTR(branch-instructions, BRU_FIN);
>  EVENT_ATTR(branch-misses, BRU_MPRED);
>
> +EVENT_ATTR(cmplu-stall-fxu, CMPLU_STALL_FXU);
> +EVENT_ATTR(cmplu-stall-lsu, CMPLU_STALL_LSU);
> +EVENT_ATTR(cmplu-stall-ifu, CMPLU_STALL_IFU);
> +EVENT_ATTR(cmplu-stall-bru, CMPLU_STALL_BRU);
> +
>  static struct attribute *power7_events_attr[] = {
>  	EVENT_PTR(CYC),
>  	EVENT_PTR(GCT_NOSLOT_CYC),
> @@ -378,6 +387,10 @@ static struct attribute *power7_events_attr[] = {
>  	EVENT_PTR(LD_MISS_L1),
>  	EVENT_PTR(BRU_FIN),
>  	EVENT_PTR(BRU_MPRED),
> +	EVENT_PTR(CMPLU_STALL_FXU),
> +	EVENT_PTR(CMPLU_STALL_LSU),
> +	EVENT_PTR(CMPLU_STALL_IFU),
> +	EVENT_PTR(CMPLU_STALL_BRU),
>  	NULL,
>  };
>
> diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
> index c87efc1..1967bb2 100644
> --- a/tools/perf/util/parse-events.l
> +++ b/tools/perf/util/parse-events.l
> @@ -80,7 +80,7 @@ event		[^,{}/]+
>  num_dec		[0-9]+
>  num_hex		0x[a-fA-F0-9]+
>  num_raw_hex	[a-fA-F0-9]+
> -name		[a-zA-Z_*?][a-zA-Z0-9_*?]*
> +name		[-a-zA-Z_*?][-a-zA-Z0-9_*?]*
>  modifier_event	[ukhpGH]{1,8}
>  modifier_bp	[rwx]{1,3}
> --
> 1.7.1
Re: [RFC][PATCH] perf: Add a few generic stalled-cycles events
On Tue, Oct 16, 2012 at 12:08 PM, Robert Richter <robert.rich...@amd.com> wrote:
> Sukadev,
>
> On 15.10.12 17:55:34, Robert Richter wrote:
>> On 11.10.12 18:28:39, Sukadev Bhattiprolu wrote:
>>> + { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_FIXED_POINT },
>>> + { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_LOAD_STORE },
>>> + { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_INSTRUCTION_FETCH },
>>> + { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_BRANCH },
>>
>> Instead of adding new hardware event types I would prefer to use raw
>> events in conjunction with sysfs; see e.g. the intel-uncore
>> implementation. Something like:

In general, I don't like generic events, and especially stall events. I
have not seen a clear definition of what they mean. Without one, there
is no way to understand how to map them across architectures. If the
definition is too precise, you may not be able to find an exact mapping.
If the definition is too loose, then it is unclear what you are
measuring.

Also, this opens another can of worms: on some processors you may need
more than one event to encapsulate what the generic event is supposed
to measure. That means developing a lot of code in the kernel to
express and manage that. And of course, you would not be able to sample
on those events (you cannot sample on a difference, for instance).

So all in all, I think this is not a very good idea. You have to put
this into the tool, or a library that auto-detects the host CPU and
programs the right set of events. We've had that discussion many times.
Just reiterating my personal opinion on this.

>> $ find /sys/bus/event_source/devices/cpu/events/
>> ...
>> /sys/bus/event_source/devices/cpu/events/stalled-cycles-fixed-point
>> /sys/bus/event_source/devices/cpu/events/stalled-cycles-load-store
>> /sys/bus/event_source/devices/cpu/events/stalled-cycles-instruction-fetch
>> /sys/bus/event_source/devices/cpu/events/stalled-cycles-branch
>> ...
>> $ cat /sys/bus/event_source/devices/cpu/events/stalled-cycles-fixed-point
>> event=0xff,umask=0x00
>>
>> The perf tool then works out-of-the-box with:
>>
>> $ perf record -e cpu/stalled-cycles-fixed-point/ ...
>>
>> I refer here to arch/x86/kernel/cpu/perf_event_intel_uncore.c (should
>> be in v3.7-rc1 or tip:perf/core). See the INTEL_UNCORE_EVENT_DESC()
>> macro and 'if (type->event_descs) ...' in uncore_type_init(). The
>> code should be reworked to be non-architectural.
>
> PMU registration has already been implemented for a long time, for all
> architectures and pmu types:
>
>  /sys/bus/event_source/devices/*
>
> But /sys/bus/event_source/devices/*/events/ exists only for a small
> number of pmus. Perf tool support for this was implemented with:
>
>  a6146d5 perf/tool: Add PMU event alias support
>
> -Robert
>
> --
> Advanced Micro Devices, Inc.
> Operating System Research Center
Re: [PATCH] powerpc: perf: power_pmu_start restores incorrect values, breaking frequency events
Glad to see you fixed the PPC problem.

On Thu, Feb 16, 2012 at 5:57 AM, Paul Mackerras <pau...@samba.org> wrote:
> On Thu, Feb 16, 2012 at 03:48:22PM +1100, Anton Blanchard wrote:
>> perf on POWER stopped working after commit e050e3f0a71b ("perf: Fix
>> broken interrupt rate throttling"). That patch exposed a bug in the
>> POWER perf_events code.
>>
>> Since the PMCs count upwards and take an exception when the top bit
>> is set, we want to write 0x80000000 - left in power_pmu_start. We
>> were instead programming in left, which effectively disables the
>> counter until we eventually hit 0x80000000. This could take seconds
>> or longer.
>>
>> With the patch applied I get the expected number of samples:
>>
>>   # taskset -c 0 yes > /dev/null &
>>   # perf record -C 0 -a sleep 10
>>   # perf report -D | grep SAMPLE | tail -1
>>   SAMPLE events: 9948
>>
>> Signed-off-by: Anton Blanchard <an...@samba.org>
>
> Acked-by: Paul Mackerras <pau...@samba.org>