Re: [PATCH V2 0/6] perf: New conditional branch filter

2013-10-09 Thread Anshuman Khandual
On 09/26/2013 04:44 PM, Stephane Eranian wrote:
 So you are saying that the HW filter is exclusive. That seems odd. But I
 think it is because one of the choices is ANY. ANY covers all the types of
 branches. Therefore it does not make a difference whether you add COND or
 not. And vice-versa, if you set COND, you need to disable ANY. I bet if you
 add other filters such as CALL, RETURN, then you could OR them and say: I
 want RETURN or CALLS.

 But that's okay. The API operates in OR mode but if the HW does not support
 it, you can check the mask and reject if more than one type is set. That is
 arch-specific code. The alternative is to only capture ANY and emulate the
 filter in SW. This will work, of course. But the downside is that you lose
 the way to appreciate how many, for instance, COND branches you sampled out
 of the total number of COND branches retired. Unless you can count COND
 branches separately.

Hey Stephane,

Thanks for your reply. I am working on a solution where the PMU will process
all the requested branch filters in HW only if it can filter all of them in an
OR manner; otherwise it will do no filtering itself and leave the entire job
up to SW. This implies that branch filtering will happen either completely in
HW or completely in SW, never in a mixed manner. This way it will conform to
the OR mode defined in the API. I will post the revised patch set soon.
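
The all-in-HW or all-in-SW decision described above can be sketched roughly
as follows. This is an illustration only, not code from the patch set: the
BR_* bits are stand-ins for the PERF_SAMPLE_BRANCH_* flags, and
HW_SINGLE_FILTERS, plan_branch_filter() and the assumption that the PMU can
apply exactly one of its filters at a time are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PERF_SAMPLE_BRANCH_* bits. */
enum {
	BR_ANY_CALL = 1u << 0,
	BR_ANY_RET  = 1u << 1,
	BR_COND     = 1u << 2,
};

enum filter_mode { FILTER_IN_HW, FILTER_IN_SW };

/* Assumption: filters the PMU implements, usable only one at a time. */
#define HW_SINGLE_FILTERS (BR_ANY_CALL | BR_COND)

/* True when exactly one bit is set in the mask. */
static bool is_single_bit(uint32_t mask)
{
	return mask != 0 && (mask & (mask - 1)) == 0;
}

/*
 * Use the PMU only when it can honour the whole request with OR
 * semantics, which here means a single filter that the PMU implements.
 * Every other request is filtered entirely in software.
 */
static enum filter_mode plan_branch_filter(uint32_t requested)
{
	if (is_single_bit(requested) && (requested & HW_SINGLE_FILTERS))
		return FILTER_IN_HW;
	return FILTER_IN_SW;
}
```

With these assumptions a lone BR_COND request is planned for HW, while
BR_ANY_CALL | BR_ANY_RET falls back to SW as a whole, so the two stages are
never mixed.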

Regards
Anshuman

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH V2 0/6] perf: New conditional branch filter

2013-09-25 Thread Anshuman Khandual
On 09/25/2013 07:49 AM, Michael Ellerman wrote:
 On Mon, 2013-09-23 at 14:45 +0530, Anshuman Khandual wrote:
 On 09/21/2013 12:25 PM, Stephane Eranian wrote:
 On Tue, Sep 10, 2013 at 4:06 AM, Michael Ellerman
 mich...@ellerman.id.au wrote:

 On Fri, 2013-08-30 at 09:54 +0530, Anshuman Khandual wrote:
   This patchset is the re-spin of the original branch stack sampling
 patchset which introduced new PERF_SAMPLE_BRANCH_COND filter. This 
 patchset
 also enables SW based branch filtering support for PPC64 platforms 
 which have
 branch stack sampling support. With this new enablement, the branch 
 filter support
 for PPC64 platforms have been extended to include all these 
 combinations discussed
 below with a sample test application program.

 ...

 Mixed filters
 -------------
 (6) perf record -e branch-misses:u -j any_call,any_ret ./cprog
 Error:
 The perf.data file has no samples!

 NOTE: As expected. The HW filters all the branches which are calls and SW
 tries to find return branches in that given set. Both the filters are
 mutually exclusive, so obviously no samples are found in the end profile.

 The semantics of multiple filters is not clear to me. It could be an OR,
 or an AND. You have implemented AND, does that match existing behaviour
 on x86 for example?

 The semantic on the API is OR. AND does not make sense: CALL && RETURN?
 On x86, the HW filter is an OR (default: ALL, set bit to disable a type).
 I suspect it is similar on PPC.

 Given the situation as explained here, which semantic would be better for a
 single HW filter and multiple SW filters? The validate_instruction() function
 will have to be re-implemented accordingly. But I believe OR-ing the SW
 filters will be preferable.

  (1) (HW_FILTER_1) && (SW_FILTER_1) && (SW_FILTER_2)
  or
  (2) (HW_FILTER_1) && (SW_FILTER_1 || SW_FILTER_2)

 Please let me know your inputs and suggestions on this. Thank you.
 
 You need to implement the correct semantics, regardless of how the
 hardware happens to work.
 
 That means if multiple filters are specified you need to do all the
 filtering in software.

Hello Stephane,

I looked at the X86 code on branch filtering implementation.

(1) During event creation, intel_pmu_hw_config calls intel_pmu_setup_lbr_filter
when LBR sampling is required. intel_pmu_setup_lbr_filter calls these two
functions:

(a) intel_pmu_setup_sw_lbr_filter

event->hw.branch_reg.reg contains all the SW filter masks which can be
supported for the user requested filters in event->attr.branch_sample_type
(even if some of them could be implemented in PMU HW).

(b) intel_pmu_setup_hw_lbr_filter (when HW filtering is present)

event->hw.branch_reg.config contains all the PMU HW filter masks
corresponding to the requested filters in event->attr.branch_sample_type.
One point to note here is that if the user has requested some branch filter
which is not supported in the HW LBR filter, the event creation request is
rejected with EOPNOTSUPP. This is not true for the filters which can be
ignored in the PMU.

(2) When the event is enabled in the PMU

(a) cpuc->lbr_sel->config gets written into the HW register to enable the
filtering of branches which was determined in the function
intel_pmu_setup_hw_lbr_filter.

(3) After the IRQ has happened, intel_pmu_lbr_read reads all the entries from
the LBR HW and then applies the filter in the function intel_pmu_lbr_filter.

(a) The intel_pmu_lbr_filter function takes into account cpuc->br_sel (which
is nothing but event->hw.branch_reg.reg as determined in the function
intel_pmu_setup_sw_lbr_filter), which contains the entire branch filter
request set in terms of the applicable SW filters. Here the semantic is OR
when we look at it from the SW filter implementation point of view.

   BUT what branch record set are we working on right now? A set which was
   captured by the LBR HW with the cpuc->lbr_sel->config filters enabled on
   it. So to me the X86 implementation of the semantics looks something like
   this.

A - Branch filter set requested by the user
B - Subset of A which can be supported in HW
C - Subset of A which can be supported in SW

(B) && (C)

NOTE: Individual filters are OR-ed inside both B and C sets.

So here the semantics is not a true OR. This is my understanding so far, which
may be wrong. Please help me understand if the semantics is something other
than what is explained above.

In POWER8, because we cannot OR the individual filters supported by the HW PMU,
the semantics looked a bit odd till now. But as Michael has pointed out here,
if there are multiple branch filter requests we should implement all of them
in SW. Only in the case where the user requests an individual filter and it
happens to be supported in the HW PMU will we use the PMU filters.
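
To make the (B) && (C) composition concrete, here is a toy model of a HW
pass followed by a SW pass over the captured records. It is not the x86
code: struct branch_rec, filter_hw_then_sw() and the BR_* bits (stand-ins
for the PERF_SAMPLE_BRANCH_* flags) are hypothetical and exist only for
this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PERF_SAMPLE_BRANCH_* bits. */
enum {
	BR_ANY_CALL = 1u << 0,
	BR_ANY_RET  = 1u << 1,
	BR_COND     = 1u << 2,
};

/* One captured branch record, reduced to the class it belongs to. */
struct branch_rec {
	uint32_t type;
};

/*
 * Keep a record only if it matches the HW mask (set B) and then also
 * matches the SW mask (set C). Inside each mask the bits are OR-ed,
 * but the two stages combine as an AND. Returns the number of records
 * copied to out[].
 */
static size_t filter_hw_then_sw(const struct branch_rec *in, size_t n,
				uint32_t hw_mask, uint32_t sw_mask,
				struct branch_rec *out)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		bool hw_ok = (in[i].type & hw_mask) != 0;
		bool sw_ok = (in[i].type & sw_mask) != 0;

		if (hw_ok && sw_ok)
			out[kept++] = in[i];
	}
	return kept;
}
```

With B = any_call and C = any_ret no record can pass both stages, which is
the empty profile seen in test (6). When C is a superset of B, the result is
simply B.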

Regards
Anshuman


Re: [PATCH V2 0/6] perf: New conditional branch filter

2013-09-24 Thread Michael Ellerman
On Mon, 2013-09-23 at 14:45 +0530, Anshuman Khandual wrote:
 On 09/21/2013 12:25 PM, Stephane Eranian wrote:
  On Tue, Sep 10, 2013 at 4:06 AM, Michael Ellerman
  mich...@ellerman.id.au wrote:
  
   On Fri, 2013-08-30 at 09:54 +0530, Anshuman Khandual wrote:
  This patchset is the re-spin of the original branch stack 
sampling
patchset which introduced new PERF_SAMPLE_BRANCH_COND filter. This 
patchset
also enables SW based branch filtering support for PPC64 platforms 
which have
branch stack sampling support. With this new enablement, the branch 
filter support
for PPC64 platforms have been extended to include all these 
combinations discussed
below with a sample test application program.
  
   ...
  
Mixed filters
-------------
(6) perf record -e branch-misses:u -j any_call,any_ret ./cprog
Error:
The perf.data file has no samples!

NOTE: As expected. The HW filters all the branches which are calls and SW
tries to find return branches in that given set. Both the filters are
mutually exclusive, so obviously no samples are found in the end profile.
  
   The semantics of multiple filters is not clear to me. It could be an OR,
   or an AND. You have implemented AND, does that match existing behaviour
   on x86 for example?
 
  The semantic on the API is OR. AND does not make sense: CALL && RETURN?
  On x86, the HW filter is an OR (default: ALL, set bit to disable a type).
  I suspect it is similar on PPC.
 
 Given the situation as explained here, which semantic would be better for a
 single HW filter and multiple SW filters? The validate_instruction() function
 will have to be re-implemented accordingly. But I believe OR-ing the SW
 filters will be preferable.
 
   (1) (HW_FILTER_1) && (SW_FILTER_1) && (SW_FILTER_2)
   or
   (2) (HW_FILTER_1) && (SW_FILTER_1 || SW_FILTER_2)
 
 Please let me know your inputs and suggestions on this. Thank you.

You need to implement the correct semantics, regardless of how the
hardware happens to work.

That means if multiple filters are specified you need to do all the
filtering in software.

cheers



Re: [PATCH V2 0/6] perf: New conditional branch filter

2013-09-23 Thread Anshuman Khandual
On 09/21/2013 12:25 PM, Stephane Eranian wrote:
 On Tue, Sep 10, 2013 at 4:06 AM, Michael Ellerman
 mich...@ellerman.id.au wrote:
 
  On Fri, 2013-08-30 at 09:54 +0530, Anshuman Khandual wrote:
 This patchset is the re-spin of the original branch stack sampling
   patchset which introduced new PERF_SAMPLE_BRANCH_COND filter. This 
   patchset
   also enables SW based branch filtering support for PPC64 platforms 
   which have
   branch stack sampling support. With this new enablement, the branch 
   filter support
   for PPC64 platforms have been extended to include all these 
   combinations discussed
   below with a sample test application program.
 
  ...
 
   Mixed filters
   -------------
   (6) perf record -e branch-misses:u -j any_call,any_ret ./cprog
   Error:
   The perf.data file has no samples!

   NOTE: As expected. The HW filters all the branches which are calls and SW
   tries to find return branches in that given set. Both the filters are
   mutually exclusive, so obviously no samples are found in the end profile.
 
  The semantics of multiple filters is not clear to me. It could be an OR,
  or an AND. You have implemented AND, does that match existing behaviour
  on x86 for example?
 
 The semantic on the API is OR. AND does not make sense: CALL && RETURN?
 On x86, the HW filter is an OR (default: ALL, set bit to disable a type).
 I suspect it is similar on PPC.

Hey Stephane,

In the POWER8 BHRB, we have three HW PMU filters, out of which we are trying
to use two: PERF_SAMPLE_BRANCH_ANY_CALL and PERF_SAMPLE_BRANCH_COND.

(1) These filters are exclusive of each other and cannot be OR-ed with each
other.

(2) The SW filters are applied to the branch record set captured in the BHRB,
which already has the HW filters applied. So the working set is already
reduced by the HW PMU filters. The SW filter goes through the working set,
figures out which records satisfy the SW filter criteria, and picks those up.
The SW filter cannot find branch records which match the criteria outside of
the BHRB captured set. So we cannot OR the filters.

This makes the combination of HW and SW filters inherently an AND, not an OR.

(3) But once we have captured the BHRB data filtered with the HW PMU filter,
multiple SW filters (if requested) can be applied in either an OR or an AND
manner.

It should be either like
(1) (HW_FILTER_1) && (SW_FILTER_1) && (SW_FILTER_2)
or like
(2) (HW_FILTER_1) && (SW_FILTER_1 || SW_FILTER_2)

NOTE: I admit that the current validate_instruction() function does not do
either of them correctly. Will fix it in the next iteration.

(4) These combinations of filters are not supported right now because

(a) We are unable to process two HW PMU filters simultaneously
(b) We have not worked on a replacement SW filter for either of the HW
filters

(1) (HW_FILTER_1), (HW_FILTER_2)
(2) (HW_FILTER_1), (HW_FILTER_2), (SW_FILTER_1)
(3) (HW_FILTER_1), (HW_FILTER_2), (SW_FILTER_1), (SW_FILTER_2)

   However, these combinations of filters can be supported right now.

(1) (HW_FILTER_1)
(2) (HW_FILTER_2)

(3) (SW_FILTER_1)
(4) (SW_FILTER_2)
(5) (SW_FILTER_1), (SW_FILTER_2)

(6)  (HW_FILTER_1), (SW_FILTER_1)
(7)  (HW_FILTER_1), (SW_FILTER_2)
(8)  (HW_FILTER_1), (SW_FILTER_1), (SW_FILTER_2)
(9)  (HW_FILTER_2), (SW_FILTER_1)
(10) (HW_FILTER_2), (SW_FILTER_2)
(11) (HW_FILTER_2), (SW_FILTER_1), (SW_FILTER_2)
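
The rule behind the two lists above (at most one HW PMU filter per request)
can be checked with a short sketch. The bit assignment and the helpers
combination_supported() and count_supported() are hypothetical; they only
enumerate the placeholder filters used in this mail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignment for the four placeholder filters. */
enum {
	HW_FILTER_1 = 1u << 0,
	HW_FILTER_2 = 1u << 1,
	SW_FILTER_1 = 1u << 2,
	SW_FILTER_2 = 1u << 3,
};

#define HW_FILTERS  (HW_FILTER_1 | HW_FILTER_2)
#define ALL_FILTERS (HW_FILTERS | SW_FILTER_1 | SW_FILTER_2)

/* A request is supportable if it is non-empty and names at most one HW filter. */
static bool combination_supported(uint32_t requested)
{
	uint32_t hw = requested & HW_FILTERS;

	if (requested == 0 || (requested & ~(uint32_t)ALL_FILTERS))
		return false;
	return (hw & (hw - 1)) == 0;	/* zero or one HW bit set */
}

/* Count the supportable requests among all non-empty filter subsets. */
static unsigned int count_supported(void)
{
	unsigned int n = 0;

	for (uint32_t mask = 1; mask <= ALL_FILTERS; mask++)
		if (combination_supported(mask))
			n++;
	return n;
}
```

Under this rule 11 of the 15 non-empty requests are accepted, which matches
the eleven combinations listed above. The rejected ones are exactly those
that name both HW filters.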


Given the situation as explained here, which semantic would be better for a
single HW filter and multiple SW filters? The validate_instruction() function
will have to be re-implemented accordingly. But I believe OR-ing the SW
filters will be preferable.

(1) (HW_FILTER_1) && (SW_FILTER_1) && (SW_FILTER_2)
or
(2) (HW_FILTER_1) && (SW_FILTER_1 || SW_FILTER_2)
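
The difference between the two options shows up as soon as one branch can
belong to more than one class. The sketch below is only an illustration:
the BR_* class bits and the two predicates are hypothetical, and it assumes
that an indirect call counts as both an "any call" and an "indirect call".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical class bits; one branch can belong to several classes. */
enum {
	BR_ANY_CALL = 1u << 0,
	BR_IND_CALL = 1u << 1,
	BR_COND     = 1u << 2,
};

/* Option (1): the record must match the HW filter and every SW filter. */
static bool keep_and_and(uint32_t classes, uint32_t hw,
			 uint32_t sw1, uint32_t sw2)
{
	return (classes & hw) && (classes & sw1) && (classes & sw2);
}

/* Option (2): the record must match the HW filter and at least one SW filter. */
static bool keep_and_or(uint32_t classes, uint32_t hw,
			uint32_t sw1, uint32_t sw2)
{
	return (classes & hw) && (classes & (sw1 | sw2));
}
```

For an indirect call (classes BR_ANY_CALL | BR_IND_CALL) with HW filter
BR_ANY_CALL and SW filters BR_IND_CALL and BR_COND, option (1) drops the
record because it is not conditional, while option (2) keeps it.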

Please let me know your inputs and suggestions on this. Thank you.

Regards
Anshuman



Re: [PATCH V2 0/6] perf: New conditional branch filter

2013-09-21 Thread Anshuman Khandual
On 08/30/2013 05:18 PM, Stephane Eranian wrote:
 2013/8/30 Anshuman Khandual khand...@linux.vnet.ibm.com
 
  This patchset is the re-spin of the original branch stack sampling
  patchset which introduced new PERF_SAMPLE_BRANCH_COND filter. This patchset
  also enables SW based branch filtering support for PPC64 platforms which 
  have
  branch stack sampling support. With this new enablement, the branch filter 
  support
  for PPC64 platforms have been extended to include all these combinations 
  discussed
  below with a sample test application program.
 
 
 I am trying to understand which HW has support for capturing the
 branches: PPC7 or PPC8.
 Then it seems you're saying that only PPC8 has the filtering support.
 On PPC7 you use the
 SW filter. Did I get this right?
 
 I will look at the patch set.
 

Hey Stephane,

Just wondering if you got a chance to go though the patchset ?

Regards
Anshuman



Re: [PATCH V2 0/6] perf: New conditional branch filter

2013-09-21 Thread Anshuman Khandual
On 09/21/2013 12:11 PM, Anshuman Khandual wrote:
 On 08/30/2013 05:18 PM, Stephane Eranian wrote:
 2013/8/30 Anshuman Khandual khand...@linux.vnet.ibm.com

 This patchset is the re-spin of the original branch stack sampling
 patchset which introduced new PERF_SAMPLE_BRANCH_COND filter. This patchset
 also enables SW based branch filtering support for PPC64 platforms which 
 have
 branch stack sampling support. With this new enablement, the branch filter 
 support
 for PPC64 platforms have been extended to include all these combinations 
 discussed
 below with a sample test application program.


 I am trying to understand which HW has support for capturing the
 branches: PPC7 or PPC8.
 Then it seems you're saying that only PPC8 has the filtering support.
 On PPC7 you use the
 SW filter. Did I get this right?

 I will look at the patch set.

 
 Hey Stephane,
 
 Just wondering if you got a chance to go though the patchset ?


s/though/through/



Re: [PATCH V2 0/6] perf: New conditional branch filter

2013-09-09 Thread Michael Ellerman
On Fri, 2013-08-30 at 09:54 +0530, Anshuman Khandual wrote:
   This patchset is the re-spin of the original branch stack sampling
 patchset which introduced new PERF_SAMPLE_BRANCH_COND filter. This patchset
 also enables SW based branch filtering support for PPC64 platforms which have
 branch stack sampling support. With this new enablement, the branch filter 
 support
 for PPC64 platforms have been extended to include all these combinations 
 discussed
 below with a sample test application program.

...

 Mixed filters
 -------------
 (6) perf record -e branch-misses:u -j any_call,any_ret ./cprog
 Error:
 The perf.data file has no samples!
 
 NOTE: As expected. The HW filters all the branches which are calls and SW
 tries to find return branches in that given set. Both the filters are
 mutually exclusive, so obviously no samples are found in the end profile.

The semantics of multiple filters is not clear to me. It could be an OR,
or an AND. You have implemented AND, does that match existing behaviour
on x86 for example?

cheers




Re: [PATCH V2 0/6] perf: New conditional branch filter

2013-09-09 Thread Anshuman Khandual
On 09/10/2013 07:36 AM, Michael Ellerman wrote:
 On Fri, 2013-08-30 at 09:54 +0530, Anshuman Khandual wrote:
  This patchset is the re-spin of the original branch stack sampling
 patchset which introduced new PERF_SAMPLE_BRANCH_COND filter. This patchset
 also enables SW based branch filtering support for PPC64 platforms which have
 branch stack sampling support. With this new enablement, the branch filter 
 support
 for PPC64 platforms have been extended to include all these combinations 
 discussed
 below with a sample test application program.
 
 ...
 
 Mixed filters
 -------------
 (6) perf record -e branch-misses:u -j any_call,any_ret ./cprog
 Error:
 The perf.data file has no samples!

 NOTE: As expected. The HW filters all the branches which are calls and SW
 tries to find return branches in that given set. Both the filters are
 mutually exclusive, so obviously no samples are found in the end profile.
 
 The semantics of multiple filters is not clear to me. It could be an OR,
 or an AND. You have implemented AND, does that match existing behaviour
 on x86 for example?

I believe it does match. X86 code drops the branch records (originally captured
in the LBR) while applying the SW filters.

Regards
Anshuman



Re: [PATCH V2 0/6] perf: New conditional branch filter

2013-09-01 Thread Anshuman Khandual
On 08/30/2013 05:18 PM, Stephane Eranian wrote:
 2013/8/30 Anshuman Khandual khand...@linux.vnet.ibm.com
 
  This patchset is the re-spin of the original branch stack sampling
  patchset which introduced new PERF_SAMPLE_BRANCH_COND filter. This patchset
  also enables SW based branch filtering support for PPC64 platforms which 
  have
  branch stack sampling support. With this new enablement, the branch filter 
  support
  for PPC64 platforms have been extended to include all these combinations 
  discussed
  below with a sample test application program.
 
 
 I am trying to understand which HW has support for capturing the
 branches: PPC7 or PPC8.
 Then it seems you're saying that only PPC8 has the filtering support.
 On PPC7 you use the
 SW filter. Did I get this right?
 
 I will look at the patch set.
 

Hey Stephane,

POWER7 does not have the BHRB support required to capture the branches. Right
now it is only POWER8 (which has BHRB) that can capture branches in HW. It has
some PMU level branch filters, and the rest we have implemented in SW. But
these SW filters cannot be applied on POWER7, as it does not support branch
stack sampling because of the lack of BHRB. I have mentioned PPC64 support in
the sense that this SW filtering code could be used in existing or future
generation powerpc processors which would have PMU support for branch stack
sampling. My apologies if the description for the patchset was ambiguous.

Regards
Anshuman



[PATCH V2 0/6] perf: New conditional branch filter

2013-08-29 Thread Anshuman Khandual
This patchset is the re-spin of the original branch stack sampling patchset
which introduced the new PERF_SAMPLE_BRANCH_COND filter. This patchset also
enables SW based branch filtering support for PPC64 platforms which have
branch stack sampling support. With this new enablement, the branch filter
support for PPC64 platforms has been extended to include all the combinations
discussed below with a sample test application program.


(1) perf record -e branch-misses:u -b ./cprog
# Overhead  Command  Source Shared Object  Source Symbol      Target Shared Object  Target Symbol
# ........  .......  ....................  .................  ....................  .................
#
     4.42%  cprog    cprog                 [k] sw_4_2         cprog                 [k] lr_addr
     4.41%  cprog    cprog                 [k] symbol2        cprog                 [k] hw_1_2
     4.41%  cprog    cprog                 [k] ctr_addr       cprog                 [k] sw_4_1
     4.41%  cprog    cprog                 [k] lr_addr        cprog                 [k] sw_4_2
     4.41%  cprog    cprog                 [k] sw_4_2         cprog                 [k] callme
     4.41%  cprog    cprog                 [k] symbol1        cprog                 [k] hw_1_1
     4.41%  cprog    cprog                 [k] success_3_1_3  cprog                 [k] sw_3_1
     2.43%  cprog    cprog                 [k] sw_4_1         cprog                 [k] ctr_addr
     2.43%  cprog    cprog                 [k] hw_1_2         cprog                 [k] symbol2
     2.43%  cprog    cprog                 [k] callme         cprog                 [k] hw_1_2
     2.43%  cprog    cprog                 [k] address1       cprog                 [k] back1
     2.43%  cprog    cprog                 [k] back1          cprog                 [k] callme
     2.43%  cprog    cprog                 [k] hw_2_1         cprog                 [k] address1
     2.43%  cprog    cprog                 [k] sw_3_1_1       cprog                 [k] sw_3_1
     2.43%  cprog    cprog                 [k] sw_3_1_2       cprog                 [k] sw_3_1
     2.43%  cprog    cprog                 [k] sw_3_1_3       cprog                 [k] sw_3_1
     2.43%  cprog    cprog                 [k] sw_3_1         cprog                 [k] sw_3_1_1
     2.43%  cprog    cprog                 [k] sw_3_1         cprog                 [k] sw_3_1_2
     2.43%  cprog    cprog                 [k] sw_3_1         cprog                 [k] sw_3_1_3
     2.43%  cprog    cprog                 [k] callme         cprog                 [k] sw_3_1
     2.43%  cprog    cprog                 [k] callme         cprog                 [k] sw_4_2
     2.43%  cprog    cprog                 [k] hw_1_1         cprog                 [k] symbol1
     2.43%  cprog    cprog                 [k] callme         cprog                 [k] hw_1_1
     2.42%  cprog    cprog                 [k] sw_3_1         cprog                 [k] callme
     1.99%  cprog    cprog                 [k] success_3_1_1  cprog                 [k] sw_3_1
     1.99%  cprog    cprog                 [k] sw_3_1         cprog                 [k] success_3_1_1
     1.99%  cprog    cprog                 [k] address2       cprog                 [k] back2
     1.99%  cprog    cprog                 [k] hw_2_2         cprog                 [k] address2
     1.99%  cprog    cprog                 [k] back2          cprog                 [k] callme
     1.99%  cprog    cprog                 [k] callme         cprog                 [k] main
     1.99%  cprog    cprog                 [k] sw_3_1         cprog                 [k] success_3_1_3
     1.99%  cprog    cprog                 [k] hw_1_1         cprog                 [k] callme
     1.99%  cprog    cprog                 [k] sw_3_2         cprog                 [k] callme
     1.99%  cprog    cprog                 [k] callme         cprog                 [k] sw_3_2
     1.99%  cprog    cprog                 [k] success_3_1_2  cprog                 [k] sw_3_1
     1.99%  cprog    cprog                 [k] sw_3_1         cprog                 [k] success_3_1_2
     1.99%  cprog    cprog                 [k] hw_1_2         cprog                 [k] callme
     1.99%  cprog    cprog                 [k] sw_4_1         cprog                 [k] callme
     0.02%  cprog    [unknown]             [k] 0xf7ba2328