Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-10 Thread Jin, Yao



On 7/11/2017 10:28 AM, Michael Ellerman wrote:
> "Jin, Yao" writes:
>
>> On 7/10/2017 9:46 PM, Peter Zijlstra wrote:
>>> On Mon, Jul 10, 2017 at 08:10:50AM -0500, Segher Boessenkool wrote:
>>>
>>>>> PERF_BR_INT is triggered by instruction "int" .
>>>>> PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3
>>>>> transition).
>>>> So your "PERF_BR_INT" is a system call?
>>> The "INT" thing has indeed been used as system call mechanism (typically
>>> INT 80). But these days we have special purpose syscall instructions.
>>>
>>> It could maybe be compared to the PPC "Unconditional TRAP with
>>> immediate" where you use the immediate value as an index into a handler
>>> vector.
>>>
>>>> And PERF_BR_IRQ is not an interrupt request (as its name suggests),
>>>> not what we call an "external interrupt" either; instead it is every
>>>> interrupt that is not a system call?
>>> It is actual interrupts, but also faults, traps and all the other
>>> exceptions not caused by "INT" I think.
>>>
>> Yes. It's interrupt, traps, faults. If from is in the user space and to
>> is in the kernel, it indicates the ring3 -> ring0 transition.
>>
>> If the from instruction is not syscall or other ring transition
>> instruction, it should be interrupt, traps and faults. That's how we get
>> the PERF_BR_IRQ on x86.
>>
>> Anyway, maybe we just use a minimum but the most common set of branch
>> types now, it could be a good start and acceptable on all architectures.
>>
>> PERF_BR_COND     = 1, /* conditional */
>> PERF_BR_UNCOND   = 2, /* unconditional */
>> PERF_BR_IND      = 3, /* indirect */
>> PERF_BR_CALL     = 4, /* call */
>> PERF_BR_IND_CALL = 5, /* indirect call */
>> PERF_BR_RET      = 6, /* return */
>
> That would be fine by me, if you're sick of talking about it and just
> want to get it merged :)

:)

> I think you could expand it a bit, this list would cover the vast bulk
> of branch types for us:
>
>   PERF_BR_COND       /* Conditional */
>   PERF_BR_UNCOND     /* Unconditional */
>   PERF_BR_IND        /* Indirect */
>   PERF_BR_CALL       /* Function call */
>   PERF_BR_IND_CALL   /* Indirect function call */
>   PERF_BR_RET        /* Function return */
>   PERF_BR_SYSCALL    /* Syscall */
>   PERF_BR_SYSRET     /* Syscall return */
>   PERF_BR_COND_CALL  /* Conditional function call */
>   PERF_BR_COND_RET   /* Conditional function return */
>
> cheers


OK, accepted! We use 4 bits for the branch types above, and we can reserve
5 values for potential future types.
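
To make the bit budget concrete, here is a rough sketch of the arithmetic, not the actual patch. It assumes that value 0 stays reserved for "unknown" (PERF_BR_NONE in the earlier revision) and that the two conditional types simply take the next free values, 9 and 10. BR_TYPE_BITS, BR_TYPE_COUNT and br_type_spare() are made-up names for illustration only.

```c
#include <assert.h>

#define BR_TYPE_BITS 4	/* hypothetical name: width of the type field */

/* Sketch only: 0 is assumed to stay reserved for "unknown". */
enum branch_type_sketch {
	PERF_BR_NONE      = 0,	/* unknown (assumed reserved) */
	PERF_BR_COND      = 1,	/* conditional */
	PERF_BR_UNCOND    = 2,	/* unconditional */
	PERF_BR_IND       = 3,	/* indirect */
	PERF_BR_CALL      = 4,	/* function call */
	PERF_BR_IND_CALL  = 5,	/* indirect function call */
	PERF_BR_RET       = 6,	/* function return */
	PERF_BR_SYSCALL   = 7,	/* syscall */
	PERF_BR_SYSRET    = 8,	/* syscall return */
	PERF_BR_COND_CALL = 9,	/* conditional call (value assumed) */
	PERF_BR_COND_RET  = 10,	/* conditional return (value assumed) */
	BR_TYPE_COUNT		/* hypothetical: number of encodings in use */
};

/* Every defined type must be representable in the 4-bit field. */
static_assert(BR_TYPE_COUNT <= (1u << BR_TYPE_BITS),
	      "branch types must fit in the type field");

/* Number of encodings still free for future types. */
static inline unsigned int br_type_spare(void)
{
	return (1u << BR_TYPE_BITS) - BR_TYPE_COUNT;
}
```

Under those assumptions 4 bits give 16 encodings, 11 are taken, and 5 remain.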


Thanks
Jin Yao



Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-10 Thread Michael Ellerman
"Jin, Yao"  writes:

> On 7/10/2017 9:46 PM, Peter Zijlstra wrote:
>> On Mon, Jul 10, 2017 at 08:10:50AM -0500, Segher Boessenkool wrote:
>>
>>>> PERF_BR_INT is triggered by instruction "int" .
>>>> PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3
>>>> transition).
>>> So your "PERF_BR_INT" is a system call?
>> The "INT" thing has indeed been used as system call mechanism (typically
>> INT 80). But these days we have special purpose syscall instructions.
>>
>> It could maybe be compared to the PPC "Unconditional TRAP with
>> immediate" where you use the immediate value as an index into a handler
>> vector.
>>
>>> And PERF_BR_IRQ is not an interrupt request (as its name suggests),
>>> not what we call an "external interrupt" either; instead it is every
>>> interrupt that is not a system call?
>> It is actual interrupts, but also faults, traps and all the other
>> exceptions not caused by "INT" I think.
>>
> Yes. It's interrupt, traps, faults. If from is in the user space and to 
> is in the kernel, it indicates the ring3 -> ring0 transition.
>
> If the from instruction is not syscall or other ring transition 
> instruction, it should be interrupt, traps and faults. That's how we get 
> the PERF_BR_IRQ on x86.
>
> Anyway, maybe we just use a minimum but the most common set of branch 
> types now, it could be a good start and acceptable on all architectures.
>
> PERF_BR_COND     = 1, /* conditional */
> PERF_BR_UNCOND   = 2, /* unconditional */
> PERF_BR_IND      = 3, /* indirect */
> PERF_BR_CALL     = 4, /* call */
> PERF_BR_IND_CALL = 5, /* indirect call */
> PERF_BR_RET      = 6, /* return */

That would be fine by me, if you're sick of talking about it and just
want to get it merged :)

I think you could expand it a bit, this list would cover the vast bulk
of branch types for us:

  PERF_BR_COND       /* Conditional */
  PERF_BR_UNCOND     /* Unconditional */
  PERF_BR_IND        /* Indirect */
  PERF_BR_CALL       /* Function call */
  PERF_BR_IND_CALL   /* Indirect function call */
  PERF_BR_RET        /* Function return */
  PERF_BR_SYSCALL    /* Syscall */
  PERF_BR_SYSRET     /* Syscall return */
  PERF_BR_COND_CALL  /* Conditional function call */
  PERF_BR_COND_RET   /* Conditional function return */

cheers


Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-10 Thread Michael Ellerman
Segher Boessenkool  writes:
> Hi!
>
> On Mon, Jul 10, 2017 at 07:46:17PM +0800, Jin, Yao wrote:
>> 1. We all agree these definitions:
>> 
>> + PERF_BR_COND     = 1,  /* conditional */
>> + PERF_BR_UNCOND   = 2,  /* unconditional */
>> + PERF_BR_IND      = 3,  /* indirect */
>> + PERF_BR_CALL     = 4,  /* call */
>> + PERF_BR_IND_CALL = 5,  /* indirect call */
>> + PERF_BR_RET      = 6,  /* return */
>> + PERF_BR_SYSCALL  = 7,  /* syscall */
>> + PERF_BR_SYSRET   = 8,  /* syscall return */
>> + PERF_BR_IRET     = 11, /* return from interrupt */
>
> Do we?  It does not map very well to PowerPC branch types.

I think they map well enough to the types of branches that are actually
used in practice.

To represent the full range of possibilities we'd need to switch to a
bitmap of flags, ie. COND, IND, CALL, RET, SYSCALL, INT, etc. But it
would need more than 4 bits and I don't think there's that much added
value in being able to represent all the bizarre combinations.

But maybe that is the best option as it makes the API more flexible and
means we don't have to get the list of branches correct up front?
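
For comparison, a flags-style encoding could look roughly like the sketch below. The BR_F_* names and bit positions are invented for illustration; the point is only that one bit per property allows combinations such as "conditional call", at the cost of needing more than 4 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits, one per property (names are illustrative). */
#define BR_F_COND    (1u << 0)	/* conditional */
#define BR_F_IND     (1u << 1)	/* indirect */
#define BR_F_CALL    (1u << 2)	/* call */
#define BR_F_RET     (1u << 3)	/* return */
#define BR_F_SYSCALL (1u << 4)	/* syscall */
#define BR_F_INT     (1u << 5)	/* interrupt */

/* Six independent flags do not fit in a 4-bit field. */
static_assert(BR_F_INT > 0xFu, "flags need more than 4 bits");

/* True if the branch is both conditional and a call. */
static inline int br_is_cond_call(uint32_t flags)
{
	const uint32_t want = BR_F_COND | BR_F_CALL;

	return (flags & want) == want;
}
```

With this layout a conditional indirect call is just BR_F_COND | BR_F_IND | BR_F_CALL, with no dedicated enum value required.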


I ran some quick numbers on a kernel I had here (powernv w/gcc 7):

  Type      Percent  Instructions (count)
  ---------------------------------------
  cond      40.92%   beq (79166) bne (57379) ble (10411) bgt (9587) blt (6248)
                     bge (3704) bdnz (1251) bdz (353) bns (30) bdnzf (2) bdnzt (1)
  uncond    14.89%   b (61182)
  indirect   0.10%   bctr (418)
  call      33.33%   bl (136926)
  ind call   1.44%   bctrl (5912)
  return     9.23%   blr (37943)
            ======
            99.91%

If we add cond call/return that covers another 0.08% taking us to 99.99%
of branches.
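
The grouping used for those numbers can be sketched as a simple mnemonic lookup. This is only an illustration of the table above, not the script that produced it; the enum, the table and classify_mnemonic() are hypothetical names, and only the mnemonics listed in the table are covered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum br_kind {
	BR_OTHER = 0,	/* not in the table above */
	BR_COND, BR_UNCOND, BR_IND, BR_CALL, BR_IND_CALL, BR_RET
};

struct br_map {
	const char *mnemonic;
	enum br_kind kind;
};

/* Mnemonics exactly as listed in the table. */
static const struct br_map br_table[] = {
	{ "beq", BR_COND },   { "bne", BR_COND },  { "ble", BR_COND },
	{ "bgt", BR_COND },   { "blt", BR_COND },  { "bge", BR_COND },
	{ "bdnz", BR_COND },  { "bdz", BR_COND },  { "bns", BR_COND },
	{ "bdnzf", BR_COND }, { "bdnzt", BR_COND },
	{ "b", BR_UNCOND },
	{ "bctr", BR_IND },
	{ "bl", BR_CALL },
	{ "bctrl", BR_IND_CALL },
	{ "blr", BR_RET },
};

/* Exact-match lookup; anything not listed is reported as BR_OTHER. */
static enum br_kind classify_mnemonic(const char *m)
{
	assert(m != NULL);

	for (size_t i = 0; i < sizeof(br_table) / sizeof(br_table[0]); i++)
		if (strcmp(m, br_table[i].mnemonic) == 0)
			return br_table[i].kind;
	return BR_OTHER;
}
```

The conditional call/return forms mentioned above are left out on purpose, so they fall into BR_OTHER here.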

I know future compilers and or different code might use a different
distribution, but I doubt it will change all that much.

Maybe cond could be broken down further, but the only really meaningful
sub category I can think of is the decrementing type, and those are
quite rare.

cheers


Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-10 Thread Segher Boessenkool
Hi Peter,

On Mon, Jul 10, 2017 at 03:46:58PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 10, 2017 at 08:10:50AM -0500, Segher Boessenkool wrote:
> 
> > > PERF_BR_INT is triggered by instruction "int" .
> > > PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3 
> > > transition).
> > 
> > So your "PERF_BR_INT" is a system call? 
> 
> The "INT" thing has indeed been used as system call mechanism (typically
> INT 80). But these days we have special purpose syscall instructions.
> 
> It could maybe be compared to the PPC "Unconditional TRAP with
> immediate" where you use the immediate value as an index into a handler
> vector.

If we would do that, yes :-)  (We just generate a SIGTRAP instead).

> > And PERF_BR_IRQ is not an interrupt request (as its name suggests),
> > not what we call an "external interrupt" either; instead it is every
> > interrupt that is not a system call?
> 
> It is actual interrupts, but also faults, traps and all the other
> exceptions not caused by "INT" I think.

Ah, right, exceptions == interrupts for PowerPC, more terminological
confusion :-)


Segher


Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-10 Thread Jin, Yao



On 7/10/2017 9:46 PM, Peter Zijlstra wrote:
> On Mon, Jul 10, 2017 at 08:10:50AM -0500, Segher Boessenkool wrote:
>
>>> PERF_BR_INT is triggered by instruction "int" .
>>> PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3
>>> transition).
>> So your "PERF_BR_INT" is a system call?
> The "INT" thing has indeed been used as system call mechanism (typically
> INT 80). But these days we have special purpose syscall instructions.
>
> It could maybe be compared to the PPC "Unconditional TRAP with
> immediate" where you use the immediate value as an index into a handler
> vector.
>
>> And PERF_BR_IRQ is not an interrupt request (as its name suggests),
>> not what we call an "external interrupt" either; instead it is every
>> interrupt that is not a system call?
> It is actual interrupts, but also faults, traps and all the other
> exceptions not caused by "INT" I think.

Yes. It's interrupts, traps and faults. If "from" is in user space and "to"
is in the kernel, it indicates a ring3 -> ring0 transition.

If the "from" instruction is not a syscall or another ring transition
instruction, the branch must have been caused by an interrupt, trap or
fault. That's how we get PERF_BR_IRQ on x86.
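
As a rough sketch of that rule (not the code from the patch), the decision could be written as below. All names are hypothetical, and the three inputs are assumed to come from decoding the "from" and "to" addresses of the branch record.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result of looking at one user -> kernel transfer. */
enum xfer_kind {
	XFER_NOT_RING_ENTRY,	/* not a ring3 -> ring0 transition */
	XFER_RING_INSN,		/* syscall or other ring transition insn */
	XFER_IRQ		/* interrupt, trap or fault */
};

/*
 * from_user:         "from" address is in user space
 * to_kernel:         "to" address is in the kernel
 * from_is_ring_insn: "from" instruction is syscall or another
 *                    ring transition instruction
 */
static enum xfer_kind classify_ring_entry(bool from_user, bool to_kernel,
					  bool from_is_ring_insn)
{
	if (!(from_user && to_kernel))
		return XFER_NOT_RING_ENTRY;

	return from_is_ring_insn ? XFER_RING_INSN : XFER_IRQ;
}

/* Usage example: the three cases described above. */
static void classify_ring_entry_example(void)
{
	assert(classify_ring_entry(true, true, false) == XFER_IRQ);
	assert(classify_ring_entry(true, true, true) == XFER_RING_INSN);
	assert(classify_ring_entry(true, false, false) == XFER_NOT_RING_ENTRY);
}
```

The sketch only covers the user-to-kernel case described here; kernel-to-kernel interrupts would need a separate check.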


Anyway, maybe we should just use a minimal set of the most common branch
types for now. It could be a good start and acceptable on all architectures.


PERF_BR_COND     = 1, /* conditional */
PERF_BR_UNCOND   = 2, /* unconditional */
PERF_BR_IND      = 3, /* indirect */
PERF_BR_CALL     = 4, /* call */
PERF_BR_IND_CALL = 5, /* indirect call */
PERF_BR_RET      = 6, /* return */

Thanks
Jin Yao



Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-10 Thread Peter Zijlstra
On Mon, Jul 10, 2017 at 08:10:50AM -0500, Segher Boessenkool wrote:

> > PERF_BR_INT is triggered by instruction "int" .
> > PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3 
> > transition).
> 
> So your "PERF_BR_INT" is a system call? 

The "INT" thing has indeed been used as system call mechanism (typically
INT 80). But these days we have special purpose syscall instructions.

It could maybe be compared to the PPC "Unconditional TRAP with
immediate" where you use the immediate value as an index into a handler
vector.

> And PERF_BR_IRQ is not an interrupt request (as its name suggests),
> not what we call an "external interrupt" either; instead it is every
> interrupt that is not a system call?

It is actual interrupts, but also faults, traps and all the other
exceptions not caused by "INT" I think.



Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-10 Thread Jin, Yao

Hi,

The following branch types should be common enough, right?

+ PERF_BR_COND     = 1,  /* conditional */
+ PERF_BR_UNCOND   = 2,  /* unconditional */
+ PERF_BR_IND      = 3,  /* indirect */
+ PERF_BR_CALL     = 4,  /* call */
+ PERF_BR_IND_CALL = 5,  /* indirect call */
+ PERF_BR_RET      = 6,  /* return */

I have decided to define only these types in this patch set. Other, more
arch-specific branch types can be added in the future.


Is this OK?

Thanks
Jin Yao

On 7/10/2017 9:10 PM, Segher Boessenkool wrote:
> Hi!
>
> On Mon, Jul 10, 2017 at 07:46:17PM +0800, Jin, Yao wrote:
>> 1. We all agree these definitions:
>>
>> + PERF_BR_COND     = 1,  /* conditional */
>> + PERF_BR_UNCOND   = 2,  /* unconditional */
>> + PERF_BR_IND      = 3,  /* indirect */
>> + PERF_BR_CALL     = 4,  /* call */
>> + PERF_BR_IND_CALL = 5,  /* indirect call */
>> + PERF_BR_RET      = 6,  /* return */
>> + PERF_BR_SYSCALL  = 7,  /* syscall */
>> + PERF_BR_SYSRET   = 8,  /* syscall return */
>> + PERF_BR_IRET     = 11, /* return from interrupt */
>
> Do we?  It does not map very well to PowerPC branch types.
>
>> 2. I wish to keep following definitions for x86.
>>
>> + PERF_BR_IRQ      = 9,  /* hw interrupt/trap/fault */
>> + PERF_BR_INT      = 10, /* sw interrupt */
>>
>> PERF_BR_INT is triggered by instruction "int" .
>> PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3
>> transition).
>
> So your "PERF_BR_INT" is a system call?  And PERF_BR_IRQ is not an
> interrupt request (as its name suggests), not what we call an "external
> interrupt" either; instead it is every interrupt that is not a system
> call?
>
> It also does not follow the lines of "software caused interrupt" vs.
> the rest.
>
>> 4. I'd like to add following types for powerpc.
>>
>>   PERF_BR_COND_CALL  /* Conditional call */
>>   PERF_BR_COND_RET   /* Condition return */
>
> Almost all PowerPC branches have a "conditional" version (only "syscall"
> and "sysret/iret" do not -- and those last two are the same, just like
> PERF_BR_INT seems to be the same as PERF_BR_SYSCALL).
>
> So how should those PERF_BR_* be used?  It cannot be used in an
> architecture-neutral interface the way you define it now.
>
>
> Segher




Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-10 Thread Segher Boessenkool
Hi!

On Mon, Jul 10, 2017 at 07:46:17PM +0800, Jin, Yao wrote:
> 1. We all agree these definitions:
> 
> + PERF_BR_COND     = 1,  /* conditional */
> + PERF_BR_UNCOND   = 2,  /* unconditional */
> + PERF_BR_IND      = 3,  /* indirect */
> + PERF_BR_CALL     = 4,  /* call */
> + PERF_BR_IND_CALL = 5,  /* indirect call */
> + PERF_BR_RET      = 6,  /* return */
> + PERF_BR_SYSCALL  = 7,  /* syscall */
> + PERF_BR_SYSRET   = 8,  /* syscall return */
> + PERF_BR_IRET     = 11, /* return from interrupt */

Do we?  It does not map very well to PowerPC branch types.

> 2. I wish to keep following definitions for x86.
> 
> + PERF_BR_IRQ      = 9,  /* hw interrupt/trap/fault */
> + PERF_BR_INT      = 10, /* sw interrupt */
> 
> PERF_BR_INT is triggered by instruction "int" .
> PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3 
> transition).

So your "PERF_BR_INT" is a system call?  And PERF_BR_IRQ is not an
interrupt request (as its name suggests), not what we call an "external
interrupt" either; instead it is every interrupt that is not a system
call?

It also does not follow the lines of "software caused interrupt" vs.
the rest.

> 4. I'd like to add following types for powerpc.
> 
>   PERF_BR_COND_CALL  /* Conditional call */
>   PERF_BR_COND_RET   /* Condition return */

Almost all PowerPC branches have a "conditional" version (only "syscall"
and "sysret/iret" do not -- and those last two are the same, just like
PERF_BR_INT seems to be the same as PERF_BR_SYSCALL).

So how should those PERF_BR_* be used?  It cannot be used in an
architecture-neutral interface the way you define it now.


Segher


Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-10 Thread Jin, Yao

Hi Michael,

Please let me summarize the new branch type definitions.

1. We all agree on these definitions:

+ PERF_BR_COND     = 1,  /* conditional */
+ PERF_BR_UNCOND   = 2,  /* unconditional */
+ PERF_BR_IND      = 3,  /* indirect */
+ PERF_BR_CALL     = 4,  /* call */
+ PERF_BR_IND_CALL = 5,  /* indirect call */
+ PERF_BR_RET      = 6,  /* return */
+ PERF_BR_SYSCALL  = 7,  /* syscall */
+ PERF_BR_SYSRET   = 8,  /* syscall return */
+ PERF_BR_IRET     = 11, /* return from interrupt */

2. I wish to keep the following definitions for x86.

+ PERF_BR_IRQ      = 9,  /* hw interrupt/trap/fault */
+ PERF_BR_INT      = 10, /* sw interrupt */

PERF_BR_INT is triggered by the "int" instruction.
PERF_BR_IRQ is triggered by interrupts, traps and faults (the ring 0,3
transition).


3. I can drop PERF_BR_FAR_BRANCH.

4. I'd like to add the following types for powerpc.

  PERF_BR_COND_CALL  /* Conditional call */
  PERF_BR_COND_RET   /* Conditional return */

If you agree with these new definitions, I will prepare the new patch.

Thanks
Jin Yao

On 7/10/2017 6:32 PM, Michael Ellerman wrote:

"Jin, Yao"  writes:

On 7/10/2017 2:05 PM, Michael Ellerman wrote:

Jin Yao  writes:


It is often useful to know the branch types while analyzing branch
data. For example, a call is very different from a conditional branch.


...

To keep consistent on kernel and userspace and make the classification
more common, the patch adds the common branch type classification
in perf_event.h.

Most of the code and doc uses "branch" but then a few these are called
"jump". Can we just stick with "branch"?


PERF_BR_NONE  : unknown
PERF_BR_JCC   : conditional jump
PERF_BR_JMP   : jump
PERF_BR_IND_JMP   : indirect jump

eg:

PERF_BR_COND: conditional branch
PERF_BR_UNCOND  : unconditional branch
PERF_BR_IND : indirect branch

Call and jump are all branches. If we want to figure out which one is
jump and which one is call, we need the detail branch type definitions.

Yeah I'm not saying we don't need the different types, I'm saying I'd
rather we just called them "branch" not "jump". Just because "jump" can
mean different things on different arches.


For example,  if we only say "PERF_BR_IND", we could not know if it's an
indirect jump or indirect call.

Yes we can, PERF_BR_IND is an indirect branch, which is not a call,
because if it was a call then it would be PERF_BR_IND_CALL.


PERF_BR_CALL  : call
PERF_BR_IND_CALL  : indirect call
PERF_BR_RET   : return
PERF_BR_SYSCALL   : syscall
PERF_BR_SYSRET: syscall return
PERF_BR_IRQ   : hw interrupt/trap/fault
PERF_BR_INT   : sw interrupt

I'm not sure what that means, I'm guessing on x86 it means someone
executed "int" ?

PERF_BR_IRQ is for hw interrupt and PERF_BR_INT is for sw interrupt.

OK, but I still don't know what that means :)

What's an example of an instruction that is PERF_BR_IRQ and PERF_BR_INT ?


PERF_BR_CALL/PERF_BR_IND_CALL and PERF_BR_RET are for function call
(direct call and indirect call) and return.

Yep makes sense.


PERF_BR_SYSCALL/PERF_BR_SYSRET are for syscall and syscall return.

Yep OK.


Is that sufficiently useful to use up a bit? I think we only have 3
free?

Do you means 3 bits? Each bit stands for one branch type? I guess what
you mean is:

PERF_BR_COND: conditional branch
PERF_BR_UNCOND  : unconditional branch
PERF_BR_IND : indirect branch

But 3 branch types are not enough for us.

What I meant was you're using 4 bits for the type, so you have 16
possible values, and you've defined 13 of them. Meaning there are only 3
types free.

So we should try to only define branch types that are really useful, and
keep some free for future use.

Maybe PERF_BR_INT is really common on x86 and so it's important to count
it, but like I said above I don't know what it is.


PERF_BR_IRET  : return from interrupt
PERF_BR_FAR_BRANCH: not generic far branch type

What is a "not generic far branch" ?

I don't know what that would mean on powerpc for example.

It's reserved for future use, I think.

OK so let's not put it in the Linux API until it's defined?


I think the only thing we have on powerpc that's commonly used and that
isn't covered above is branches that decrement a loop counter and then
branch based on the result.

...

Sorry, I'm not familiar with the powerpc arch. Could you add the branch
types that powerpc needs?

These are good:

+   PERF_BR_COND        = 1,    /* conditional */
+   PERF_BR_UNCOND      = 2,    /* unconditional */
+   PERF_BR_IND         = 3,    /* indirect */
+   PERF_BR_CALL        = 4,    /* call */
+   PERF_BR_IND_CALL    = 5,    /* indirect call */
+   PERF_BR_RET         = 6,    /* return */

These we wouldn't use currently, but 

Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-10 Thread Jin, Yao

Hi Michael,

Let me summarize the new branch type definitions.

1. We all agree on these definitions:

+   PERF_BR_COND        = 1,    /* conditional */
+   PERF_BR_UNCOND      = 2,    /* unconditional */
+   PERF_BR_IND         = 3,    /* indirect */
+   PERF_BR_CALL        = 4,    /* call */
+   PERF_BR_IND_CALL    = 5,    /* indirect call */
+   PERF_BR_RET         = 6,    /* return */
+   PERF_BR_SYSCALL     = 7,    /* syscall */
+   PERF_BR_SYSRET      = 8,    /* syscall return */
+   PERF_BR_IRET        = 11,   /* return from interrupt */

2. I wish to keep the following definitions for x86.

+   PERF_BR_IRQ         = 9,    /* hw interrupt/trap/fault */
+   PERF_BR_INT         = 10,   /* sw interrupt */

PERF_BR_INT is triggered by the "int" instruction.
PERF_BR_IRQ is triggered by interrupts, traps and faults (the ring 3 ->
ring 0 transition).

3. I can drop PERF_BR_FAR_BRANCH.

4. I'd like to add the following types for powerpc.

PERF_BR_COND_CALL       /* Conditional call */
PERF_BR_COND_RET        /* Conditional return */
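
For illustration only, the numbering in items 1 and 2 and the INT/IRQ rule can be sketched in C. This is a rough sketch under stated assumptions, not the patch's implementation: `enum insn_kind` and `classify_kernel_entry()` are hypothetical names standing in for whatever the x86 instruction decoder reports about the instruction at the "from" address, and only the ring 3 -> ring 0 case discussed in this thread is handled.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as listed in items 1 and 2 of the summary. */
enum {
	PERF_BR_NONE     = 0,	/* unknown (value taken from the original patch) */
	PERF_BR_COND     = 1,
	PERF_BR_UNCOND   = 2,
	PERF_BR_IND      = 3,
	PERF_BR_CALL     = 4,
	PERF_BR_IND_CALL = 5,
	PERF_BR_RET      = 6,
	PERF_BR_SYSCALL  = 7,
	PERF_BR_SYSRET   = 8,
	PERF_BR_IRQ      = 9,	/* x86: hw interrupt/trap/fault */
	PERF_BR_INT      = 10,	/* x86: sw interrupt */
	PERF_BR_IRET     = 11,
};

/* Hypothetical decoder result for the instruction at the "from" address. */
enum insn_kind { INSN_OTHER, INSN_SYSCALL, INSN_INT };

/* Classify a branch that enters the kernel from user space. */
static int classify_kernel_entry(bool from_user, bool to_kernel,
				 enum insn_kind kind)
{
	if (!from_user || !to_kernel)
		return PERF_BR_NONE;	/* not a ring 3 -> ring 0 entry; out of scope here */
	if (kind == INSN_SYSCALL)
		return PERF_BR_SYSCALL;
	if (kind == INSN_INT)
		return PERF_BR_INT;	/* software interrupt via "int" */
	return PERF_BR_IRQ;		/* no transition instruction: interrupt, trap or fault */
}
```

In this sketch a user-to-kernel transition whose "from" instruction is neither a syscall nor "int" falls through to PERF_BR_IRQ, which matches the description that IRQ covers everything not caused by an explicit instruction.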

If you agree with these new definitions, I will prepare the new patch.

Thanks
Jin Yao

Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-10 Thread Michael Ellerman
"Jin, Yao"  writes:
> On 7/10/2017 2:05 PM, Michael Ellerman wrote:
>> Jin Yao  writes:
>>
>>> It is often useful to know the branch types while analyzing branch
>>> data. For example, a call is very different from a conditional branch.
>>>
...
>>> To keep consistent on kernel and userspace and make the classification
>>> more common, the patch adds the common branch type classification
>>> in perf_event.h.
>>
>> Most of the code and doc uses "branch" but then a few of these are called
>> "jump". Can we just stick with "branch"?
>>
>>> PERF_BR_NONE  : unknown
>>> PERF_BR_JCC   : conditional jump
>>> PERF_BR_JMP   : jump
>>> PERF_BR_IND_JMP   : indirect jump
>> eg:
>>
>> PERF_BR_COND: conditional branch
>> PERF_BR_UNCOND  : unconditional branch
>> PERF_BR_IND : indirect branch
>
> Calls and jumps are both branches. If we want to tell which one is a
> jump and which one is a call, we need the detailed branch type definitions.

Yeah I'm not saying we don't need the different types, I'm saying I'd
rather we just called them "branch" not "jump". Just because "jump" can
mean different things on different arches.

> For example, if we only say "PERF_BR_IND", we cannot tell whether it's an
> indirect jump or an indirect call.

Yes we can, PERF_BR_IND is an indirect branch, which is not a call,
because if it was a call then it would be PERF_BR_IND_CALL.
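
To make the point concrete: the type is a single enumerated value, not a set of flags, so "indirect" and "indirect call" are distinct, mutually exclusive values. A minimal sketch, using the values from the list of agreed types in this thread; the helper names `br_is_call()` and `br_is_indirect()` are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Values from the list of agreed types in this thread. */
enum {
	PERF_BR_COND     = 1,	/* conditional */
	PERF_BR_UNCOND   = 2,	/* unconditional */
	PERF_BR_IND      = 3,	/* indirect branch that is not a call */
	PERF_BR_CALL     = 4,	/* direct call */
	PERF_BR_IND_CALL = 5,	/* indirect call */
	PERF_BR_RET      = 6,	/* return */
};

/* True for both direct and indirect calls. */
static bool br_is_call(int type)
{
	return type == PERF_BR_CALL || type == PERF_BR_IND_CALL;
}

/* True for any indirect transfer, call or not. */
static bool br_is_indirect(int type)
{
	return type == PERF_BR_IND || type == PERF_BR_IND_CALL;
}
```

With these helpers, PERF_BR_IND is indirect but not a call, while PERF_BR_IND_CALL is both, so no information is lost by dropping "jump" from the names.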

>>> PERF_BR_CALL  : call
>>> PERF_BR_IND_CALL  : indirect call
>>> PERF_BR_RET   : return
>>> PERF_BR_SYSCALL   : syscall
>>> PERF_BR_SYSRET: syscall return
>>> PERF_BR_IRQ   : hw interrupt/trap/fault
>>> PERF_BR_INT   : sw interrupt
>> I'm not sure what that means, I'm guessing on x86 it means someone
>> executed "int" ?
>
> PERF_BR_IRQ is for hw interrupt and PERF_BR_INT is for sw interrupt.

OK, but I still don't know what that means :)

What's an example of an instruction that is PERF_BR_IRQ and PERF_BR_INT ?

> PERF_BR_CALL/PERF_BR_IND_CALL and PERF_BR_RET are for function call 
> (direct call and indirect call) and return.

Yep makes sense.

> PERF_BR_SYSCALL/PERF_BR_SYSRET are for syscall and syscall return.

Yep OK.

>> Is that sufficiently useful to use up a bit? I think we only have 3
>> free?
>
> Do you mean 3 bits? Does each bit stand for one branch type? I guess what
> you mean is:
>
> PERF_BR_COND: conditional branch
> PERF_BR_UNCOND  : unconditional branch
> PERF_BR_IND : indirect branch
>
> But 3 branch types are not enough for us.

What I meant was you're using 4 bits for the type, so you have 16
possible values, and you've defined 13 of them. Meaning there are only 3
types free.

So we should try to only define branch types that are really useful, and
keep some free for future use.
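
The arithmetic behind "only 3 types free" can be checked with a short sketch. The enum copies the v6 values from the patch under discussion; `BR_TYPE_BITS`, `br_type_capacity()` and `br_type_free()` are hypothetical names used only here.

```c
#include <assert.h>

/* Branch types as proposed in v6 of the patch. */
enum {
	PERF_BR_NONE       = 0,
	PERF_BR_JCC        = 1,
	PERF_BR_JMP        = 2,
	PERF_BR_IND_JMP    = 3,
	PERF_BR_CALL       = 4,
	PERF_BR_IND_CALL   = 5,
	PERF_BR_RET        = 6,
	PERF_BR_SYSCALL    = 7,
	PERF_BR_SYSRET     = 8,
	PERF_BR_IRQ        = 9,
	PERF_BR_INT        = 10,
	PERF_BR_IRET       = 11,
	PERF_BR_FAR_BRANCH = 12,
	PERF_BR_MAX,		/* 13: number of values defined so far */
};

#define BR_TYPE_BITS 4		/* width of the proposed "type" bitfield */

/* Number of distinct values a field of `bits` bits can hold. */
static unsigned int br_type_capacity(unsigned int bits)
{
	return 1u << bits;
}

/* Encodings still unassigned once `defined` values are in use. */
static unsigned int br_type_free(unsigned int bits, unsigned int defined)
{
	return br_type_capacity(bits) - defined;
}
```

Here `br_type_free(BR_TYPE_BITS, PERF_BR_MAX)` evaluates to 16 - 13 = 3, which is the headroom referred to above.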

Maybe PERF_BR_INT is really common on x86 and so it's important to count
it, but like I said above I don't know what it is.

>>> PERF_BR_IRET  : return from interrupt
>>> PERF_BR_FAR_BRANCH: not generic far branch type
>> What is a "not generic far branch" ?
>>
>> I don't know what that would mean on powerpc for example.
>
> It's reserved for future use, I think.

OK so let's not put it in the Linux API until it's defined?

>> I think the only thing we have on powerpc that's commonly used and that
>> isn't covered above is branches that decrement a loop counter and then
>> branch based on the result.
...
>
> Sorry, I'm not familiar with the powerpc arch. Could you add the branch
> types that powerpc needs?

These are good:

+   PERF_BR_COND        = 1,    /* conditional */
+   PERF_BR_UNCOND      = 2,    /* unconditional */
+   PERF_BR_IND         = 3,    /* indirect */
+   PERF_BR_CALL        = 4,    /* call */
+   PERF_BR_IND_CALL    = 5,    /* indirect call */
+   PERF_BR_RET         = 6,    /* return */

These we wouldn't use currently, but make sense:

+   PERF_BR_SYSCALL     = 7,    /* syscall */
+   PERF_BR_SYSRET      = 8,    /* syscall return */
+   PERF_BR_IRET        = 11,   /* return from interrupt */

These I'm not so sure about, I don't really know what they would map to
for us:

+   PERF_BR_IRQ         = 9,    /* hw interrupt/trap/fault */
+   PERF_BR_INT         = 10,   /* sw interrupt */

And sounds like this should be dropped for now:

+   PERF_BR_FAR_BRANCH  = 12,   /* not generic far branch type */

The branch types you haven't covered which might be useful for us are:

PERF_BR_COND_CALL       /* Conditional call */
PERF_BR_COND_RET        /* Conditional return */


cheers


Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-10 Thread Jin, Yao



On 7/10/2017 2:05 PM, Michael Ellerman wrote:

> Hi Jin Yao,
>
> Sorry I haven't commented until now, but it got lost in the flood of
> patches.

Never mind, it's no problem. :)

> Just a few nit-picks below ...
>
> Jin Yao  writes:
>
>> It is often useful to know the branch types while analyzing branch
>> data. For example, a call is very different from a conditional branch.
>>
>> Currently we have to look it up in the binary, but the binary may later
>> not be available, and even when it is available the user has to spend
>> some time on the lookup. It is very useful for the user to check it
>> directly in perf report.
>>
>> Perf already has support for disassembling the branch instruction
>> to get the x86 branch type.
>>
>> To keep consistent on kernel and userspace and make the classification
>> more common, the patch adds the common branch type classification
>> in perf_event.h.
>
> Most of the code and doc uses "branch" but then a few of these are called
> "jump". Can we just stick with "branch"?
>
>> PERF_BR_NONE       : unknown
>> PERF_BR_JCC        : conditional jump
>> PERF_BR_JMP        : jump
>> PERF_BR_IND_JMP    : indirect jump
>
> eg:
>
> PERF_BR_COND       : conditional branch
> PERF_BR_UNCOND     : unconditional branch
> PERF_BR_IND        : indirect branch

Calls and jumps are both branches. If we want to tell which one is a
jump and which one is a call, we need the detailed branch type definitions.

For example, if we only say "PERF_BR_IND", we cannot tell whether it's an
indirect jump or an indirect call.

>> PERF_BR_CALL       : call
>> PERF_BR_IND_CALL   : indirect call
>> PERF_BR_RET        : return
>> PERF_BR_SYSCALL    : syscall
>> PERF_BR_SYSRET     : syscall return
>> PERF_BR_IRQ        : hw interrupt/trap/fault
>> PERF_BR_INT        : sw interrupt
>
> I'm not sure what that means, I'm guessing on x86 it means someone
> executed "int" ?

PERF_BR_IRQ is for hw interrupt and PERF_BR_INT is for sw interrupt.

PERF_BR_CALL/PERF_BR_IND_CALL and PERF_BR_RET are for function call
(direct call and indirect call) and return.

PERF_BR_SYSCALL/PERF_BR_SYSRET are for syscall and syscall return.

> Is that sufficiently useful to use up a bit? I think we only have 3
> free?

Do you mean 3 bits? Does each bit stand for one branch type? I guess what
you mean is:

PERF_BR_COND       : conditional branch
PERF_BR_UNCOND     : unconditional branch
PERF_BR_IND        : indirect branch

But 3 branch types are not enough for us.

>> PERF_BR_IRET       : return from interrupt
>> PERF_BR_FAR_BRANCH : not generic far branch type
>
> What is a "not generic far branch" ?
>
> I don't know what that would mean on powerpc for example.

It's reserved for future use, I think.

> I think the only thing we have on powerpc that's commonly used and that
> isn't covered above is branches that decrement a loop counter and then
> branch based on the result.
>
> It might be nice if we could separate those out from other conditional
> branches. Whether it's worth using a bit for I'm not sure. Do other
> arches have something similar?
>
> Those branches do tend to be "backward conditional", so that may be
> sufficient. But backward conditional also includes if bodies that have
> been moved out of line and then branch back to the main body of the
> function.
>
> cheers

Sorry, I'm not familiar with the powerpc arch. Could you add the branch
types that powerpc needs?


For backward conditional and forward conditional, we compute them in 
userspace according to the from/to addresses.
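
A minimal sketch of that userspace computation, assuming the direction is decided purely by comparing the two addresses. The names `enum cond_dir` and `cond_branch_direction()` are hypothetical, and treating a branch to its own address as backward is a choice made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the direction of a conditional branch. */
enum cond_dir { COND_FWD, COND_BWD };

/*
 * Derive the direction from the sampled addresses alone: a target above
 * the branch instruction is forward, anything else is backward.
 */
static enum cond_dir cond_branch_direction(uint64_t from, uint64_t to)
{
	return to > from ? COND_FWD : COND_BWD;
}
```

With from = 0x1000 and to = 0x1040 this yields COND_FWD; swapping the two addresses yields COND_BWD. Because only `from` and `to` are needed, no kernel-side type is required for the forward/backward split.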


Thanks
Jin Yao





Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-10 Thread Michael Ellerman
Hi Jin Yao,

Sorry I haven't commented until now, but it got lost in the flood of
patches.

Just a few nit-picks below ...

Jin Yao  writes:

> It is often useful to know the branch types while analyzing branch
> data. For example, a call is very different from a conditional branch.
>
> Currently we have to look it up in the binary, but the binary may later
> not be available, and even when it is available the user has to spend
> some time on the lookup. It is very useful for the user to check it
> directly in perf report.
>
> Perf already has support for disassembling the branch instruction
> to get the x86 branch type.
>
> To keep consistent on kernel and userspace and make the classification
> more common, the patch adds the common branch type classification
> in perf_event.h.

Most of the code and doc uses "branch" but then a few of these are called
"jump". Can we just stick with "branch"?

> PERF_BR_NONE  : unknown
> PERF_BR_JCC   : conditional jump
> PERF_BR_JMP   : jump
> PERF_BR_IND_JMP   : indirect jump

eg:

PERF_BR_COND: conditional branch
PERF_BR_UNCOND  : unconditional branch
PERF_BR_IND : indirect branch

> PERF_BR_CALL  : call
> PERF_BR_IND_CALL  : indirect call
> PERF_BR_RET   : return
> PERF_BR_SYSCALL   : syscall
> PERF_BR_SYSRET: syscall return
> PERF_BR_IRQ   : hw interrupt/trap/fault
> PERF_BR_INT   : sw interrupt

I'm not sure what that means, I'm guessing on x86 it means someone
executed "int" ?

Is that sufficiently useful to use up a bit? I think we only have 3
free?

> PERF_BR_IRET  : return from interrupt
> PERF_BR_FAR_BRANCH: not generic far branch type

What is a "not generic far branch" ?

I don't know what that would mean on powerpc for example.


I think the only thing we have on powerpc that's commonly used and that
isn't covered above is branches that decrement a loop counter and then
branch based on the result.

It might be nice if we could separate those out from other conditional
branches. Whether it's worth using a bit for I'm not sure. Do other
arches have something similar?

Those branches do tend to be "backward conditional", so that may be
sufficient. But backward conditional also includes if bodies that have
been moved out of line and then branch back to the main body of the
function.

cheers


Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-09 Thread Michael Ellerman
Peter Zijlstra  writes:

> PPC folks, maddy, does this work for you guys?

I think it works for us, but I have some comments; I'll reply to the original.

cheers

> On Thu, Apr 20, 2017 at 08:07:49PM +0800, Jin Yao wrote:
>> It is often useful to know the branch types while analyzing branch
>> data. For example, a call is very different from a conditional branch.
>> 
>> Currently we have to look it up in the binary, but the binary may later
>> not be available, and even when it is available the user has to spend
>> some time on the lookup. It is very useful for the user to check it
>> directly in perf report.
>> 
>> Perf already has support for disassembling the branch instruction
>> to get the x86 branch type.
>> 
>> To keep consistent on kernel and userspace and make the classification
>> more common, the patch adds the common branch type classification
>> in perf_event.h.
>> 
>> PERF_BR_NONE  : unknown
>> PERF_BR_JCC   : conditional jump
>> PERF_BR_JMP   : jump
>> PERF_BR_IND_JMP   : indirect jump
>> PERF_BR_CALL  : call
>> PERF_BR_IND_CALL  : indirect call
>> PERF_BR_RET   : return
>> PERF_BR_SYSCALL   : syscall
>> PERF_BR_SYSRET: syscall return
>> PERF_BR_IRQ   : hw interrupt/trap/fault
>> PERF_BR_INT   : sw interrupt
>> PERF_BR_IRET  : return from interrupt
>> PERF_BR_FAR_BRANCH: not generic far branch type
>> 
>> The patch also adds a new field type (4 bits) in perf_branch_entry
>> to record the branch type.
>> 
>> Since the disassembling of branch instruction needs some overhead,
>> a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
>> needs to disassemble the branch instruction and record the branch
>> type.
>> 
>> Change log
>> ----------
>> 
>> v6: Not changed.
>> 
>> v5: Not changed. The v5 patch series only changes userspace.
>> 
>> v4: Compared to the previous version, the major changes are:
>> 
>> 1. Remove PERF_BR_JCC_FWD/PERF_BR_JCC_BWD; they will be
>>    computed later in userspace.
>> 
>> 2. Remove the "cross" field in perf_branch_entry. The cross-page
>>    computation will be done later in userspace.
>> 
>> Signed-off-by: Jin Yao 
>> ---
>>  include/uapi/linux/perf_event.h   | 29 -
>>  tools/include/uapi/linux/perf_event.h | 29 -
>>  2 files changed, 56 insertions(+), 2 deletions(-)
>> 
>> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
>> index d09a9cd..69af012 100644
>> --- a/include/uapi/linux/perf_event.h
>> +++ b/include/uapi/linux/perf_event.h
>> @@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
>>  PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT   = 14, /* no flags */
>>  PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT  = 15, /* no cycles */
>>  
>> +PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT  = 16, /* save branch type */
>> +
>>  PERF_SAMPLE_BRANCH_MAX_SHIFT/* non-ABI */
>>  };
>>  
>> @@ -198,9 +200,32 @@ enum perf_branch_sample_type {
>>  PERF_SAMPLE_BRANCH_NO_FLAGS = 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
>>  PERF_SAMPLE_BRANCH_NO_CYCLES= 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
>>  
>> +PERF_SAMPLE_BRANCH_TYPE_SAVE=
>> +1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
>> +
>>  PERF_SAMPLE_BRANCH_MAX  = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
>>  };
>>  
>> +/*
>> + * Common flow change classification
>> + */
>> +enum {
>> +PERF_BR_NONE= 0,/* unknown */
>> +PERF_BR_JCC = 1,/* conditional jump */
>> +PERF_BR_JMP = 2,/* jump */
>> +PERF_BR_IND_JMP = 3,/* indirect jump */
>> +PERF_BR_CALL= 4,/* call */
>> +PERF_BR_IND_CALL= 5,/* indirect call */
>> +PERF_BR_RET = 6,/* return */
>> +PERF_BR_SYSCALL = 7,/* syscall */
>> +PERF_BR_SYSRET  = 8,/* syscall return */
>> +PERF_BR_IRQ = 9,/* hw interrupt/trap/fault */
>> +PERF_BR_INT = 10,   /* sw interrupt */
>> +PERF_BR_IRET= 11,   /* return from interrupt */
>> +PERF_BR_FAR_BRANCH  = 12,   /* not generic far branch type */
>> +PERF_BR_MAX,
>> +};
>> +
>>  #define PERF_SAMPLE_BRANCH_PLM_ALL \
>>  (PERF_SAMPLE_BRANCH_USER|\
>>   PERF_SAMPLE_BRANCH_KERNEL|\
>> @@ -999,6 +1024,7 @@ union perf_mem_data_src {
>>   * in_tx: running in a hardware transaction
>>   * abort: aborting a hardware transaction
>>   *cycles: cycles from last branch (or 0 if not supported)
>> + *  type: branch type
>>   */
>>  struct perf_branch_entry {
>>  __u64   from;
>> @@ -1008,7 +1034,8 @@ struct perf_branch_entry {
>>  in_tx:1,/* in transaction */
>>  abort:1,/* transaction abort */
>>  cycles:16,  /* cycle count to last branch */
>> -reserved:44;
>> +type:4, /* branch type */
>> +reserved:40;
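
The bitfield change in the last hunk can be sanity-checked with a userspace mirror of the structure. This is a sketch, not the kernel header: `struct branch_entry_sketch` and `store_type()` are hypothetical names, the `to`, `mispred` and `predicted` fields are not visible in the quoted hunk and are filled in from the upstream layout as I understand it, and 64-bit bitfield base types rely on GCC/Clang behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace mirror of the proposed perf_branch_entry layout. */
struct branch_entry_sketch {
	uint64_t from;
	uint64_t to;
	uint64_t mispred:1,	/* target mispredicted */
		 predicted:1,	/* target predicted */
		 in_tx:1,	/* in transaction */
		 abort:1,	/* transaction abort */
		 cycles:16,	/* cycle count to last branch */
		 type:4,	/* new: branch type */
		 reserved:40;	/* shrinks from 44 to make room for type */
};

/* The flag bits must still add up to exactly one 64-bit word. */
_Static_assert(1 + 1 + 1 + 1 + 16 + 4 + 40 == 64,
	       "flag word must stay 64 bits");
_Static_assert(sizeof(struct branch_entry_sketch) == 3 * sizeof(uint64_t),
	       "entry size must not change");

/* Store a type value and read it back; only the low 4 bits are kept. */
static unsigned int store_type(struct branch_entry_sketch *e, unsigned int t)
{
	e->type = t;
	return (unsigned int)e->type;
}
```

Storing 12 (PERF_BR_FAR_BRANCH in v6) reads back as 12, while storing 16 reads back as 0, since a 4-bit field holds only the values 0 to 15.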

Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-09 Thread Michael Ellerman
Peter Zijlstra  writes:

> PPC folks, maddy, does this work for you guys?

I think it works for us, but I have some comments; I'll reply to the original.

cheers

> On Thu, Apr 20, 2017 at 08:07:49PM +0800, Jin Yao wrote:
>> It is often useful to know the branch types while analyzing branch
>> data. For example, a call is very different from a conditional branch.
>> 
>> Currently we have to look it up in binary while the binary may later
>> not be available and even the binary is available but user has to take
>> some time. It is very useful for user to check it directly in perf
>> report.
>> 
>> Perf already has support for disassembling the branch instruction
>> to get the x86 branch type.
>> 
>> To keep consistent on kernel and userspace and make the classification
>> more common, the patch adds the common branch type classification
>> in perf_event.h.
>> 
>> PERF_BR_NONE  : unknown
>> PERF_BR_JCC   : conditional jump
>> PERF_BR_JMP   : jump
>> PERF_BR_IND_JMP   : indirect jump
>> PERF_BR_CALL  : call
>> PERF_BR_IND_CALL  : indirect call
>> PERF_BR_RET   : return
>> PERF_BR_SYSCALL   : syscall
>> PERF_BR_SYSRET: syscall return
>> PERF_BR_IRQ   : hw interrupt/trap/fault
>> PERF_BR_INT   : sw interrupt
>> PERF_BR_IRET  : return from interrupt
>> PERF_BR_FAR_BRANCH: not generic far branch type
>> 
>> The patch also adds a new field type (4 bits) in perf_branch_entry
>> to record the branch type.
>> 
>> Since the disassembling of branch instruction needs some overhead,
>> a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
>> needs to disassemble the branch instruction and record the branch
>> type.
>> 
>> Change log
>> --
>> 
>> v6: Not changed.
>> 
>> v5: Not changed. The v5 patch series just change the userspace.
>> 
>> v4: Comparing to previous version, the major changes are:
>> 
>> 1. Remove the PERF_BR_JCC_FWD/PERF_BR_JCC_BWD, they will be
>>computed later in userspace.
>> 
>> 2. Remove the "cross" field in perf_branch_entry. The cross page
>>computing will be done later in userspace.
>> 
>> Signed-off-by: Jin Yao 
>> ---
>>  include/uapi/linux/perf_event.h   | 29 -
>>  tools/include/uapi/linux/perf_event.h | 29 -
>>  2 files changed, 56 insertions(+), 2 deletions(-)
>> 
>> diff --git a/include/uapi/linux/perf_event.h 
>> b/include/uapi/linux/perf_event.h
>> index d09a9cd..69af012 100644
>> --- a/include/uapi/linux/perf_event.h
>> +++ b/include/uapi/linux/perf_event.h
>> @@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
>>  PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT   = 14, /* no flags */
>>  PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT  = 15, /* no cycles */
>>  
>> +PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT  = 16, /* save branch type */
>> +
>>  PERF_SAMPLE_BRANCH_MAX_SHIFT/* non-ABI */
>>  };
>>  
>> @@ -198,9 +200,32 @@ enum perf_branch_sample_type {
>>  PERF_SAMPLE_BRANCH_NO_FLAGS = 1U << 
>> PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
>>  PERF_SAMPLE_BRANCH_NO_CYCLES= 1U << 
>> PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
>>  
>> +PERF_SAMPLE_BRANCH_TYPE_SAVE=
>> +1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
>> +
>>  PERF_SAMPLE_BRANCH_MAX  = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
>>  };
>>  
>> +/*
>> + * Common flow change classification
>> + */
>> +enum {
>> +PERF_BR_NONE= 0,/* unknown */
>> +PERF_BR_JCC = 1,/* conditional jump */
>> +PERF_BR_JMP = 2,/* jump */
>> +PERF_BR_IND_JMP = 3,/* indirect jump */
>> +PERF_BR_CALL= 4,/* call */
>> +PERF_BR_IND_CALL= 5,/* indirect call */
>> +PERF_BR_RET = 6,/* return */
>> +PERF_BR_SYSCALL = 7,/* syscall */
>> +PERF_BR_SYSRET  = 8,/* syscall return */
>> +PERF_BR_IRQ = 9,/* hw interrupt/trap/fault */
>> +PERF_BR_INT = 10,   /* sw interrupt */
>> +PERF_BR_IRET= 11,   /* return from interrupt */
>> +PERF_BR_FAR_BRANCH  = 12,   /* not generic far branch type */
>> +PERF_BR_MAX,
>> +};
>> +
>>  #define PERF_SAMPLE_BRANCH_PLM_ALL \
>>  (PERF_SAMPLE_BRANCH_USER|\
>>   PERF_SAMPLE_BRANCH_KERNEL|\
>> @@ -999,6 +1024,7 @@ union perf_mem_data_src {
>>   * in_tx: running in a hardware transaction
>>   * abort: aborting a hardware transaction
>>   *cycles: cycles from last branch (or 0 if not supported)
>> + *  type: branch type
>>   */
>>  struct perf_branch_entry {
>>  __u64   from;
>> @@ -1008,7 +1034,8 @@ struct perf_branch_entry {
>>  in_tx:1,/* in transaction */
>>  abort:1,/* transaction abort */
>>  cycles:16,  /* cycle count to last branch */
>> -reserved:44;
>> +type:4, /* branch type */
>> +reserved:40;
>>  };
>>  
>>  #endif /* 

Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-07 Thread Peter Zijlstra

PPC folks, maddy, does this work for you guys?

On Thu, Apr 20, 2017 at 08:07:49PM +0800, Jin Yao wrote:
> It is often useful to know the branch types while analyzing branch
> data. For example, a call is very different from a conditional branch.
> 
> Currently we have to look the type up in the binary, but the binary may
> no longer be available later, and even when it is, the lookup takes the
> user some time. It is very useful for the user to be able to check the
> branch type directly in perf report.
> 
> Perf already has support for disassembling the branch instruction
> to get the x86 branch type.
> 
> To keep consistent on kernel and userspace and make the classification
> more common, the patch adds the common branch type classification
> in perf_event.h.
> 
> PERF_BR_NONE  : unknown
> PERF_BR_JCC   : conditional jump
> PERF_BR_JMP   : jump
> PERF_BR_IND_JMP   : indirect jump
> PERF_BR_CALL  : call
> PERF_BR_IND_CALL  : indirect call
> PERF_BR_RET   : return
> PERF_BR_SYSCALL   : syscall
> PERF_BR_SYSRET: syscall return
> PERF_BR_IRQ   : hw interrupt/trap/fault
> PERF_BR_INT   : sw interrupt
> PERF_BR_IRET  : return from interrupt
> PERF_BR_FAR_BRANCH: not generic far branch type
> 
> The patch also adds a new field type (4 bits) in perf_branch_entry
> to record the branch type.
> 
> Since the disassembling of branch instruction needs some overhead,
> a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
> needs to disassemble the branch instruction and record the branch
> type.
> 
> Change log
> --
> 
> v6: Not changed.
> 
> v5: Not changed. The v5 patch series just change the userspace.
> 
> v4: Comparing to previous version, the major changes are:
> 
> 1. Remove the PERF_BR_JCC_FWD/PERF_BR_JCC_BWD, they will be
>computed later in userspace.
> 
> 2. Remove the "cross" field in perf_branch_entry. The cross page
>computing will be done later in userspace.
> 
> Signed-off-by: Jin Yao 
> ---
>  include/uapi/linux/perf_event.h   | 29 -
>  tools/include/uapi/linux/perf_event.h | 29 -
>  2 files changed, 56 insertions(+), 2 deletions(-)
> 
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index d09a9cd..69af012 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
>   PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT   = 14, /* no flags */
>   PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT  = 15, /* no cycles */
>  
> + PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT  = 16, /* save branch type */
> +
>   PERF_SAMPLE_BRANCH_MAX_SHIFT/* non-ABI */
>  };
>  
> @@ -198,9 +200,32 @@ enum perf_branch_sample_type {
>   PERF_SAMPLE_BRANCH_NO_FLAGS = 1U << 
> PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
>   PERF_SAMPLE_BRANCH_NO_CYCLES= 1U << 
> PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
>  
> + PERF_SAMPLE_BRANCH_TYPE_SAVE=
> + 1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
> +
>   PERF_SAMPLE_BRANCH_MAX  = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
>  };
>  
> +/*
> + * Common flow change classification
> + */
> +enum {
> + PERF_BR_NONE= 0,/* unknown */
> + PERF_BR_JCC = 1,/* conditional jump */
> + PERF_BR_JMP = 2,/* jump */
> + PERF_BR_IND_JMP = 3,/* indirect jump */
> + PERF_BR_CALL= 4,/* call */
> + PERF_BR_IND_CALL= 5,/* indirect call */
> + PERF_BR_RET = 6,/* return */
> + PERF_BR_SYSCALL = 7,/* syscall */
> + PERF_BR_SYSRET  = 8,/* syscall return */
> + PERF_BR_IRQ = 9,/* hw interrupt/trap/fault */
> + PERF_BR_INT = 10,   /* sw interrupt */
> + PERF_BR_IRET= 11,   /* return from interrupt */
> + PERF_BR_FAR_BRANCH  = 12,   /* not generic far branch type */
> + PERF_BR_MAX,
> +};
> +
>  #define PERF_SAMPLE_BRANCH_PLM_ALL \
>   (PERF_SAMPLE_BRANCH_USER|\
>PERF_SAMPLE_BRANCH_KERNEL|\
> @@ -999,6 +1024,7 @@ union perf_mem_data_src {
>   * in_tx: running in a hardware transaction
>   * abort: aborting a hardware transaction
>   *cycles: cycles from last branch (or 0 if not supported)
> + *  type: branch type
>   */
>  struct perf_branch_entry {
>   __u64   from;
> @@ -1008,7 +1034,8 @@ struct perf_branch_entry {
>   in_tx:1,/* in transaction */
>   abort:1,/* transaction abort */
>   cycles:16,  /* cycle count to last branch */
> - reserved:44;
> + type:4, /* branch type */
> + reserved:40;
>  };
>  
>  #endif /* _UAPI_LINUX_PERF_EVENT_H */
> diff --git a/tools/include/uapi/linux/perf_event.h 
> b/tools/include/uapi/linux/perf_event.h
> index d09a9cd..69af012 100644
> --- a/tools/include/uapi/linux/perf_event.h
> +++ 

[PATCH v6 1/7] perf/core: Define the common branch type classification

2017-04-19 Thread Jin Yao
It is often useful to know the branch types while analyzing branch
data. For example, a call is very different from a conditional branch.

Currently we have to look the type up in the binary, but the binary may
no longer be available later, and even when it is, the lookup takes the
user some time. It is very useful for the user to be able to check the
branch type directly in perf report.

Perf already has support for disassembling the branch instruction
to get the x86 branch type.

To keep the kernel and userspace consistent and to make the
classification more generic, this patch adds a common branch type
classification to perf_event.h.

PERF_BR_NONE      : unknown
PERF_BR_JCC       : conditional jump
PERF_BR_JMP       : jump
PERF_BR_IND_JMP   : indirect jump
PERF_BR_CALL      : call
PERF_BR_IND_CALL  : indirect call
PERF_BR_RET       : return
PERF_BR_SYSCALL   : syscall
PERF_BR_SYSRET    : syscall return
PERF_BR_IRQ       : hw interrupt/trap/fault
PERF_BR_INT       : sw interrupt
PERF_BR_IRET      : return from interrupt
PERF_BR_FAR_BRANCH: not generic far branch type

The patch also adds a new field type (4 bits) in perf_branch_entry
to record the branch type.
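
For illustration only (this is not part of the patch): a minimal sketch
of how a consumer might decode the new 4-bit type field. The enum is a
local copy of the values listed above. struct branch_flags and
branch_type_name() are hypothetical names, and the mispred/predicted
bits are assumed from the existing perf_branch_entry layout, which the
hunk context below does not show.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local copy of the classification proposed in this patch. Do not combine
 * with <linux/perf_event.h>, which may define some of the same names.
 */
enum {
    PERF_BR_NONE       = 0,
    PERF_BR_JCC        = 1,
    PERF_BR_JMP        = 2,
    PERF_BR_IND_JMP    = 3,
    PERF_BR_CALL       = 4,
    PERF_BR_IND_CALL   = 5,
    PERF_BR_RET        = 6,
    PERF_BR_SYSCALL    = 7,
    PERF_BR_SYSRET     = 8,
    PERF_BR_IRQ        = 9,
    PERF_BR_INT        = 10,
    PERF_BR_IRET       = 11,
    PERF_BR_FAR_BRANCH = 12,
    PERF_BR_MAX,
};

/*
 * Hypothetical stand-in for the flags word of struct perf_branch_entry
 * after the patch. mispred and predicted are assumed, not taken from the
 * diff. 64-bit bit-fields rely on compiler support (GCC and Clang have it).
 */
struct branch_flags {
    uint64_t mispred:1,
             predicted:1,
             in_tx:1,
             abort:1,
             cycles:16,
             type:4,
             reserved:40;
};

/* 13 values need 4 bits, and the flags must still fill exactly one u64. */
_Static_assert(PERF_BR_MAX <= (1 << 4), "type does not fit in 4 bits");
_Static_assert(sizeof(struct branch_flags) == sizeof(uint64_t),
               "flags word must stay 64 bits");

/* Map a value read from the type field to the description used above. */
static const char *branch_type_name(unsigned int type)
{
    static const char * const names[PERF_BR_MAX] = {
        [PERF_BR_NONE]       = "unknown",
        [PERF_BR_JCC]        = "conditional jump",
        [PERF_BR_JMP]        = "jump",
        [PERF_BR_IND_JMP]    = "indirect jump",
        [PERF_BR_CALL]       = "call",
        [PERF_BR_IND_CALL]   = "indirect call",
        [PERF_BR_RET]        = "return",
        [PERF_BR_SYSCALL]    = "syscall",
        [PERF_BR_SYSRET]     = "syscall return",
        [PERF_BR_IRQ]        = "hw interrupt/trap/fault",
        [PERF_BR_INT]        = "sw interrupt",
        [PERF_BR_IRET]       = "return from interrupt",
        [PERF_BR_FAR_BRANCH] = "not generic far branch type",
    };

    assert(type < (1u << 4));   /* value must come from the 4-bit field */
    return type < PERF_BR_MAX ? names[type] : "unknown";
}
```

The three values the field can hold beyond PERF_BR_FAR_BRANCH are
reported as "unknown" here, so a reader built against this list does not
misreport types that a later revision might add.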

Since disassembling the branch instruction has some overhead, a new
flag, PERF_SAMPLE_BRANCH_TYPE_SAVE, is introduced to indicate whether
the branch instruction should be disassembled and the branch type
recorded.
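
A sketch of the opt-in described above, not the kernel implementation:
the classifier runs only when the sample mask carries the new bit. The
two constants are local copies of the values this patch adds.
classify_fn, branch_type_for_sample() and the counting stub classifier
are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the values added by this patch. */
enum {
    PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT = 16,
    PERF_SAMPLE_BRANCH_TYPE_SAVE = 1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
};

/* Hypothetical classifier signature: returns a PERF_BR_* value. */
typedef unsigned int (*classify_fn)(uint64_t from, uint64_t to);

/* Counts how often the (expensive) classifier actually ran. */
static unsigned int classify_calls;

/* Stub that reports every branch as a call (4 is PERF_BR_CALL above). */
static unsigned int stub_classify(uint64_t from, uint64_t to)
{
    (void)from;
    (void)to;
    classify_calls++;
    return 4;
}

/*
 * Return the branch type for one sample. The classifier is skipped, and
 * 0 (PERF_BR_NONE) is returned, unless the user asked for branch types.
 */
static unsigned int branch_type_for_sample(uint64_t branch_sample_type,
                                           uint64_t from, uint64_t to,
                                           classify_fn classify)
{
    assert(classify);
    if (!(branch_sample_type & PERF_SAMPLE_BRANCH_TYPE_SAVE))
        return 0;
    return classify(from, to);
}
```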

Change log
----------

v6: Not changed.

v5: Not changed. The v5 patch series changes only the userspace side.

v4: Compared to the previous version, the major changes are:

1. Remove the PERF_BR_JCC_FWD/PERF_BR_JCC_BWD, they will be
   computed later in userspace.

2. Remove the "cross" field in perf_branch_entry. The cross page
   computing will be done later in userspace.
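
The two v4 items move work to userspace. A rough sketch of what that
computation could look like, given only the from/to addresses of a
sample. This is an illustration, not the code from the userspace patches
in this series. branch_is_forward(), branch_crosses_page() and the
page_shift parameter are hypothetical (12 would correspond to 4 KiB
pages).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * A branch is forward when its target lies above its source. A branch
 * to its own address is treated as backward here.
 */
static bool branch_is_forward(uint64_t from, uint64_t to)
{
    return to > from;
}

/*
 * A branch crosses a page when source and target have different page
 * numbers, that is, when they differ above the low page_shift bits.
 */
static bool branch_crosses_page(uint64_t from, uint64_t to,
                                unsigned int page_shift)
{
    assert(page_shift < 64);    /* shifting by 64 or more is undefined */
    return (from >> page_shift) != (to >> page_shift);
}
```

Both checks need only the addresses already present in each branch
entry, which is why no extra enum values or bits are required for them.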

Signed-off-by: Jin Yao 
---
 include/uapi/linux/perf_event.h   | 29 -
 tools/include/uapi/linux/perf_event.h | 29 -
 2 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index d09a9cd..69af012 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
    PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT   = 14, /* no flags */
    PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT  = 15, /* no cycles */
 
+   PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT  = 16, /* save branch type */
+
    PERF_SAMPLE_BRANCH_MAX_SHIFT        /* non-ABI */
 };
 
@@ -198,9 +200,32 @@ enum perf_branch_sample_type {
    PERF_SAMPLE_BRANCH_NO_FLAGS     = 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
    PERF_SAMPLE_BRANCH_NO_CYCLES    = 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
 
+   PERF_SAMPLE_BRANCH_TYPE_SAVE    =
+       1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
+
    PERF_SAMPLE_BRANCH_MAX          = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
 };
 
+/*
+ * Common flow change classification
+ */
+enum {
+   PERF_BR_NONE        = 0,    /* unknown */
+   PERF_BR_JCC         = 1,    /* conditional jump */
+   PERF_BR_JMP         = 2,    /* jump */
+   PERF_BR_IND_JMP     = 3,    /* indirect jump */
+   PERF_BR_CALL        = 4,    /* call */
+   PERF_BR_IND_CALL    = 5,    /* indirect call */
+   PERF_BR_RET         = 6,    /* return */
+   PERF_BR_SYSCALL     = 7,    /* syscall */
+   PERF_BR_SYSRET      = 8,    /* syscall return */
+   PERF_BR_IRQ         = 9,    /* hw interrupt/trap/fault */
+   PERF_BR_INT         = 10,   /* sw interrupt */
+   PERF_BR_IRET        = 11,   /* return from interrupt */
+   PERF_BR_FAR_BRANCH  = 12,   /* not generic far branch type */
+   PERF_BR_MAX,
+};
+
 #define PERF_SAMPLE_BRANCH_PLM_ALL \
    (PERF_SAMPLE_BRANCH_USER|\
     PERF_SAMPLE_BRANCH_KERNEL|\
@@ -999,6 +1024,7 @@ union perf_mem_data_src {
  *    in_tx: running in a hardware transaction
  *    abort: aborting a hardware transaction
  *   cycles: cycles from last branch (or 0 if not supported)
+ *     type: branch type
  */
 struct perf_branch_entry {
    __u64   from;
@@ -1008,7 +1034,8 @@ struct perf_branch_entry {
        in_tx:1,    /* in transaction */
        abort:1,    /* transaction abort */
        cycles:16,  /* cycle count to last branch */
-       reserved:44;
+       type:4,     /* branch type */
+       reserved:40;
 };
 
 #endif /* _UAPI_LINUX_PERF_EVENT_H */
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index d09a9cd..69af012 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
    PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT   = 14, /* no flags */
    PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT  = 15, /* no cycles */
 
+   PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT  = 16, /* save branch type */
+