Re: [PATCH v3 1/6] powerpc/perf: Define big-endian version of perf_mem_data_src

2017-04-19 Thread Michael Ellerman
Madhavan Srinivasan  writes:
> On Wednesday 19 April 2017 10:20 AM, Michael Ellerman wrote:
>> It also occurred to me that we don't actually have to redefine the whole
>> union, it's only the bitfields that matter, so we could reduce the diff
>> to:
>>
>> diff --git a/include/uapi/linux/perf_event.h 
>> b/include/uapi/linux/perf_event.h
>> index c66a485a24ac..97152c79df6b 100644
>> --- a/include/uapi/linux/perf_event.h
>> +++ b/include/uapi/linux/perf_event.h
>> @@ -894,12 +894,23 @@ enum perf_callchain_context {
>>   union perf_mem_data_src {
>>  __u64 val;
>>  struct {
>> +#if defined(__LITTLE_ENDIAN_BITFIELD)
>>  __u64   mem_op:5,   /* type of opcode */
>>  mem_lvl:14, /* memory hierarchy level */
>>  mem_snoop:5,/* snoop mode */
>>  mem_lock:2, /* lock instr */
>>  mem_dtlb:7, /* tlb access */
>>  mem_rsvd:31;
>> +#elif defined(__BIG_ENDIAN_BITFIELD)
>> +__u64   mem_rsvd:31,
>> +mem_dtlb:7, /* tlb access */
>> +mem_lock:2, /* lock instr */
>> +mem_snoop:5,/* snoop mode */
>> +mem_lvl:14, /* memory hierarchy level */
>> +mem_op:5;   /* type of opcode */
>> +#else
>> +#error "Unknown endianness"
>> +#endif
>>  };
>>   };
>>
>>
>> That looks better to me, thoughts?
>
> Yep.  Looks fine to me and also tested the same.

I merged the original version, as that's what Peterz acked and I didn't
want to block the series any longer.

I'll send an incremental patch to do the cleanup.

cheers


Re: [PATCH v3 1/6] powerpc/perf: Define big-endian version of perf_mem_data_src

2017-04-19 Thread Madhavan Srinivasan



On Wednesday 19 April 2017 10:20 AM, Michael Ellerman wrote:

Peter Zijlstra  writes:


On Tue, Apr 11, 2017 at 07:21:05AM +0530, Madhavan Srinivasan wrote:

From: Sukadev Bhattiprolu 

perf_mem_data_src is an union that is initialized via the ->val field
and accessed via the bitmap fields. For this to work on big endian
platforms (Which is broken now), we also need a big-endian represenation
of perf_mem_data_src. i.e, in a big endian system, if user request
PERF_SAMPLE_DATA_SRC (perf report -d), will get the default value from
perf_sample_data_init(), which is PERF_MEM_NA. Value for PERF_MEM_NA
is constructed using shifts:

   /* TLB access */
   #define PERF_MEM_TLB_NA  0x01 /* not available */
   ...
   #define PERF_MEM_TLB_SHIFT   26

   #define PERF_MEM_S(a, s) \
(((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)

   #define PERF_MEM_NA (PERF_MEM_S(OP, NA)   |\
PERF_MEM_S(LVL, NA)   |\
PERF_MEM_S(SNOOP, NA) |\
PERF_MEM_S(LOCK, NA)  |\
PERF_MEM_S(TLB, NA))

Which works out as:

   ((0x01 << 0) | (0x01 << 5) | (0x01 << 19) | (0x01 << 24) | (0x01 << 26))

Which means the PERF_MEM_NA value comes out of the kernel as 0x5080021
in CPU endian.

But then in the perf tool, the code uses the bitfields to inspect the
value, and currently the bitfields are defined using little endian
ordering.

So eg. in perf_mem__tlb_scnprintf() we see:
   data_src->val = 0x5080021
  op = 0x0
 lvl = 0x0
   snoop = 0x0
lock = 0x0
dtlb = 0x0
rsvd = 0x5080021

Patch does a minimal fix of adding big endian definition of the bitfields
to match the values that are already exported by the kernel on big endian.
And it makes no change on little endian.

I think it is important to note that there are no current big-endian
users. So 'fixing' this will not break anybody and will ensure future
users (next patch) will work correctly.

Actually that's only partly true. As I describe above the PERF_MEM_NA
value is currently exported on BE platforms when a user requests it.

So I added this text after the output from perf_mem__tlb_scnprintf():

   Because of the way the perf tool code is written this is still displayed to 
the
   user as "N/A", so there is no bug visible at the UI level.
   
   Currently there are no big endian architectures which export a meaningful

   value (ie. other than PERF_MEM_NA), so the extent of the bug on big endian
   platforms is that the PERF_MEM_NA value is exported incorrectly as described
   above. Subsequent patches will add support on big endian powerpc for 
populating
   the data source value.


Hope that is clear.

It also occurred to me that we don't actually have to redefine the whole
union, it's only the bitfields that matter, so we could reduce the diff
to:

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index c66a485a24ac..97152c79df6b 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -894,12 +894,23 @@ enum perf_callchain_context {
  union perf_mem_data_src {
__u64 val;
struct {
+#if defined(__LITTLE_ENDIAN_BITFIELD)
__u64   mem_op:5,   /* type of opcode */
mem_lvl:14, /* memory hierarchy level */
mem_snoop:5,/* snoop mode */
mem_lock:2, /* lock instr */
mem_dtlb:7, /* tlb access */
mem_rsvd:31;
+#elif defined(__BIG_ENDIAN_BITFIELD)
+   __u64   mem_rsvd:31,
+   mem_dtlb:7, /* tlb access */
+   mem_lock:2, /* lock instr */
+   mem_snoop:5,/* snoop mode */
+   mem_lvl:14, /* memory hierarchy level */
+   mem_op:5;   /* type of opcode */
+#else
+#error "Unknown endianness"
+#endif
};
  };


That looks better to me, thoughts?


Yep.  Looks fine to me and also tested the same.

Maddy



cheers





Re: [PATCH v3 1/6] powerpc/perf: Define big-endian version of perf_mem_data_src

2017-04-18 Thread Michael Ellerman
Peter Zijlstra  writes:

> On Tue, Apr 11, 2017 at 07:21:05AM +0530, Madhavan Srinivasan wrote:
>> From: Sukadev Bhattiprolu 
>> 
>> perf_mem_data_src is an union that is initialized via the ->val field
>> and accessed via the bitmap fields. For this to work on big endian
>> platforms (Which is broken now), we also need a big-endian represenation
>> of perf_mem_data_src. i.e, in a big endian system, if user request
>> PERF_SAMPLE_DATA_SRC (perf report -d), will get the default value from
>> perf_sample_data_init(), which is PERF_MEM_NA. Value for PERF_MEM_NA
>> is constructed using shifts:
>> 
>>   /* TLB access */
>>   #define PERF_MEM_TLB_NA0x01 /* not available */
>>   ...
>>   #define PERF_MEM_TLB_SHIFT 26
>> 
>>   #define PERF_MEM_S(a, s) \
>>  (((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)
>> 
>>   #define PERF_MEM_NA (PERF_MEM_S(OP, NA)   |\
>>  PERF_MEM_S(LVL, NA)   |\
>>  PERF_MEM_S(SNOOP, NA) |\
>>  PERF_MEM_S(LOCK, NA)  |\
>>  PERF_MEM_S(TLB, NA))
>> 
>> Which works out as:
>> 
>>   ((0x01 << 0) | (0x01 << 5) | (0x01 << 19) | (0x01 << 24) | (0x01 << 26))
>> 
>> Which means the PERF_MEM_NA value comes out of the kernel as 0x5080021
>> in CPU endian.
>> 
>> But then in the perf tool, the code uses the bitfields to inspect the
>> value, and currently the bitfields are defined using little endian
>> ordering.
>> 
>> So eg. in perf_mem__tlb_scnprintf() we see:
>>   data_src->val = 0x5080021
>>  op = 0x0
>> lvl = 0x0
>>   snoop = 0x0
>>lock = 0x0
>>dtlb = 0x0
>>rsvd = 0x5080021
>> 
>> Patch does a minimal fix of adding big endian definition of the bitfields
>> to match the values that are already exported by the kernel on big endian.
>> And it makes no change on little endian.
>
> I think it is important to note that there are no current big-endian
> users. So 'fixing' this will not break anybody and will ensure future
> users (next patch) will work correctly.

Actually that's only partly true. As I describe above the PERF_MEM_NA
value is currently exported on BE platforms when a user requests it.

So I added this text after the output from perf_mem__tlb_scnprintf():

  Because of the way the perf tool code is written this is still displayed to 
the
  user as "N/A", so there is no bug visible at the UI level.
  
  Currently there are no big endian architectures which export a meaningful
  value (ie. other than PERF_MEM_NA), so the extent of the bug on big endian
  platforms is that the PERF_MEM_NA value is exported incorrectly as described
  above. Subsequent patches will add support on big endian powerpc for 
populating
  the data source value.


Hope that is clear.

It also occurred to me that we don't actually have to redefine the whole
union, it's only the bitfields that matter, so we could reduce the diff
to:

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index c66a485a24ac..97152c79df6b 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -894,12 +894,23 @@ enum perf_callchain_context {
 union perf_mem_data_src {
__u64 val;
struct {
+#if defined(__LITTLE_ENDIAN_BITFIELD)
__u64   mem_op:5,   /* type of opcode */
mem_lvl:14, /* memory hierarchy level */
mem_snoop:5,/* snoop mode */
mem_lock:2, /* lock instr */
mem_dtlb:7, /* tlb access */
mem_rsvd:31;
+#elif defined(__BIG_ENDIAN_BITFIELD)
+   __u64   mem_rsvd:31,
+   mem_dtlb:7, /* tlb access */
+   mem_lock:2, /* lock instr */
+   mem_snoop:5,/* snoop mode */
+   mem_lvl:14, /* memory hierarchy level */
+   mem_op:5;   /* type of opcode */
+#else
+#error "Unknown endianness"
+#endif
};
 };
 

That looks better to me, thoughts?

cheers


Re: [PATCH v3 1/6] powerpc/perf: Define big-endian version of perf_mem_data_src

2017-04-16 Thread Madhavan Srinivasan



On Thursday 13 April 2017 06:53 PM, Michael Ellerman wrote:

Peter Zijlstra  writes:


On Tue, Apr 11, 2017 at 07:21:05AM +0530, Madhavan Srinivasan wrote:

From: Sukadev Bhattiprolu 

perf_mem_data_src is an union that is initialized via the ->val field
and accessed via the bitmap fields. For this to work on big endian
platforms (Which is broken now), we also need a big-endian represenation
of perf_mem_data_src. i.e, in a big endian system, if user request
PERF_SAMPLE_DATA_SRC (perf report -d), will get the default value from
perf_sample_data_init(), which is PERF_MEM_NA. Value for PERF_MEM_NA
is constructed using shifts:

   /* TLB access */
   #define PERF_MEM_TLB_NA  0x01 /* not available */
   ...
   #define PERF_MEM_TLB_SHIFT   26

   #define PERF_MEM_S(a, s) \
(((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)

   #define PERF_MEM_NA (PERF_MEM_S(OP, NA)   |\
PERF_MEM_S(LVL, NA)   |\
PERF_MEM_S(SNOOP, NA) |\
PERF_MEM_S(LOCK, NA)  |\
PERF_MEM_S(TLB, NA))

Which works out as:

   ((0x01 << 0) | (0x01 << 5) | (0x01 << 19) | (0x01 << 24) | (0x01 << 26))

Which means the PERF_MEM_NA value comes out of the kernel as 0x5080021
in CPU endian.

But then in the perf tool, the code uses the bitfields to inspect the
value, and currently the bitfields are defined using little endian
ordering.

So eg. in perf_mem__tlb_scnprintf() we see:
   data_src->val = 0x5080021
  op = 0x0
 lvl = 0x0
   snoop = 0x0
lock = 0x0
dtlb = 0x0
rsvd = 0x5080021

Patch does a minimal fix of adding big endian definition of the bitfields
to match the values that are already exported by the kernel on big endian.
And it makes no change on little endian.

I think it is important to note that there are no current big-endian
users. So 'fixing' this will not break anybody and will ensure future
users (next patch) will work correctly.

Sure I'll fold in something along those lines.


Thanks mpe.

Maddy




Aside from that amendment,

Acked-by: Peter Zijlstra (Intel) 

Thanks.

cheers





Re: [PATCH v3 1/6] powerpc/perf: Define big-endian version of perf_mem_data_src

2017-04-16 Thread Madhavan Srinivasan



On Thursday 13 April 2017 06:08 PM, Peter Zijlstra wrote:

On Tue, Apr 11, 2017 at 07:21:05AM +0530, Madhavan Srinivasan wrote:

From: Sukadev Bhattiprolu 

perf_mem_data_src is an union that is initialized via the ->val field
and accessed via the bitmap fields. For this to work on big endian
platforms (Which is broken now), we also need a big-endian represenation
of perf_mem_data_src. i.e, in a big endian system, if user request
PERF_SAMPLE_DATA_SRC (perf report -d), will get the default value from
perf_sample_data_init(), which is PERF_MEM_NA. Value for PERF_MEM_NA
is constructed using shifts:

   /* TLB access */
   #define PERF_MEM_TLB_NA  0x01 /* not available */
   ...
   #define PERF_MEM_TLB_SHIFT   26

   #define PERF_MEM_S(a, s) \
(((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)

   #define PERF_MEM_NA (PERF_MEM_S(OP, NA)   |\
PERF_MEM_S(LVL, NA)   |\
PERF_MEM_S(SNOOP, NA) |\
PERF_MEM_S(LOCK, NA)  |\
PERF_MEM_S(TLB, NA))

Which works out as:

   ((0x01 << 0) | (0x01 << 5) | (0x01 << 19) | (0x01 << 24) | (0x01 << 26))

Which means the PERF_MEM_NA value comes out of the kernel as 0x5080021
in CPU endian.

But then in the perf tool, the code uses the bitfields to inspect the
value, and currently the bitfields are defined using little endian
ordering.

So eg. in perf_mem__tlb_scnprintf() we see:
   data_src->val = 0x5080021
  op = 0x0
 lvl = 0x0
   snoop = 0x0
lock = 0x0
dtlb = 0x0
rsvd = 0x5080021

Patch does a minimal fix of adding big endian definition of the bitfields
to match the values that are already exported by the kernel on big endian.
And it makes no change on little endian.

I think it is important to note that there are no current big-endian
users. So 'fixing' this will not break anybody and will ensure future
users (next patch) will work correctly.

Aside from that amendment,

Acked-by: Peter Zijlstra (Intel) 


Thanks
Maddy




Re: [PATCH v3 1/6] powerpc/perf: Define big-endian version of perf_mem_data_src

2017-04-13 Thread Michael Ellerman
Peter Zijlstra  writes:

> On Tue, Apr 11, 2017 at 07:21:05AM +0530, Madhavan Srinivasan wrote:
>> From: Sukadev Bhattiprolu 
>> 
>> perf_mem_data_src is an union that is initialized via the ->val field
>> and accessed via the bitmap fields. For this to work on big endian
>> platforms (Which is broken now), we also need a big-endian represenation
>> of perf_mem_data_src. i.e, in a big endian system, if user request
>> PERF_SAMPLE_DATA_SRC (perf report -d), will get the default value from
>> perf_sample_data_init(), which is PERF_MEM_NA. Value for PERF_MEM_NA
>> is constructed using shifts:
>> 
>>   /* TLB access */
>>   #define PERF_MEM_TLB_NA0x01 /* not available */
>>   ...
>>   #define PERF_MEM_TLB_SHIFT 26
>> 
>>   #define PERF_MEM_S(a, s) \
>>  (((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)
>> 
>>   #define PERF_MEM_NA (PERF_MEM_S(OP, NA)   |\
>>  PERF_MEM_S(LVL, NA)   |\
>>  PERF_MEM_S(SNOOP, NA) |\
>>  PERF_MEM_S(LOCK, NA)  |\
>>  PERF_MEM_S(TLB, NA))
>> 
>> Which works out as:
>> 
>>   ((0x01 << 0) | (0x01 << 5) | (0x01 << 19) | (0x01 << 24) | (0x01 << 26))
>> 
>> Which means the PERF_MEM_NA value comes out of the kernel as 0x5080021
>> in CPU endian.
>> 
>> But then in the perf tool, the code uses the bitfields to inspect the
>> value, and currently the bitfields are defined using little endian
>> ordering.
>> 
>> So eg. in perf_mem__tlb_scnprintf() we see:
>>   data_src->val = 0x5080021
>>  op = 0x0
>> lvl = 0x0
>>   snoop = 0x0
>>lock = 0x0
>>dtlb = 0x0
>>rsvd = 0x5080021
>> 
>> Patch does a minimal fix of adding big endian definition of the bitfields
>> to match the values that are already exported by the kernel on big endian.
>> And it makes no change on little endian.
>
> I think it is important to note that there are no current big-endian
> users. So 'fixing' this will not break anybody and will ensure future
> users (next patch) will work correctly.

Sure I'll fold in something along those lines.

> Aside from that amendment,
>
> Acked-by: Peter Zijlstra (Intel) 

Thanks.

cheers


Re: [PATCH v3 1/6] powerpc/perf: Define big-endian version of perf_mem_data_src

2017-04-13 Thread Peter Zijlstra
On Tue, Apr 11, 2017 at 07:21:05AM +0530, Madhavan Srinivasan wrote:
> From: Sukadev Bhattiprolu 
> 
> perf_mem_data_src is an union that is initialized via the ->val field
> and accessed via the bitmap fields. For this to work on big endian
> platforms (Which is broken now), we also need a big-endian represenation
> of perf_mem_data_src. i.e, in a big endian system, if user request
> PERF_SAMPLE_DATA_SRC (perf report -d), will get the default value from
> perf_sample_data_init(), which is PERF_MEM_NA. Value for PERF_MEM_NA
> is constructed using shifts:
> 
>   /* TLB access */
>   #define PERF_MEM_TLB_NA 0x01 /* not available */
>   ...
>   #define PERF_MEM_TLB_SHIFT  26
> 
>   #define PERF_MEM_S(a, s) \
>   (((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)
> 
>   #define PERF_MEM_NA (PERF_MEM_S(OP, NA)   |\
>   PERF_MEM_S(LVL, NA)   |\
>   PERF_MEM_S(SNOOP, NA) |\
>   PERF_MEM_S(LOCK, NA)  |\
>   PERF_MEM_S(TLB, NA))
> 
> Which works out as:
> 
>   ((0x01 << 0) | (0x01 << 5) | (0x01 << 19) | (0x01 << 24) | (0x01 << 26))
> 
> Which means the PERF_MEM_NA value comes out of the kernel as 0x5080021
> in CPU endian.
> 
> But then in the perf tool, the code uses the bitfields to inspect the
> value, and currently the bitfields are defined using little endian
> ordering.
> 
> So eg. in perf_mem__tlb_scnprintf() we see:
>   data_src->val = 0x5080021
>  op = 0x0
> lvl = 0x0
>   snoop = 0x0
>lock = 0x0
>dtlb = 0x0
>rsvd = 0x5080021
> 
> Patch does a minimal fix of adding big endian definition of the bitfields
> to match the values that are already exported by the kernel on big endian.
> And it makes no change on little endian.

I think it is important to note that there are no current big-endian
users. So 'fixing' this will not break anybody and will ensure future
users (next patch) will work correctly.

Aside from that amendment,

Acked-by: Peter Zijlstra (Intel) 


[PATCH v3 1/6] powerpc/perf: Define big-endian version of perf_mem_data_src

2017-04-10 Thread Madhavan Srinivasan
From: Sukadev Bhattiprolu 

perf_mem_data_src is an union that is initialized via the ->val field
and accessed via the bitmap fields. For this to work on big endian
platforms (Which is broken now), we also need a big-endian represenation
of perf_mem_data_src. i.e, in a big endian system, if user request
PERF_SAMPLE_DATA_SRC (perf report -d), will get the default value from
perf_sample_data_init(), which is PERF_MEM_NA. Value for PERF_MEM_NA
is constructed using shifts:

  /* TLB access */
  #define PERF_MEM_TLB_NA   0x01 /* not available */
  ...
  #define PERF_MEM_TLB_SHIFT26

  #define PERF_MEM_S(a, s) \
(((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)

  #define PERF_MEM_NA (PERF_MEM_S(OP, NA)   |\
PERF_MEM_S(LVL, NA)   |\
PERF_MEM_S(SNOOP, NA) |\
PERF_MEM_S(LOCK, NA)  |\
PERF_MEM_S(TLB, NA))

Which works out as:

  ((0x01 << 0) | (0x01 << 5) | (0x01 << 19) | (0x01 << 24) | (0x01 << 26))

Which means the PERF_MEM_NA value comes out of the kernel as 0x5080021
in CPU endian.

But then in the perf tool, the code uses the bitfields to inspect the
value, and currently the bitfields are defined using little endian
ordering.

So eg. in perf_mem__tlb_scnprintf() we see:
  data_src->val = 0x5080021
 op = 0x0
lvl = 0x0
  snoop = 0x0
   lock = 0x0
   dtlb = 0x0
   rsvd = 0x5080021

Patch does a minimal fix of adding big endian definition of the bitfields
to match the values that are already exported by the kernel on big endian.
And it makes no change on little endian.

Signed-off-by: Sukadev Bhattiprolu 
Signed-off-by: Madhavan Srinivasan 
---
 include/uapi/linux/perf_event.h   | 16 
 tools/include/uapi/linux/perf_event.h | 16 
 2 files changed, 32 insertions(+)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index c66a485a24ac..c4af1159a200 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -891,6 +891,7 @@ enum perf_callchain_context {
 #define PERF_FLAG_PID_CGROUP   (1UL << 2) /* pid=cgroup id, per-cpu 
mode only */
 #define PERF_FLAG_FD_CLOEXEC   (1UL << 3) /* O_CLOEXEC */
 
+#if defined(__LITTLE_ENDIAN_BITFIELD)
 union perf_mem_data_src {
__u64 val;
struct {
@@ -902,6 +903,21 @@ union perf_mem_data_src {
mem_rsvd:31;
};
 };
+#elif defined(__BIG_ENDIAN_BITFIELD)
+union perf_mem_data_src {
+   __u64 val;
+   struct {
+   __u64   mem_rsvd:31,
+   mem_dtlb:7, /* tlb access */
+   mem_lock:2, /* lock instr */
+   mem_snoop:5,/* snoop mode */
+   mem_lvl:14, /* memory hierarchy level */
+   mem_op:5;   /* type of opcode */
+   };
+};
+#else
+#error "Unknown endianness"
+#endif
 
 /* type of opcode (load/store/prefetch,code) */
 #define PERF_MEM_OP_NA 0x01 /* not available */
diff --git a/tools/include/uapi/linux/perf_event.h 
b/tools/include/uapi/linux/perf_event.h
index c66a485a24ac..c4af1159a200 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -891,6 +891,7 @@ enum perf_callchain_context {
 #define PERF_FLAG_PID_CGROUP   (1UL << 2) /* pid=cgroup id, per-cpu 
mode only */
 #define PERF_FLAG_FD_CLOEXEC   (1UL << 3) /* O_CLOEXEC */
 
+#if defined(__LITTLE_ENDIAN_BITFIELD)
 union perf_mem_data_src {
__u64 val;
struct {
@@ -902,6 +903,21 @@ union perf_mem_data_src {
mem_rsvd:31;
};
 };
+#elif defined(__BIG_ENDIAN_BITFIELD)
+union perf_mem_data_src {
+   __u64 val;
+   struct {
+   __u64   mem_rsvd:31,
+   mem_dtlb:7, /* tlb access */
+   mem_lock:2, /* lock instr */
+   mem_snoop:5,/* snoop mode */
+   mem_lvl:14, /* memory hierarchy level */
+   mem_op:5;   /* type of opcode */
+   };
+};
+#else
+#error "Unknown endianness"
+#endif
 
 /* type of opcode (load/store/prefetch,code) */
 #define PERF_MEM_OP_NA 0x01 /* not available */
-- 
2.7.4



[PATCH v3 1/6] powerpc/perf: Define big-endian version of perf_mem_data_src

2017-03-22 Thread Madhavan Srinivasan
From: Sukadev Bhattiprolu 

perf_mem_data_src is an union that is initialized via the ->val field
and accessed via the bitmap fields. For this to work on big endian
platforms (Which is broken now), we also need a big-endian represenation
of perf_mem_data_src. i.e, in a big endian system, if user request
PERF_SAMPLE_DATA_SRC (perf report -d), will get the default value from
perf_sample_data_init(), which is PERF_MEM_NA. Value for PERF_MEM_NA
is constructed using shifts:

  /* TLB access */
  #define PERF_MEM_TLB_NA   0x01 /* not available */
  ...
  #define PERF_MEM_TLB_SHIFT26

  #define PERF_MEM_S(a, s) \
(((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)

  #define PERF_MEM_NA (PERF_MEM_S(OP, NA)   |\
PERF_MEM_S(LVL, NA)   |\
PERF_MEM_S(SNOOP, NA) |\
PERF_MEM_S(LOCK, NA)  |\
PERF_MEM_S(TLB, NA))

Which works out as:

  ((0x01 << 0) | (0x01 << 5) | (0x01 << 19) | (0x01 << 24) | (0x01 << 26))

Which means the PERF_MEM_NA value comes out of the kernel as 0x5080021
in CPU endian.

But then in the perf tool, the code uses the bitfields to inspect the
value, and currently the bitfields are defined using little endian
ordering.

So eg. in perf_mem__tlb_scnprintf() we see:
  data_src->val = 0x5080021
 op = 0x0
lvl = 0x0
  snoop = 0x0
   lock = 0x0
   dtlb = 0x0
   rsvd = 0x5080021

Patch does a minimal fix of adding big endian definition of the bitfields
to match the values that are already exported by the kernel on big endian.
And it makes no change on little endian.

Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Wang Nan 
Cc: Alexei Starovoitov 
Cc: Stephane Eranian 
Signed-off-by: Sukadev Bhattiprolu 
Signed-off-by: Madhavan Srinivasan 
---
Changelog v2:
-Added Michael Ellerman's explanation to comiit message.
Changelog v1:
-Fixed author-ship and added suka's "Signed-off-by:".

 include/uapi/linux/perf_event.h   | 16 
 tools/include/uapi/linux/perf_event.h | 16 
 2 files changed, 32 insertions(+)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index c66a485a24ac..c4af1159a200 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -891,6 +891,7 @@ enum perf_callchain_context {
 #define PERF_FLAG_PID_CGROUP   (1UL << 2) /* pid=cgroup id, per-cpu 
mode only */
 #define PERF_FLAG_FD_CLOEXEC   (1UL << 3) /* O_CLOEXEC */
 
+#if defined(__LITTLE_ENDIAN_BITFIELD)
 union perf_mem_data_src {
__u64 val;
struct {
@@ -902,6 +903,21 @@ union perf_mem_data_src {
mem_rsvd:31;
};
 };
+#elif defined(__BIG_ENDIAN_BITFIELD)
+union perf_mem_data_src {
+   __u64 val;
+   struct {
+   __u64   mem_rsvd:31,
+   mem_dtlb:7, /* tlb access */
+   mem_lock:2, /* lock instr */
+   mem_snoop:5,/* snoop mode */
+   mem_lvl:14, /* memory hierarchy level */
+   mem_op:5;   /* type of opcode */
+   };
+};
+#else
+#error "Unknown endianness"
+#endif
 
 /* type of opcode (load/store/prefetch,code) */
 #define PERF_MEM_OP_NA 0x01 /* not available */
diff --git a/tools/include/uapi/linux/perf_event.h 
b/tools/include/uapi/linux/perf_event.h
index c66a485a24ac..c4af1159a200 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -891,6 +891,7 @@ enum perf_callchain_context {
 #define PERF_FLAG_PID_CGROUP   (1UL << 2) /* pid=cgroup id, per-cpu 
mode only */
 #define PERF_FLAG_FD_CLOEXEC   (1UL << 3) /* O_CLOEXEC */
 
+#if defined(__LITTLE_ENDIAN_BITFIELD)
 union perf_mem_data_src {
__u64 val;
struct {
@@ -902,6 +903,21 @@ union perf_mem_data_src {
mem_rsvd:31;
};
 };
+#elif defined(__BIG_ENDIAN_BITFIELD)
+union perf_mem_data_src {
+   __u64 val;
+   struct {
+   __u64   mem_rsvd:31,
+   mem_dtlb:7, /* tlb access */
+   mem_lock:2, /* lock instr */
+   mem_snoop:5,/* snoop mode */
+   mem_lvl:14, /* memory hierarchy level */
+   mem_op:5;   /* type of opcode */
+   };
+};
+#else
+#error "Unknown endianness"
+#endif
 
 /* type of opcode (load/store/prefetch,code) */
 #define PERF_MEM_OP_NA 0x01 /* not available */
-- 
2.7.4