Re: [PATCH v2 09/27] perf parse-events: Create two hybrid hardware events

2021-03-15 Thread Jin, Yao

Hi Jiri,

On 3/16/2021 7:05 AM, Jiri Olsa wrote:

On Thu, Mar 11, 2021 at 03:07:24PM +0800, Jin Yao wrote:

For hardware events, they have pre-defined configs. The kernel
needs to know where the event comes from (e.g. from cpu_core pmu
or from cpu_atom pmu). But the perf type 'PERF_TYPE_HARDWARE'
can't carry pmu information.

So the kernel introduces a new type 'PERF_TYPE_HARDWARE_PMU'.
The new attr.config layout for PERF_TYPE_HARDWARE_PMU is:

0xDD00AA
AA: original hardware event ID
DD: PMU type ID

PMU type ID is retrieved from sysfs. For example,

   cat /sys/devices/cpu_atom/type
   10

   cat /sys/devices/cpu_core/type
   4

When enabling a hybrid hardware event without specified pmu, such as,
'perf stat -e cycles -a', two events are created automatically. One
is for atom, the other is for core.


ok I think I understand the need for this (and the following) patch
the perf_hw_id counters could be global, so when you specify only
event like:

-e cycles

you want all the cycles, which on hybrid system means cycles from
more than one pmus



Yes, on hybrid system it means the cycles from two pmus. One cycle is from cpu_core pmu, another 
cycles is from cpu_atom pmu.



SNIP


@@ -1416,6 +1475,8 @@ int parse_events_add_numeric(struct parse_events_state 
*parse_state,
  {
struct perf_event_attr attr;
LIST_HEAD(config_terms);
+   bool hybrid;
+   int ret;
  
  	memset(, 0, sizeof(attr));

attr.type = type;
@@ -1430,6 +1491,18 @@ int parse_events_add_numeric(struct parse_events_state 
*parse_state,
return -ENOMEM;
}
  
+	/*

+* Skip the software dummy event.
+*/
+   if (type != PERF_TYPE_SOFTWARE) {
+   if (!perf_pmu__hybrid_exist())
+   perf_pmu__scan(NULL);


this could be checked in the following add_hybrid_numeric call



Yes, that should be OK. I will move the check in the next version.


+
+   ret = add_hybrid_numeric(parse_state, list, , );
+   if (hybrid)
+   return ret;
+   }


could we add this to separate object.. hybrid.c or maybe parse-events-hybrid.c,

there's already global __add_event wrapper - parse_events__add_event


jirka



Use a new parse-events-hybrid.c, hmm, well that's OK.

Thanks
Jin Yao


+
return add_event(list, _state->idx, ,
 get_config_name(head_config), _terms);
  }
--
2.17.1





Re: [PATCH v2 09/27] perf parse-events: Create two hybrid hardware events

2021-03-15 Thread Jiri Olsa
On Thu, Mar 11, 2021 at 03:07:24PM +0800, Jin Yao wrote:
> For hardware events, they have pre-defined configs. The kernel
> needs to know where the event comes from (e.g. from cpu_core pmu
> or from cpu_atom pmu). But the perf type 'PERF_TYPE_HARDWARE'
> can't carry pmu information.
> 
> So the kernel introduces a new type 'PERF_TYPE_HARDWARE_PMU'.
> The new attr.config layout for PERF_TYPE_HARDWARE_PMU is:
> 
> 0xDD00AA
> AA: original hardware event ID
> DD: PMU type ID
> 
> PMU type ID is retrieved from sysfs. For example,
> 
>   cat /sys/devices/cpu_atom/type
>   10
> 
>   cat /sys/devices/cpu_core/type
>   4
> 
> When enabling a hybrid hardware event without specified pmu, such as,
> 'perf stat -e cycles -a', two events are created automatically. One
> is for atom, the other is for core.

ok I think I understand the need for this (and the following) patch
the perf_hw_id counters could be global, so when you specify only
event like:

   -e cycles

you want all the cycles, which on hybrid system means cycles from
more than one pmus

SNIP

> @@ -1416,6 +1475,8 @@ int parse_events_add_numeric(struct parse_events_state 
> *parse_state,
>  {
>   struct perf_event_attr attr;
>   LIST_HEAD(config_terms);
> + bool hybrid;
> + int ret;
>  
>   memset(, 0, sizeof(attr));
>   attr.type = type;
> @@ -1430,6 +1491,18 @@ int parse_events_add_numeric(struct parse_events_state 
> *parse_state,
>   return -ENOMEM;
>   }
>  
> + /*
> +  * Skip the software dummy event.
> +  */
> + if (type != PERF_TYPE_SOFTWARE) {
> + if (!perf_pmu__hybrid_exist())
> + perf_pmu__scan(NULL);

this could be checked in the following add_hybrid_numeric call

> +
> + ret = add_hybrid_numeric(parse_state, list, , );
> + if (hybrid)
> + return ret;
> + }

could we add this to separate object.. hybrid.c or maybe parse-events-hybrid.c,

there's already global __add_event wrapper - parse_events__add_event 


jirka

> +
>   return add_event(list, _state->idx, ,
>get_config_name(head_config), _terms);
>  }
> -- 
> 2.17.1
> 



Re: [PATCH v2 09/27] perf parse-events: Create two hybrid hardware events

2021-03-15 Thread Jiri Olsa
On Mon, Mar 15, 2021 at 10:04:56AM +0800, Jin, Yao wrote:
> Hi Jiri,
> 
> On 3/13/2021 3:15 AM, Jiri Olsa wrote:
> > On Thu, Mar 11, 2021 at 03:07:24PM +0800, Jin Yao wrote:
> > 
> > SNIP
> > 
> > >cycles: 4: 800933425 1002536659 1002536659
> > >cycles: 5: 800928573 1002528386 1002528386
> > >cycles: 6: 800924347 1002520527 1002520527
> > >cycles: 7: 800922009 1002513176 1002513176
> > >cycles: 8: 800919624 1002507326 1002507326
> > >cycles: 9: 800917204 1002500663 1002500663
> > >cycles: 10: 802096579 1002494280 1002494280
> > >cycles: 11: 802093770 1002486404 1002486404
> > >cycles: 12: 803284338 1002479491 1002479491
> > >cycles: 13: 803277609 1002469777 1002469777
> > >cycles: 14: 800875902 1002458861 1002458861
> > >cycles: 15: 800873241 1002451350 1002451350
> > >cycles: 0: 800837379 1002444645 1002444645
> > >cycles: 1: 800833400 1002438505 1002438505
> > >cycles: 2: 800829291 1002433698 1002433698
> > >cycles: 3: 800824390 1002427584 1002427584
> > >cycles: 4: 800819360 1002422099 1002422099
> > >cycles: 5: 800814787 1002415845 1002415845
> > >cycles: 6: 800810125 1002410301 1002410301
> > >cycles: 7: 800791893 1002386845 1002386845
> > >cycles: 12855737722 16040169029 16040169029
> > >cycles: 6406560625 8019379522 8019379522
> > > 
> > > Performance counter stats for 'system wide':
> > > 
> > >12,855,737,722  cpu_core/cycles/
> > > 6,406,560,625  cpu_atom/cycles/
> > 
> > so we do that no_merge stuff for uncore pmus, why can't we do
> > that in here? that'd seems like generic way
> > 
> > jirka
> > 
> 
> We have set the "stat_config.no_merge = true;" in "[PATCH v2 08/27] perf
> stat: Uniquify hybrid event name".
> 
> For hybrid hardware events, they have different configs. The config is
> 0xDD00AA (0x4 for core vs. 0xa for atom in this example)
> 
> We use perf_pmu__for_each_hybrid_pmu() to iterate all hybrid PMUs, generate
> the configs and create the evsels for each hybrid PMU. This logic and the
> code are not complex and easy to understand.
> 
> Uncore looks complicated. It has uncore alias concept which is for different
> PMUs but with same prefix. Such as "uncore_cbox" for "uncore_cbox_0" to
> "uncore_cbox_9". But the uncore alias concept doesn't apply to hybrid pmu
> (we just have "cpu_core" and "cpu_atom" here). And actually I also don't
> want to mix the core stuff with uncore stuff, that would be hard for
> understanding.
> 
> Perhaps I misunderstand, correct me if I'm wrong.

not sure, I thought the merging stuff was more generic,
because the change looks too specific for me, I'll try
to check on it more deeply

jirka



Re: [PATCH v2 09/27] perf parse-events: Create two hybrid hardware events

2021-03-14 Thread Jin, Yao

Hi Jiri,

On 3/13/2021 3:15 AM, Jiri Olsa wrote:

On Thu, Mar 11, 2021 at 03:07:24PM +0800, Jin Yao wrote:

SNIP


   cycles: 4: 800933425 1002536659 1002536659
   cycles: 5: 800928573 1002528386 1002528386
   cycles: 6: 800924347 1002520527 1002520527
   cycles: 7: 800922009 1002513176 1002513176
   cycles: 8: 800919624 1002507326 1002507326
   cycles: 9: 800917204 1002500663 1002500663
   cycles: 10: 802096579 1002494280 1002494280
   cycles: 11: 802093770 1002486404 1002486404
   cycles: 12: 803284338 1002479491 1002479491
   cycles: 13: 803277609 1002469777 1002469777
   cycles: 14: 800875902 1002458861 1002458861
   cycles: 15: 800873241 1002451350 1002451350
   cycles: 0: 800837379 1002444645 1002444645
   cycles: 1: 800833400 1002438505 1002438505
   cycles: 2: 800829291 1002433698 1002433698
   cycles: 3: 800824390 1002427584 1002427584
   cycles: 4: 800819360 1002422099 1002422099
   cycles: 5: 800814787 1002415845 1002415845
   cycles: 6: 800810125 1002410301 1002410301
   cycles: 7: 800791893 1002386845 1002386845
   cycles: 12855737722 16040169029 16040169029
   cycles: 6406560625 8019379522 8019379522

Performance counter stats for 'system wide':

   12,855,737,722  cpu_core/cycles/
6,406,560,625  cpu_atom/cycles/


so we do that no_merge stuff for uncore pmus, why can't we do
that in here? that'd seems like generic way

jirka



We have set the "stat_config.no_merge = true;" in "[PATCH v2 08/27] perf stat: Uniquify hybrid event 
name".


For hybrid hardware events, they have different configs. The config is 0xDD00AA (0x4 for 
core vs. 0xa for atom in this example)


We use perf_pmu__for_each_hybrid_pmu() to iterate all hybrid PMUs, generate the configs and create 
the evsels for each hybrid PMU. This logic and the code are not complex and easy to understand.


Uncore looks complicated. It has uncore alias concept which is for different PMUs but with same 
prefix. Such as "uncore_cbox" for "uncore_cbox_0" to "uncore_cbox_9". But the uncore alias concept 
doesn't apply to hybrid pmu (we just have "cpu_core" and "cpu_atom" here). And actually I also don't 
want to mix the core stuff with uncore stuff, that would be hard for understanding.


Perhaps I misunderstand, correct me if I'm wrong.

Thanks
Jin Yao



Re: [PATCH v2 09/27] perf parse-events: Create two hybrid hardware events

2021-03-12 Thread Jiri Olsa
On Thu, Mar 11, 2021 at 03:07:24PM +0800, Jin Yao wrote:

SNIP

>   cycles: 4: 800933425 1002536659 1002536659
>   cycles: 5: 800928573 1002528386 1002528386
>   cycles: 6: 800924347 1002520527 1002520527
>   cycles: 7: 800922009 1002513176 1002513176
>   cycles: 8: 800919624 1002507326 1002507326
>   cycles: 9: 800917204 1002500663 1002500663
>   cycles: 10: 802096579 1002494280 1002494280
>   cycles: 11: 802093770 1002486404 1002486404
>   cycles: 12: 803284338 1002479491 1002479491
>   cycles: 13: 803277609 1002469777 1002469777
>   cycles: 14: 800875902 1002458861 1002458861
>   cycles: 15: 800873241 1002451350 1002451350
>   cycles: 0: 800837379 1002444645 1002444645
>   cycles: 1: 800833400 1002438505 1002438505
>   cycles: 2: 800829291 1002433698 1002433698
>   cycles: 3: 800824390 1002427584 1002427584
>   cycles: 4: 800819360 1002422099 1002422099
>   cycles: 5: 800814787 1002415845 1002415845
>   cycles: 6: 800810125 1002410301 1002410301
>   cycles: 7: 800791893 1002386845 1002386845
>   cycles: 12855737722 16040169029 16040169029
>   cycles: 6406560625 8019379522 8019379522
> 
>Performance counter stats for 'system wide':
> 
>   12,855,737,722  cpu_core/cycles/
>6,406,560,625  cpu_atom/cycles/

so we do that no_merge stuff for uncore pmus, why can't we do
that in here? that'd seems like generic way

jirka



[PATCH v2 09/27] perf parse-events: Create two hybrid hardware events

2021-03-10 Thread Jin Yao
For hardware events, they have pre-defined configs. The kernel
needs to know where the event comes from (e.g. from cpu_core pmu
or from cpu_atom pmu). But the perf type 'PERF_TYPE_HARDWARE'
can't carry pmu information.

So the kernel introduces a new type 'PERF_TYPE_HARDWARE_PMU'.
The new attr.config layout for PERF_TYPE_HARDWARE_PMU is:

0xDD00AA
AA: original hardware event ID
DD: PMU type ID

PMU type ID is retrieved from sysfs. For example,

  cat /sys/devices/cpu_atom/type
  10

  cat /sys/devices/cpu_core/type
  4

When enabling a hybrid hardware event without specified pmu, such as,
'perf stat -e cycles -a', two events are created automatically. One
is for atom, the other is for core.

  root@ssp-pwrt-002:~# ./perf stat -e cycles -vv -a -- sleep 1
  Control descriptor is not initialized
  
  perf_event_attr:
type 6
size 120
config   0x4
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
exclude_guest1
  
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 3
  sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 4
  sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 5
  sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 7
  sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 8
  sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 9
  sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 10
  sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 11
  sys_perf_event_open: pid -1  cpu 8  group_fd -1  flags 0x8 = 12
  sys_perf_event_open: pid -1  cpu 9  group_fd -1  flags 0x8 = 13
  sys_perf_event_open: pid -1  cpu 10  group_fd -1  flags 0x8 = 14
  sys_perf_event_open: pid -1  cpu 11  group_fd -1  flags 0x8 = 15
  sys_perf_event_open: pid -1  cpu 12  group_fd -1  flags 0x8 = 16
  sys_perf_event_open: pid -1  cpu 13  group_fd -1  flags 0x8 = 17
  sys_perf_event_open: pid -1  cpu 14  group_fd -1  flags 0x8 = 18
  sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 19
  
  perf_event_attr:
type 6
size 120
config   0xa
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
exclude_guest1
  
  sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 20
  sys_perf_event_open: pid -1  cpu 17  group_fd -1  flags 0x8 = 21
  sys_perf_event_open: pid -1  cpu 18  group_fd -1  flags 0x8 = 22
  sys_perf_event_open: pid -1  cpu 19  group_fd -1  flags 0x8 = 23
  sys_perf_event_open: pid -1  cpu 20  group_fd -1  flags 0x8 = 24
  sys_perf_event_open: pid -1  cpu 21  group_fd -1  flags 0x8 = 25
  sys_perf_event_open: pid -1  cpu 22  group_fd -1  flags 0x8 = 26
  sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 27
  cycles: 0: 810754998 1002563650 1002563650
  cycles: 1: 810749852 1002559947 1002559947
  cycles: 2: 808096005 1002555036 1002555036
  cycles: 3: 808090246 1002543496 1002543496
  cycles: 4: 800933425 1002536659 1002536659
  cycles: 5: 800928573 1002528386 1002528386
  cycles: 6: 800924347 1002520527 1002520527
  cycles: 7: 800922009 1002513176 1002513176
  cycles: 8: 800919624 1002507326 1002507326
  cycles: 9: 800917204 1002500663 1002500663
  cycles: 10: 802096579 1002494280 1002494280
  cycles: 11: 802093770 1002486404 1002486404
  cycles: 12: 803284338 1002479491 1002479491
  cycles: 13: 803277609 1002469777 1002469777
  cycles: 14: 800875902 1002458861 1002458861
  cycles: 15: 800873241 1002451350 1002451350
  cycles: 0: 800837379 1002444645 1002444645
  cycles: 1: 800833400 1002438505 1002438505
  cycles: 2: 800829291 1002433698 1002433698
  cycles: 3: 800824390 1002427584 1002427584
  cycles: 4: 800819360 1002422099 1002422099
  cycles: 5: 800814787 1002415845 1002415845
  cycles: 6: 800810125 1002410301 1002410301
  cycles: 7: 800791893 1002386845 1002386845
  cycles: 12855737722 16040169029 16040169029
  cycles: 6406560625 8019379522 8019379522

   Performance counter stats for 'system wide':

  12,855,737,722  cpu_core/cycles/
   6,406,560,625  cpu_atom/cycles/

 1.002774658 seconds time elapsed

type 6 is PERF_TYPE_HARDWARE_PMU.
0x4 in 0x4 indicates the cpu_core pmu.
0xa in 0xa indicates the cpu_atom pmu.

Signed-off-by: Jin Yao 
---
 tools/perf/util/parse-events.c | 73