Re: [PATCH v2 09/27] perf parse-events: Create two hybrid hardware events
Hi Jiri, On 3/16/2021 7:05 AM, Jiri Olsa wrote: On Thu, Mar 11, 2021 at 03:07:24PM +0800, Jin Yao wrote: For hardware events, they have pre-defined configs. The kernel needs to know where the event comes from (e.g. from cpu_core pmu or from cpu_atom pmu). But the perf type 'PERF_TYPE_HARDWARE' can't carry pmu information. So the kernel introduces a new type 'PERF_TYPE_HARDWARE_PMU'. The new attr.config layout for PERF_TYPE_HARDWARE_PMU is: 0xDD00AA AA: original hardware event ID DD: PMU type ID PMU type ID is retrieved from sysfs. For example, cat /sys/devices/cpu_atom/type 10 cat /sys/devices/cpu_core/type 4 When enabling a hybrid hardware event without specified pmu, such as, 'perf stat -e cycles -a', two events are created automatically. One is for atom, the other is for core. ok I think I understand the need for this (and the following) patch the perf_hw_id counters could be global, so when you specify only event like: -e cycles you want all the cycles, which on hybrid system means cycles from more than one pmus Yes, on hybrid system it means the cycles from two pmus. One cycle is from cpu_core pmu, another cycles is from cpu_atom pmu. SNIP @@ -1416,6 +1475,8 @@ int parse_events_add_numeric(struct parse_events_state *parse_state, { struct perf_event_attr attr; LIST_HEAD(config_terms); + bool hybrid; + int ret; memset(&attr, 0, sizeof(attr)); attr.type = type; @@ -1430,6 +1491,18 @@ int parse_events_add_numeric(struct parse_events_state *parse_state, return -ENOMEM; } + /* +* Skip the software dummy event. +*/ + if (type != PERF_TYPE_SOFTWARE) { + if (!perf_pmu__hybrid_exist()) + perf_pmu__scan(NULL); this could be checked in the following add_hybrid_numeric call Yes, that should be OK. I will move the check in the next version. + + ret = add_hybrid_numeric(parse_state, list, &attr, &hybrid); + if (hybrid) + return ret; + } could we add this to separate object.. hybrid.c or maybe parse-events-hybrid.c, there's already global __add_event wrapper - parse_events__add_event jirka Use a new parse-events-hybrid.c, hmm, well that's OK. Thanks Jin Yao + return add_event(list, &parse_state->idx, &attr, get_config_name(head_config), &config_terms); } -- 2.17.1
Re: [PATCH v2 09/27] perf parse-events: Create two hybrid hardware events
On Thu, Mar 11, 2021 at 03:07:24PM +0800, Jin Yao wrote: > For hardware events, they have pre-defined configs. The kernel > needs to know where the event comes from (e.g. from cpu_core pmu > or from cpu_atom pmu). But the perf type 'PERF_TYPE_HARDWARE' > can't carry pmu information. > > So the kernel introduces a new type 'PERF_TYPE_HARDWARE_PMU'. > The new attr.config layout for PERF_TYPE_HARDWARE_PMU is: > > 0xDD00AA > AA: original hardware event ID > DD: PMU type ID > > PMU type ID is retrieved from sysfs. For example, > > cat /sys/devices/cpu_atom/type > 10 > > cat /sys/devices/cpu_core/type > 4 > > When enabling a hybrid hardware event without specified pmu, such as, > 'perf stat -e cycles -a', two events are created automatically. One > is for atom, the other is for core. ok I think I understand the need for this (and the following) patch the perf_hw_id counters could be global, so when you specify only event like: -e cycles you want all the cycles, which on hybrid system means cycles from more than one pmus SNIP > @@ -1416,6 +1475,8 @@ int parse_events_add_numeric(struct parse_events_state > *parse_state, > { > struct perf_event_attr attr; > LIST_HEAD(config_terms); > + bool hybrid; > + int ret; > > memset(&attr, 0, sizeof(attr)); > attr.type = type; > @@ -1430,6 +1491,18 @@ int parse_events_add_numeric(struct parse_events_state > *parse_state, > return -ENOMEM; > } > > + /* > + * Skip the software dummy event. > + */ > + if (type != PERF_TYPE_SOFTWARE) { > + if (!perf_pmu__hybrid_exist()) > + perf_pmu__scan(NULL); this could be checked in the following add_hybrid_numeric call > + > + ret = add_hybrid_numeric(parse_state, list, &attr, &hybrid); > + if (hybrid) > + return ret; > + } could we add this to separate object.. hybrid.c or maybe parse-events-hybrid.c, there's already global __add_event wrapper - parse_events__add_event jirka > + > return add_event(list, &parse_state->idx, &attr, >get_config_name(head_config), &config_terms); > } > -- > 2.17.1 >
Re: [PATCH v2 09/27] perf parse-events: Create two hybrid hardware events
On Mon, Mar 15, 2021 at 10:04:56AM +0800, Jin, Yao wrote: > Hi Jiri, > > On 3/13/2021 3:15 AM, Jiri Olsa wrote: > > On Thu, Mar 11, 2021 at 03:07:24PM +0800, Jin Yao wrote: > > > > SNIP > > > > >cycles: 4: 800933425 1002536659 1002536659 > > >cycles: 5: 800928573 1002528386 1002528386 > > >cycles: 6: 800924347 1002520527 1002520527 > > >cycles: 7: 800922009 1002513176 1002513176 > > >cycles: 8: 800919624 1002507326 1002507326 > > >cycles: 9: 800917204 1002500663 1002500663 > > >cycles: 10: 802096579 1002494280 1002494280 > > >cycles: 11: 802093770 1002486404 1002486404 > > >cycles: 12: 803284338 1002479491 1002479491 > > >cycles: 13: 803277609 1002469777 1002469777 > > >cycles: 14: 800875902 1002458861 1002458861 > > >cycles: 15: 800873241 1002451350 1002451350 > > >cycles: 0: 800837379 1002444645 1002444645 > > >cycles: 1: 800833400 1002438505 1002438505 > > >cycles: 2: 800829291 1002433698 1002433698 > > >cycles: 3: 800824390 1002427584 1002427584 > > >cycles: 4: 800819360 1002422099 1002422099 > > >cycles: 5: 800814787 1002415845 1002415845 > > >cycles: 6: 800810125 1002410301 1002410301 > > >cycles: 7: 800791893 1002386845 1002386845 > > >cycles: 12855737722 16040169029 16040169029 > > >cycles: 6406560625 8019379522 8019379522 > > > > > > Performance counter stats for 'system wide': > > > > > >12,855,737,722 cpu_core/cycles/ > > > 6,406,560,625 cpu_atom/cycles/ > > > > so we do that no_merge stuff for uncore pmus, why can't we do > > that in here? that'd seems like generic way > > > > jirka > > > > We have set the "stat_config.no_merge = true;" in "[PATCH v2 08/27] perf > stat: Uniquify hybrid event name". > > For hybrid hardware events, they have different configs. The config is > 0xDD00AA (0x4 for core vs. 0xa for atom in this example) > > We use perf_pmu__for_each_hybrid_pmu() to iterate all hybrid PMUs, generate > the configs and create the evsels for each hybrid PMU. This logic and the > code are not complex and easy to understand. > > Uncore looks complicated. It has uncore alias concept which is for different > PMUs but with same prefix. Such as "uncore_cbox" for "uncore_cbox_0" to > "uncore_cbox_9". But the uncore alias concept doesn't apply to hybrid pmu > (we just have "cpu_core" and "cpu_atom" here). And actually I also don't > want to mix the core stuff with uncore stuff, that would be hard for > understanding. > > Perhaps I misunderstand, correct me if I'm wrong. not sure, I thought the merging stuff was more generic, because the change looks too specific for me, I'll try to check on it more deeply jirka
Re: [PATCH v2 09/27] perf parse-events: Create two hybrid hardware events
Hi Jiri, On 3/13/2021 3:15 AM, Jiri Olsa wrote: On Thu, Mar 11, 2021 at 03:07:24PM +0800, Jin Yao wrote: SNIP cycles: 4: 800933425 1002536659 1002536659 cycles: 5: 800928573 1002528386 1002528386 cycles: 6: 800924347 1002520527 1002520527 cycles: 7: 800922009 1002513176 1002513176 cycles: 8: 800919624 1002507326 1002507326 cycles: 9: 800917204 1002500663 1002500663 cycles: 10: 802096579 1002494280 1002494280 cycles: 11: 802093770 1002486404 1002486404 cycles: 12: 803284338 1002479491 1002479491 cycles: 13: 803277609 1002469777 1002469777 cycles: 14: 800875902 1002458861 1002458861 cycles: 15: 800873241 1002451350 1002451350 cycles: 0: 800837379 1002444645 1002444645 cycles: 1: 800833400 1002438505 1002438505 cycles: 2: 800829291 1002433698 1002433698 cycles: 3: 800824390 1002427584 1002427584 cycles: 4: 800819360 1002422099 1002422099 cycles: 5: 800814787 1002415845 1002415845 cycles: 6: 800810125 1002410301 1002410301 cycles: 7: 800791893 1002386845 1002386845 cycles: 12855737722 16040169029 16040169029 cycles: 6406560625 8019379522 8019379522 Performance counter stats for 'system wide': 12,855,737,722 cpu_core/cycles/ 6,406,560,625 cpu_atom/cycles/ so we do that no_merge stuff for uncore pmus, why can't we do that in here? that'd seems like generic way jirka We have set the "stat_config.no_merge = true;" in "[PATCH v2 08/27] perf stat: Uniquify hybrid event name". For hybrid hardware events, they have different configs. The config is 0xDD00AA (0x4 for core vs. 0xa for atom in this example) We use perf_pmu__for_each_hybrid_pmu() to iterate all hybrid PMUs, generate the configs and create the evsels for each hybrid PMU. This logic and the code are not complex and easy to understand. Uncore looks complicated. It has uncore alias concept which is for different PMUs but with same prefix. Such as "uncore_cbox" for "uncore_cbox_0" to "uncore_cbox_9". But the uncore alias concept doesn't apply to hybrid pmu (we just have "cpu_core" and "cpu_atom" here). And actually I also don't want to mix the core stuff with uncore stuff, that would be hard for understanding. Perhaps I misunderstand, correct me if I'm wrong. Thanks Jin Yao
Re: [PATCH v2 09/27] perf parse-events: Create two hybrid hardware events
On Thu, Mar 11, 2021 at 03:07:24PM +0800, Jin Yao wrote: SNIP > cycles: 4: 800933425 1002536659 1002536659 > cycles: 5: 800928573 1002528386 1002528386 > cycles: 6: 800924347 1002520527 1002520527 > cycles: 7: 800922009 1002513176 1002513176 > cycles: 8: 800919624 1002507326 1002507326 > cycles: 9: 800917204 1002500663 1002500663 > cycles: 10: 802096579 1002494280 1002494280 > cycles: 11: 802093770 1002486404 1002486404 > cycles: 12: 803284338 1002479491 1002479491 > cycles: 13: 803277609 1002469777 1002469777 > cycles: 14: 800875902 1002458861 1002458861 > cycles: 15: 800873241 1002451350 1002451350 > cycles: 0: 800837379 1002444645 1002444645 > cycles: 1: 800833400 1002438505 1002438505 > cycles: 2: 800829291 1002433698 1002433698 > cycles: 3: 800824390 1002427584 1002427584 > cycles: 4: 800819360 1002422099 1002422099 > cycles: 5: 800814787 1002415845 1002415845 > cycles: 6: 800810125 1002410301 1002410301 > cycles: 7: 800791893 1002386845 1002386845 > cycles: 12855737722 16040169029 16040169029 > cycles: 6406560625 8019379522 8019379522 > >Performance counter stats for 'system wide': > > 12,855,737,722 cpu_core/cycles/ >6,406,560,625 cpu_atom/cycles/ so we do that no_merge stuff for uncore pmus, why can't we do that in here? that'd seems like generic way jirka
[PATCH v2 09/27] perf parse-events: Create two hybrid hardware events
For hardware events, they have pre-defined configs. The kernel needs to know where the event comes from (e.g. from cpu_core pmu or from cpu_atom pmu). But the perf type 'PERF_TYPE_HARDWARE' can't carry pmu information. So the kernel introduces a new type 'PERF_TYPE_HARDWARE_PMU'. The new attr.config layout for PERF_TYPE_HARDWARE_PMU is: 0xDD00AA AA: original hardware event ID DD: PMU type ID PMU type ID is retrieved from sysfs. For example, cat /sys/devices/cpu_atom/type 10 cat /sys/devices/cpu_core/type 4 When enabling a hybrid hardware event without specified pmu, such as, 'perf stat -e cycles -a', two events are created automatically. One is for atom, the other is for core. root@ssp-pwrt-002:~# ./perf stat -e cycles -vv -a -- sleep 1 Control descriptor is not initialized perf_event_attr: type 6 size 120 config 0x4 sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest1 sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 3 sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 4 sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 5 sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 7 sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 8 sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 9 sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 10 sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 11 sys_perf_event_open: pid -1 cpu 8 group_fd -1 flags 0x8 = 12 sys_perf_event_open: pid -1 cpu 9 group_fd -1 flags 0x8 = 13 sys_perf_event_open: pid -1 cpu 10 group_fd -1 flags 0x8 = 14 sys_perf_event_open: pid -1 cpu 11 group_fd -1 flags 0x8 = 15 sys_perf_event_open: pid -1 cpu 12 group_fd -1 flags 0x8 = 16 sys_perf_event_open: pid -1 cpu 13 group_fd -1 flags 0x8 = 17 sys_perf_event_open: pid -1 cpu 14 group_fd -1 flags 0x8 = 18 sys_perf_event_open: pid -1 cpu 15 group_fd -1 flags 0x8 = 19 perf_event_attr: type 6 size 120 config 0xa sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest1 sys_perf_event_open: pid -1 cpu 16 group_fd -1 flags 0x8 = 20 sys_perf_event_open: pid -1 cpu 17 group_fd -1 flags 0x8 = 21 sys_perf_event_open: pid -1 cpu 18 group_fd -1 flags 0x8 = 22 sys_perf_event_open: pid -1 cpu 19 group_fd -1 flags 0x8 = 23 sys_perf_event_open: pid -1 cpu 20 group_fd -1 flags 0x8 = 24 sys_perf_event_open: pid -1 cpu 21 group_fd -1 flags 0x8 = 25 sys_perf_event_open: pid -1 cpu 22 group_fd -1 flags 0x8 = 26 sys_perf_event_open: pid -1 cpu 23 group_fd -1 flags 0x8 = 27 cycles: 0: 810754998 1002563650 1002563650 cycles: 1: 810749852 1002559947 1002559947 cycles: 2: 808096005 1002555036 1002555036 cycles: 3: 808090246 1002543496 1002543496 cycles: 4: 800933425 1002536659 1002536659 cycles: 5: 800928573 1002528386 1002528386 cycles: 6: 800924347 1002520527 1002520527 cycles: 7: 800922009 1002513176 1002513176 cycles: 8: 800919624 1002507326 1002507326 cycles: 9: 800917204 1002500663 1002500663 cycles: 10: 802096579 1002494280 1002494280 cycles: 11: 802093770 1002486404 1002486404 cycles: 12: 803284338 1002479491 1002479491 cycles: 13: 803277609 1002469777 1002469777 cycles: 14: 800875902 1002458861 1002458861 cycles: 15: 800873241 1002451350 1002451350 cycles: 0: 800837379 1002444645 1002444645 cycles: 1: 800833400 1002438505 1002438505 cycles: 2: 800829291 1002433698 1002433698 cycles: 3: 800824390 1002427584 1002427584 cycles: 4: 800819360 1002422099 1002422099 cycles: 5: 800814787 1002415845 1002415845 cycles: 6: 800810125 1002410301 1002410301 cycles: 7: 800791893 1002386845 1002386845 cycles: 12855737722 16040169029 16040169029 cycles: 6406560625 8019379522 8019379522 Performance counter stats for 'system wide': 12,855,737,722 cpu_core/cycles/ 6,406,560,625 cpu_atom/cycles/ 1.002774658 seconds time elapsed type 6 is PERF_TYPE_HARDWARE_PMU. 0x4 in 0x4 indicates the cpu_core pmu. 0xa in 0xa indicates the cpu_atom pmu. Signed-off-by: Jin Yao --- tools/perf/util/parse-events.c | 73 ++