Re: perf event grouping for dummies (was Re: [PATCH] arc: perf: Enable generic "cache-references" and "cache-misses" events)
On Thu, Sep 22, 2016 at 01:23:04PM -0500, Paul Clarke wrote:
> On 09/22/2016 12:50 PM, Vineet Gupta wrote:
>> On 09/22/2016 12:56 AM, Peter Zijlstra wrote:
>>> On Wed, Sep 21, 2016 at 07:43:28PM -0500, Paul Clarke wrote:
>>>> On 09/20/2016 03:56 PM, Vineet Gupta wrote:
>>>>> On 09/01/2016 01:33 AM, Peter Zijlstra wrote:
>>>>>>> - is that what perf event grouping is ?
>>>>>>
>>>>>> Again, nope. Perf event groups are single counter (so no implicit
>>>>>> addition) that are co-scheduled on the PMU.
>>>>>
>>>>> I'm not sure I understand - does this require specific PMU/arch support -
>>>>> as in multiple conditions feeding to same counter.
>>>>
>>>> My read is that what Peter meant was that each event in the
>>>> perf event group is a single counter, so all the events in the group
>>>> are counted simultaneously. (No multiplexing.)
>>>
>>> Right, sorry for the poor wording.
>>>
>>>>> Again when you say co-scheduled what do you mean - why would anyone use
>>>>> the event grouping - is it when they only have 1 counter and they want to
>>>>> count 2 conditions/events at the same time - isn't this same as event
>>>>> multiplexing ?
>>>>
>>>> I'd say it's the converse of multiplexing. Instead of mapping
>>>> multiple events to a single counter, perf event groups map a set of
>>>> events each to their own counter, and they are active simultaneously.
>>>> I suppose it's possible for the _groups_ to be multiplexed with other
>>>> events or groups, but the group as a whole will be scheduled together,
>>>> as a group.
>>>
>>> Correct.
>>>
>>> Each event gets its own hardware counter. Grouped events are
>>> co-scheduled on the hardware.
>>
>> And if we don't group them, then they _may_ not be co-scheduled
>> (active/counting at the same time) ? But how can this be possible.
>>
>> Say we have 2 counters, both the cmds below
>>
>>   perf stat -e cycles,instructions hackbench
>>   perf stat -e '{cycles,instructions}' hackbench
>>
>> would assign 2 counters to the 2 conditions which keep counting until perf
>> asks them to stop (because the profiled application ended)
>>
>> I don't understand the "scheduling" of counter - once we set them to count,
>> there is no real intervention/scheduling from software in terms of
>> disabling/enabling (assuming no multiplexing etc)

So, getting this machine as an example:

[0.067739] smpboot: CPU0: Intel(R) Core(TM) i7-3667U CPU @ 2.00GHz (family: 0x6, model: 0x3a, stepping: 0x9)
[0.067744] Performance Events: PEBS fmt1+, 16-deep LBR, IvyBridge events, full-width counters, Intel PMU driver.
[0.067774] ... version: 3
[0.067776] ... bit width: 48
[0.06] ... generic registers: 4
[0.067778] ... value mask:
[0.067779] ... max period:
[0.067780] ... fixed-purpose events: 3
[0.067781] ... event mask: 0007000f
[0.068694] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.

[root@zoo ~]# perf stat -e '{branch-instructions,branch-misses,bus-cycles,cache-misses}' ls a
ls: cannot access 'a': No such file or directory

 Performance counter stats for 'ls a':

           356,090      branch-instructions
            17,170      branch-misses             # 4.82% of all branches
           232,365      bus-cycles
            12,107      cache-misses

       0.003624967 seconds time elapsed

[root@zoo ~]# perf stat -e '{branch-instructions,branch-misses,bus-cycles,cache-misses,cpu-cycles}' ls a
ls: cannot access 'a': No such file or directory

 Performance counter stats for 'ls a':

                        branch-instructions       (0.00%)
                        branch-misses             (0.00%)
                        bus-cycles                (0.00%)
                        cache-misses              (0.00%)
                        cpu-cycles                (0.00%)

       0.003659678 seconds time elapsed

[root@zoo ~]#

That was as a group, i.e. with those {} enclosing the events. If you run it
with -vv you'll see, among other things, the "group_fd" parameter passed to
the sys_perf_event_open syscall:

[root@zoo ~]# perf stat -vv -e '{branch-instructions,branch-misses,bus-cycles,cache-misses,cpu-cycles}' ls a
sys_perf_event_open: pid 28581 cpu -1 group_fd -1 flags 0x8
sys_perf_event_open: pid 28581 cpu -1 group_fd 3 flags 0x8
sys_perf_event_open: pid 28581 cpu -1 group_fd 3 flags 0x8
sys_perf_event_open: pid 28581 cpu -1 group_fd 3 flags 0x8
sys_perf_event_open: pid 28581 cpu -1 group_fd 3 flags 0x8
ls: cannot access 'a': No such file or directory

 Performance counter stats for 'ls a':
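
A note on the percentage column in the output above: perf stat prints, for each event, the share of its enabled time during which the event actually held a hardware counter, and it scales multiplexed counts up by the inverse of that share. The following is a minimal Python sketch of that arithmetic, not perf's own code. The function names and the sample numbers are made up for illustration; the two time values correspond to the time_enabled and time_running fields that perf_event_open can report for an event.

```python
from typing import Optional


def running_percent(time_enabled_ns: int, time_running_ns: int) -> float:
    """Share of the enabled window during which the event held a counter."""
    if time_enabled_ns == 0:
        return 0.0
    return 100.0 * time_running_ns / time_enabled_ns


def scaled_count(raw: int, time_enabled_ns: int, time_running_ns: int) -> Optional[int]:
    """Extrapolate a raw count to the whole enabled window.

    Returns None when the event never ran: there is nothing to scale.
    """
    if time_running_ns == 0:
        return None
    return round(raw * time_enabled_ns / time_running_ns)


if __name__ == "__main__":
    # Hypothetical event that was on a counter for a quarter of the run.
    print(running_percent(4_000_000, 1_000_000))
    print(scaled_count(1000, 4_000_000, 1_000_000))
    # Hypothetical event that was enabled but never scheduled.
    print(running_percent(3_600_000, 0))
    print(scaled_count(0, 3_600_000, 0))
```

An event that is enabled but never scheduled gets a running share of 0% and has no count to extrapolate, which is consistent with the empty "(0.00%)" rows shown for the five-event group.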
Re: perf event grouping for dummies (was Re: [PATCH] arc: perf: Enable generic "cache-references" and "cache-misses" events)
On 09/22/2016 12:50 PM, Vineet Gupta wrote:
> On 09/22/2016 12:56 AM, Peter Zijlstra wrote:
>> On Wed, Sep 21, 2016 at 07:43:28PM -0500, Paul Clarke wrote:
>>> On 09/20/2016 03:56 PM, Vineet Gupta wrote:
>>>> On 09/01/2016 01:33 AM, Peter Zijlstra wrote:
>>>>>> - is that what perf event grouping is ?
>>>>>
>>>>> Again, nope. Perf event groups are single counter (so no implicit
>>>>> addition) that are co-scheduled on the PMU.
>>>>
>>>> I'm not sure I understand - does this require specific PMU/arch support -
>>>> as in multiple conditions feeding to same counter.
>>>
>>> My read is that what Peter meant was that each event in the
>>> perf event group is a single counter, so all the events in the group
>>> are counted simultaneously. (No multiplexing.)
>>
>> Right, sorry for the poor wording.
>>
>>>> Again when you say co-scheduled what do you mean - why would anyone use
>>>> the event grouping - is it when they only have 1 counter and they want to
>>>> count 2 conditions/events at the same time - isn't this same as event
>>>> multiplexing ?
>>>
>>> I'd say it's the converse of multiplexing. Instead of mapping
>>> multiple events to a single counter, perf event groups map a set of
>>> events each to their own counter, and they are active simultaneously.
>>> I suppose it's possible for the _groups_ to be multiplexed with other
>>> events or groups, but the group as a whole will be scheduled together,
>>> as a group.
>>
>> Correct.
>>
>> Each event gets its own hardware counter. Grouped events are
>> co-scheduled on the hardware.
>
> And if we don't group them, then they _may_ not be co-scheduled
> (active/counting at the same time) ? But how can this be possible.
>
> Say we have 2 counters, both the cmds below
>
>   perf stat -e cycles,instructions hackbench
>   perf stat -e '{cycles,instructions}' hackbench
>
> would assign 2 counters to the 2 conditions which keep counting until perf
> asks them to stop (because the profiled application ended)
>
> I don't understand the "scheduling" of counter - once we set them to count,
> there is no real intervention/scheduling from software in terms of
> disabling/enabling (assuming no multiplexing etc)

If you assume no multiplexing, then this discussion on grouping is moot.

It depends on how many events you specify, how many counters there are, and
which counters can count which events. If you specify a set of events for
which every event can be counted simultaneously, they will be scheduled
simultaneously and continuously. If you specify more events than counters,
there's multiplexing. AND, if you specify a set of events, some of which
cannot be counted simultaneously due to hardware limitations, they'll be
multiplexed.

PC

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc
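
The scheduling rules discussed in this message can be sketched with a toy model. The Python below is a deliberately simplified illustration and not the kernel's algorithm: it assumes every counter can count every event, ignores fixed-purpose counters and per-event constraints, and rotates the starting point by one unit on each tick. The function name simulate and its parameters are hypothetical. A "unit" is either a single ungrouped event or a whole group, and a unit is placed only if all of its events fit in the counters still free.

```python
from typing import List


def simulate(units: List[List[str]], n_counters: int, ticks: int) -> List[float]:
    """Return, per unit, the fraction of ticks during which it was counting.

    units      -- each entry is a list of event names; a one-element list is
                  an ungrouped event, a longer list is a group
    n_counters -- number of interchangeable hardware counters (toy assumption)
    ticks      -- number of scheduling rounds to simulate
    """
    running = [0] * len(units)
    for tick in range(ticks):
        free = n_counters
        start = tick % len(units)  # rotate which unit gets first pick
        order = list(range(start, len(units))) + list(range(start))
        for i in order:
            # All-or-nothing: a group runs only if every member gets a counter.
            if len(units[i]) <= free:
                free -= len(units[i])
                running[i] += 1
    return [r / ticks for r in running]


if __name__ == "__main__":
    events = ["branch-instructions", "branch-misses", "bus-cycles",
              "cache-misses", "cpu-cycles"]
    # One group of five events on four counters.
    print(simulate([events], n_counters=4, ticks=5))
    # The same five events, ungrouped.
    print(simulate([[e] for e in events], n_counters=4, ticks=5))
    # Two groups of three events competing for four counters.
    print(simulate([["a", "b", "c"], ["d", "e", "f"]], n_counters=4, ticks=4))
```

In this model the five-event group never runs on four counters, the five ungrouped events each run 80% of the time (so they are multiplexed and not always active together), and the two three-event groups alternate, each running half the time but always as a whole.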
Re: perf event grouping for dummies (was Re: [PATCH] arc: perf: Enable generic "cache-references" and "cache-misses" events)
On 09/22/2016 12:56 AM, Peter Zijlstra wrote:
> On Wed, Sep 21, 2016 at 07:43:28PM -0500, Paul Clarke wrote:
>> On 09/20/2016 03:56 PM, Vineet Gupta wrote:
>>> On 09/01/2016 01:33 AM, Peter Zijlstra wrote:
>>>>> - is that what perf event grouping is ?
>>>>
>>>> Again, nope. Perf event groups are single counter (so no implicit
>>>> addition) that are co-scheduled on the PMU.
>>>
>>> I'm not sure I understand - does this require specific PMU/arch support -
>>> as in multiple conditions feeding to same counter.
>>
>> My read is that what Peter meant was that each event in the
>> perf event group is a single counter, so all the events in the group
>> are counted simultaneously. (No multiplexing.)
>
> Right, sorry for the poor wording.
>
>>> Again when you say co-scheduled what do you mean - why would anyone use the
>>> event grouping - is it when they only have 1 counter and they want to count 2
>>> conditions/events at the same time - isn't this same as event multiplexing ?
>>
>> I'd say it's the converse of multiplexing. Instead of mapping
>> multiple events to a single counter, perf event groups map a set of
>> events each to their own counter, and they are active simultaneously.
>> I suppose it's possible for the _groups_ to be multiplexed with other
>> events or groups, but the group as a whole will be scheduled together,
>> as a group.
>
> Correct.
>
> Each event gets its own hardware counter. Grouped events are
> co-scheduled on the hardware.

And if we don't group them, then they _may_ not be co-scheduled (active/counting
at the same time) ? But how can this be possible.

Say we have 2 counters, both the cmds below

  perf stat -e cycles,instructions hackbench
  perf stat -e '{cycles,instructions}' hackbench

would assign 2 counters to the 2 conditions which keep counting until perf asks
them to stop (because the profiled application ended)

I don't understand the "scheduling" of counter - once we set them to count, there
is no real intervention/scheduling from software in terms of disabling/enabling
(assuming no multiplexing etc)

> You can multiplex groups. But if one event in a group is scheduled, they
> all must be.
perf event grouping for dummies (was Re: [PATCH] arc: perf: Enable generic "cache-references" and "cache-misses" events)
On 09/01/2016 01:33 AM, Peter Zijlstra wrote:
>> - is that what perf event grouping is ?
>
> Again, nope. Perf event groups are single counter (so no implicit
> addition) that are co-scheduled on the PMU.

I'm not sure I understand - does this require specific PMU/arch support - as in
multiple conditions feeding to same counter. How does a perf user make use of
this info - I tried googling around but can't seem to find anything which
explains the semantics.

I can see that group events work on ARC (although in our case a counter can
count one condition at a time only) and the results seem to be similar whether
we group or not.

------>8------
[ARCLinux]# perf stat -e {cycles,instructions} hackbench
Running with 10*40 (== 400) tasks.
Time: 37.430

 Performance counter stats for 'hackbench':

        348173 cycles
    1351709784 instructions   # 0.39 insn per cycle

  38.957481536 seconds time elapsed

[ARCLinux]# perf stat -e cycles hackbench
Running with 10*40 (== 400) tasks.
Time: 36.735

 Performance counter stats for 'hackbench':

    3426151391 cycles

  38.247235981 seconds time elapsed

[ARCLinux]#
[ARCLinux]# perf stat -e instructions hackbench
Running with 10*40 (== 400) tasks.
Time: 37.537

 Performance counter stats for 'hackbench':

    1355421559 instructions

  39.061784281 seconds time elapsed
------>8------

...

> You can do it like:
>
>   perf stat -e '{cycles,instructions}'
>
> Which will place the cycles event and the instructions event in a group
> and thereby guarantee they're co-scheduled.

Again when you say co-scheduled what do you mean - why would anyone use the
event grouping - is it when they only have 1 counter and they want to count 2
conditions/events at the same time - isn't this same as event multiplexing ?

-Vineet