Ingo Molnar wrote:
>> 2/ Grouping
>>
>> By design, an event can only be part of one group at a time.
>> Events in a group are guaranteed to be active on the PMU at the
>> same time. That means a group cannot have more events than there
>> are available counters on the PMU. Tools may want to know the
>> number of counters available in order to group their events
>> accordingly, such that reliable ratios could be computed. It seems
>> the only way to know this is by trial and error. This is not
>> practical.
> 
> Groups are there to support heavily constrained PMUs, and for them
> this is the only way, as there is no simple linear expression for
> how many counters one can load on the PMU.
> 
> The ideal model to tooling is relatively independent PMU registers
> (counters) with little constraints - most modern CPUs meet that
> model.
> 
> All the existing tooling (tools/perf/) operates on that model and
> this leads to easy programmability and flexible results. This model
> needs no grouping of counters.
> 
> Could you please cite specific examples in terms of tools/perf/?
> What feature do you think needs to know more about constraints? What
> is the specific win in precision we could achieve via that?


For example, suppose a user wants to monitor 10 events, and we have four
counters to work with.  Let's assume there is some mapping of events to
counters where you need only 3 groups to schedule the 10 events onto the
PMU.  If you leave it to the kernel (and don't group the events from user
space), then depending on the kernel's fast event scheduling algorithm, it
may take 6 groups to get all of the requested events counted.  Each event
is then live on the PMU for a smaller share of the time.  This leads to
lower counts in the counters, and more chance for the counters to miss
event bursts, which leads to less accurate scaled results.
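
To put rough numbers on that, here is a small Python sketch.  It assumes
the usual scaling rule (raw count * time enabled / time running) and a toy
round-robin rotation in which each group gets every n-th time slice.  The
timeline, the function names and the values are all invented for
illustration; none of this is taken from kernel or PAPI code.

```python
def scaled_estimate(timeline, n_groups, slot):
    """Scaled count for an event whose group runs in every n-th time slice.

    timeline[t] is the true number of events in tick t (toy data).
    """
    # Ticks during which this event's group is actually on the PMU.
    running = [t for t in range(len(timeline)) if t % n_groups == slot]
    raw = sum(timeline[t] for t in running)
    time_enabled = len(timeline)
    time_running = len(running)
    # Scale the raw count up to cover the ticks that were not observed.
    return raw * time_enabled / time_running


def worst_error(timeline, n_groups):
    """Largest absolute error over all possible rotation slots."""
    true_total = sum(timeline)
    return max(
        abs(scaled_estimate(timeline, n_groups, slot) - true_total)
        for slot in range(n_groups)
    )


# Steady rate of 10 per tick, with a single burst of 100 in tick 4.
timeline = [10] * 12
timeline[4] = 100

print("3 groups:", worst_error(timeline, 3))
print("6 groups:", worst_error(timeline, 6))
```

With this toy timeline the worst-case error is noticeably larger for six
groups than for three: the burst is either missed entirely or multiplied
by a larger scale factor.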

Currently the PAPI substrate for PCL does this partitioning using a very
dumb algorithm.  But it could be improved, particularly if there were some
better way to get feedback from the kernel than a "yes, these fit" or
"no, these don't fit".  I'm not sure what that way would be, though.
Perhaps an ioctl that does some sort of "dry scheduling" of events to
groups in an optimal way.  This call would not need to lock any resources;
it would just use the kernel's algorithm for event constraint checking.
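
As a rough illustration of why a plain yes/no answer is limiting, here is
a Python sketch of a first-fit partitioner driven only by a "do these
fit?" oracle.  This is not the actual PAPI algorithm.  The fits() callback
is a hypothetical stand-in for the kernel's answer, and the constraint
table (four counters, three events that can only use counter 0) is made up.

```python
def make_toy_oracle(allowed):
    """Return fits(group) for a made-up constraint table.

    allowed maps an event name to the set of counters it may use.
    """
    def fits(group):
        # Backtracking search: give every event its own distinct counter.
        def place(i, used):
            if i == len(group):
                return True
            return any(
                place(i + 1, used | {c})
                for c in allowed[group[i]]
                if c not in used
            )
        return place(0, frozenset())
    return fits


def first_fit_groups(events, fits):
    """Put each event into the first group that still fits it.

    Assumes every event fits on the PMU by itself.
    """
    groups = []
    for ev in events:
        for g in groups:
            if fits(g + [ev]):
                g.append(ev)
                break
        else:
            groups.append([ev])
    return groups


# Toy PMU: 4 counters; e7..e9 can only be counted on counter 0.
allowed = {f"e{i}": {0, 1, 2, 3} for i in range(7)}
allowed.update({f"e{i}": {0} for i in range(7, 10)})
fits = make_toy_oracle(allowed)

events = [f"e{i}" for i in range(10)]
naive = first_fit_groups(events, fits)
# Same algorithm, but the most constrained events are placed first.
ordered = first_fit_groups(
    sorted(events, key=lambda e: len(allowed[e])), fits
)
print(len(naive), "groups vs", len(ordered), "groups")
```

In this toy case the result depends on the order in which the events are
tried, and user space can only pick a good order if it already knows the
constraints.  With nothing but yes/no feedback it has to guess.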

To me, this is not a big issue, but some sort of better mechanism might be 
considered for a future update.

-- 
Regards,

- Corey

Corey Ashford
Software Engineer
IBM Linux Technology Center, Linux Toolchain
Beaverton, OR
503-578-3507
cjash...@us.ibm.com


_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
