Corey,

On Wed, Jun 10, 2009 at 12:42 AM, Corey J Ashford <cjash...@us.ibm.com>wrote:

> This looks great, Stephane.
>
> stephane eranian <eran...@googlemail.com> wrote on 06/09/2009 02:54:16 PM:
> >
> >
> >    - the listing and event info API is still returning integer, but
> > they are opaque identifiers. It is
> >      not possible to do for(i=0;  i< num_events;i++). Instead, you
> > can use a helper macros with
> >      accessor functions: for_each_event(). The same kind of opaque
> > identifiers is used to list and
> >      query  attributes.
>
> This is a great idea.  It makes it so the OS-specific layer doesn't have
> to create a unified/monotonic event numbering... it just has to be able to
> enumerate all of the events and give them unique ids which it is free to
> concoct.
>
> You might also want to provide these functions:
>
>    int pfm_first_event(void);  /* returns first opaque event identifier
> and implicity starts the sequence over again */
>    int pfm_next_event(int opaque_event_id); /* returns -1 when you've
> gone past the end of the event list */
>

They are there:
#define for_each_event(x) \
        for((x)=pfm_get_first_event(); (x) != -1; (x) =
pfm_get_next_event((x)))

#define for_each_event_attr(x, z) \
        for((x)=pfm_get_first_attr((z)); (x) != -1; (x) =
pfm_get_next_attr((z), (x)))

Yes, with this approach you do not have to ensure a unified numbering
scheme.
It also makes it easy to handle event tables with holes, e.g., AMD64.
The conversion from identifier to table + index is simple because the
identifier
is a bitfield with: 8 bits for the event table number (e.g,, OS vs. PMU),
and 16
bits for actual index in the table. If the split at 16 is not enough, we can
move
it without impacting applications.


>
> >    - attributes are always defined per event. Parsing and
> > decomposition happens internally in the generic
> >      libpfm layer. Not all events have the same set of attributes.
> > It is up to the developer of each PMU
> >      support to try and unify attributes with the same meaning, e.g.
> > "u" on AMD64 mean priv user 1,2,3
> >      same thing on Intel.
> >
> >    - to make things simpler, I have modified all calls which return
> > a string, e.g, description of an event or
> >      attribute, to allocate the string internally. Thus, no need to
> > worry about the buffer size but you need
> >      to free the string whenever you are done.
>
> I have mixed feelings about this because of the possibility of long-term
> memory fragmentation.  However, I think the vast majority of apps will not
> be calling the interface very often, so the problem should be fairly
> minimal.
>

Yes, that could indeed be a problem but you are right that it is not
anticipated that
libpfm would be used in critical paths, but rather during the setup phase of
a
measurement. The alternatives are:

   1- the function return value is the string, e.g., const char
*pfm_get_event_desc(int idx)

   2- pass a pre-allocated buffer and its size, like we have it today. But
then if the buffer
       is too small? The function could return PFMLIB_ERR_TOOSMALL. The
caller would have
       to iterate until success. Or we could provide a call to return the
maximum string length
       possible. That requires a prior scan for everything that could return
a string which is:

       extern pfm_err_t pfm_get_pmu_name(char **name);

       extern pfm_err_t pfm_get_event_name(int idx, char **name);
       extern pfm_err_t pfm_get_event_attr_name(int idx, int attr_idx, char
**name);

       extern pfm_err_t pfm_get_event_desc(int idx, char **str);
       extern pfm_err_t pfm_get_event_attr_desc(int idx, int attr_idx, char
**desc);


       extern pfm_err_t pfm_get_cycle_event(char **event);
       extern pfm_err_t pfm_get_inst_retired_event(char **event);

       The good thing about all those functions is that the string length is
known in advance, i.e.,
       the input parameters will not change the maximum size possible. The
max size can be known
       in advance for a PMU name, event name, event/attr descriptions.

       Going back to fragmentation, it is not clear to me that pushing the
allocation in the application
       (rather than the library) will make this problem go away. You will
still have to  allocate a buffer
       based on maximum size, and once you're done you'll have to free it.

>
> >
> >    - pfm_get_event_encoding() to return raw event encoding (not PCL)
>
> So no change here, except the rename from pfm_get_event_code to
> pfm_get_event_encoding?
>

That is a good point. The 2 functions are different. pfm_get_event_code()
does
not take a event description string as input but rather an opaque
identifier. Its
purpose is mostly for event enumeration. Here is an example with pfmon:

$ pfmon -iretired_instructions
Name     : RETIRED_INSTRUCTIONS
Code     : 0xc0
Counters : [ 0 1 2 3 ]
Desc     : Retired Instructions

The function returns 0xc0. You could also use the pfm_get_event_encoding()
function
but then you'd have to form a complete event description with at least one
unit mask.
The current logic in libpfm is that if an event has unit masks, then at
least one must be
provided. Phil and I discussed the ability to designate a default unit
masks. In fact,
the logic is implemented in the current libpfm but it is not used, except
maybe on MIPS.
The problem with this is is you have an event A, which supports unit mask
E,F,G, if you
simply pass A, and libpfm picks up A:E (as default), then you don't really
know what you
are measuring with A. Unless we document it somewhere or provide a call to
return the
default such that tool can provide the info. If I recall OProfile uses
default unit masks.


>
> >
> >    - PCL to be handled via specific call:
> > pfm_get_pcl_event_encoding() and it takes struct perf_counter_attr.
> >      PCL definitions are in a dedicated header file.
>
> I assume you'd supply both the event and attributes (including privilege
> level, edge/level etc.) to this function?
>

Yes, e.g, cpu_clk_unhalted:core_p:u=1:k=1


>
> >
> >    - I have renamed pfm_dispatch_events() to pfm_assign_events(),
> > the input is now an argv-style array.
> >      The call still takes some extra parameters which are needed on
> > input (list of unavailable pmcs) and on
> >      output (list of registers to program). I am still not too
> > satisfied about the output structure, it is big. The
> >      alternative would be to dynamically allocate just the number of
> > registers needed for the measurement.
>
> I think as long as you are returning the other strings using allocated
> memory, it probably wouldn't hurt to do it here on these objects too.
>
> >
> > Find attached the current state of pfmlib.h.
>
> Just a nit, but the pfm_assign_events function's structure definitions
> still read "pfmlib_dispatch_{in,out}_t".


Thanks, the whole thing does not even compile at this time.

Let's continue the discussion...
------------------------------------------------------------------------------
Crystal Reports - New Free Runtime and 30 Day Trial
Check out the new simplified licensing option that enables unlimited
royalty-free distribution of the report engine for externally facing 
server and web deployment.
http://p.sf.net/sfu/businessobjects
_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel

Reply via email to