Re: [perfmon] upcoming perfmon interface changes

Stephane Eranian Mon, 06 Nov 2006 08:18:44 -0800

Phil,

On Thu, Nov 02, 2006 at 12:46:34PM +0100, Philip J. Mucci wrote:
> I think I'm still struggling with semantics of these two and why they
> need to be separate. That along with the limitation that you can do
> certain things before and after loading the context, all seem rather
> unnecessarily limiting to me.
> 
> To clarify a little, the first change, like being able to declare a
> sample buffer at any point (as well as multiplexing set etc...) upon
> create_context time, means that contexts cannot be created until one
> knows exactly what one is going to do with the PMU.
> 
> This is not the case, often a library wants to create a context for
> itself or another process, attach that process, and anything may be done
> to it, adding sets, sampling, etc. The current model requires that all
> such changes follow a unload/context/close/create/context/load/context
> cycle in virtually all paths resulting in changing the usage model of
> the counters. Consider the user, using a performance tool with an
> interactive interface to the counters...you don't know that the user is
> finished 'setting up' things until he or she calls start. That means
> either you defer creation and loading until start or you go through
> unloading and closing.
> 
> On my second point, I really see no circumstances where create runs and
> load doesn't. These two are always used together...currently it seems
> that this only provides some sort of external gate to still be allowed
> to change less things once loaded. (You can really only read and write
> PMD/PMC's after this). Looking through all of pfmon, libpfm and PAPI,
> seems to indicate that if load_context were implicit in create_context,
> the code would be simpler and smaller...if you need to declare PMU
> resources, at create time would be just as useful and require one less
> API call and one less kernel entry point. I know you have explained the
> rationale of these to me numerous times, but I still see the same code
> snippets all through the libraries...create/write/write/load...and I
> can't help but wonder if that can be reduced to one simple kernel call.
> The fact that I do this sequence means that if the user changes how he
> or she uses the counters, I have to do 6 system calls to reuse the
> counters...unload/close/create/write/write/load.
> 
> But this is as much as I want to say...really, if they accept Perfmon2,
> no one will be happier than me. (ok you maybe ;-)
>


Ok, you pinpoint several issues you think are limitations:

  1/ create/load decoupled

    You are right that all examples in libpfm do create followed by load.
    Yet, if you look closer, you will see that task.c (or task_smpl.c)
    does the create, then fork/exec then load the context. You can decouple
    the two operations. I think this is handy because you can PREPARE your
    context, i.e., allocate, program the PMC/PMD, then when you are ready
    you simply attach. Imagine when you want to attach to an existing program,
    you want to minimize the time the program is stopped. By allowing you
    to prepare the work, and then attach, perfmon offers you a way to minimize
    that time.

    Another thing you can do is batching. You prepare a bunch of identical
    contexts, and you attach them on the fly when a new thread/child process
    is created.
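
    To make the prepare-then-attach idea concrete, here is a minimal sketch
    in C. It is only a toy model: every name in it (toy_ctx, toy_create,
    toy_write_pmc, toy_load, toy_unload) is hypothetical and is not the
    perfmon system-call interface. It only shows that all programming can
    happen before the attach, so the attach itself stays a single cheap step.

```c
/* Toy model of a monitoring context that is prepared first and attached
 * later. All names are hypothetical; this is NOT the perfmon API. */
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_REGS 8
#define TOY_NO_TASK  (-1)

struct toy_ctx {
    unsigned long pmc[TOY_MAX_REGS]; /* configuration register values */
    unsigned long pmd[TOY_MAX_REGS]; /* data (counter) register values */
    int task;                        /* attached task id, or TOY_NO_TASK */
};

/* "create": initialize the context; it is attached to nothing yet. */
void toy_create(struct toy_ctx *c)
{
    assert(c != NULL);
    for (size_t i = 0; i < TOY_MAX_REGS; i++) {
        c->pmc[i] = 0;
        c->pmd[i] = 0;
    }
    c->task = TOY_NO_TASK;
}

/* Program one configuration register. Works on an unattached context,
 * which is what allows the preparation to happen ahead of time.
 * Returns 0 on success, -1 if the index is out of range. */
int toy_write_pmc(struct toy_ctx *c, size_t idx, unsigned long val)
{
    assert(c != NULL);
    if (idx >= TOY_MAX_REGS)
        return -1;
    c->pmc[idx] = val;
    return 0;
}

/* "load": attach the already-prepared context to a task. This is the only
 * step for which the target would have to be stopped.
 * Returns 0 on success, -1 if already attached or the task id is invalid. */
int toy_load(struct toy_ctx *c, int task)
{
    assert(c != NULL);
    if (c->task != TOY_NO_TASK || task < 0)
        return -1;
    c->task = task;
    return 0;
}

/* "unload": detach from the task; the programmed state is kept.
 * Returns 0 on success, -1 if the context was not attached. */
int toy_unload(struct toy_ctx *c)
{
    assert(c != NULL);
    if (c->task == TOY_NO_TASK)
        return -1;
    c->task = TOY_NO_TASK;
    return 0;
}
```

    For batching, a tool would call toy_create() and toy_write_pmc() on an
    array of such contexts up front, and call toy_load() on the next free
    one each time a new thread or child process shows up.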

  2/ cannot allocate sampling buffer after create

    It is likely that you know in advance whether you want to simply count or
    collect profiles. That is selected by the metric programmed into your
    tool.

    When you create the context, you allocate certain system resources, such as
    memory. The sampling buffer requires memory and thus it seems a good place
    to allocate and initialize the buffer.

    People have raised the issue of the number of system calls in perfmon. I do
    not think we could justify yet another call to add a sampling buffer to an
    existing context.

  3/ cannot create/delete set once loaded

    I think this is clearly more debatable. There are reasons why the
    restriction exists today.

    set creation: for each set, it is possible to designate an explicit next,
                  i.e., the set to switch to afterwards. That means that the
                  next set is not necessarily the next one up based on the set
                  identification. It is possible to create sub-lists.

                  When the context is attached, we pre-compute the link from
                  one set to its follower. The loop is sealed until the
                  context is unloaded. This makes switching fast because we do
                  not need to walk the list to find the next set (which may
                  not be the next one up).

    set deletion: the context needs to be unloaded. Otherwise you have to deal
                  with the case where you are deleting the current set, which
                  is annoying, especially if monitoring is still on in
                  per-thread or system-wide mode.

    I do believe those restrictions could be lifted, and you are welcome to
    take a look at them if you think that would help your library.
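
    As an illustration of the pre-computed set links described under "set
    creation", here is a small sketch in C. It is a toy model, not the
    kernel code: toy_set, toy_seal and toy_switch are hypothetical names,
    and the rule that a set without an explicit next wraps around from the
    highest id to the lowest id is an assumption made for the example.

```c
/* Toy model of event-set switching with successor links resolved once,
 * at load time. Names are hypothetical; this is NOT the perfmon code. */
#include <assert.h>
#include <stddef.h>

#define TOY_NEXT_DEFAULT (-1) /* no explicit next: use the next id up */

struct toy_set {
    int id;               /* set identification */
    int next_id;          /* explicit next set id, or TOY_NEXT_DEFAULT */
    struct toy_set *next; /* successor, filled in by toy_seal() */
};

/* Linear lookup by id; only used while sealing, never on a switch. */
static struct toy_set *toy_find(struct toy_set *sets, size_t n, int id)
{
    for (size_t i = 0; i < n; i++)
        if (sets[i].id == id)
            return &sets[i];
    return NULL;
}

/* Called once when the context is loaded: resolve every successor link.
 * Returns 0 on success, -1 if an explicit next names a missing set. */
int toy_seal(struct toy_set *sets, size_t n)
{
    assert(sets != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct toy_set *s = &sets[i];

        if (s->next_id != TOY_NEXT_DEFAULT) {
            s->next = toy_find(sets, n, s->next_id);
            if (s->next == NULL)
                return -1;
            continue;
        }

        /* Default: smallest id greater than ours, else wrap to the
         * lowest id overall (assumed wrap-around rule). */
        struct toy_set *best = NULL;
        struct toy_set *lowest = &sets[0];
        for (size_t j = 0; j < n; j++) {
            if (sets[j].id < lowest->id)
                lowest = &sets[j];
            if (sets[j].id > s->id && (best == NULL || sets[j].id < best->id))
                best = &sets[j];
        }
        s->next = best != NULL ? best : lowest;
    }
    return 0;
}

/* Set switch: one pointer dereference, no list walk. */
struct toy_set *toy_switch(const struct toy_set *cur)
{
    assert(cur != NULL && cur->next != NULL);
    return cur->next;
}
```

    With sets 0, 1, 2, 3 where set 1 names set 0 as its explicit next, sets
    0 and 1 form a sub-list that never reaches 2 or 3. Creating or deleting
    a set after toy_seal() would leave stale next pointers behind, which is
    the kind of problem the current restriction avoids.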

-- 
-Stephane