Phil,
On Thu, Nov 02, 2006 at 12:46:34PM +0100, Philip J. Mucci wrote:
> I think I'm still struggling with semantics of these two and why they
> need to be separate. That along with the limitation that you can do
> certain things before and after loading the context, all seem rather
> unnecessarily limiting to me.
>
> To clarify a little, the first change, like being able to declare a
> sample buffer at any point (as well as multiplexing set etc...) upon
> create_context time, means that contexts cannot be created until one
> knows exactly what one is going to do with the PMU.
>
> This is not the case, often a library wants to create a context for
> itself or another process, attach that process, and anything may be done
> to it, adding sets, sampling, etc. The current model requires that all
> such changes follow an unload_context/close/create_context/load_context
> cycle in virtually all paths resulting in changing the usage model of
> the counters. Consider the user, using a performance tool with an
> interactive interface to the counters...you don't know that the user is
> finished 'setting up' things until he or she calls start. That means
> either you defer creation and loading until start or you go through
> unloading and closing.
>
> On my second point, I really see no circumstances where create runs and
> load doesn't. These two are always used together...currently it seems
> that this only provides some sort of external gate to still be allowed
> to change less things once loaded. (You can really only read and write
> PMD/PMC's after this). Looking through all of pfmon, libpfm and PAPI,
> seems to indicate that if load_context were implicit in create_context,
> the code would be simpler and smaller...if you need to declare PMU
> resources, at create time would be just as useful and require one less
> API call and one less kernel entry point. I know you have explained the
> rationale of these to me numerous times, but I still see the same code
> snippets all through the libraries...create/write/write/load...and I
> can't help but wonder if that can be reduced to one simple kernel call.
> The fact that I do this sequence means that if the user changes how he
> or she uses the counters, I have to do 6 system calls to reuse the
> counters...unload/close/create/write/write/load.
>
> But this is as much as I want to say...really, if they accept Perfmon2,
> no one will be happier than me. (ok you maybe ;-)
>
OK, you point out several issues that you think are limitations:
1/ create/load decoupled
You are right that all examples in libpfm do create followed by load.
Yet, if you look closer, you will see that task.c (or task_smpl.c)
does the create, then the fork/exec, and then loads the context. You can
decouple the two operations. I think this is handy because you can PREPARE
your context, i.e., allocate it and program the PMC/PMD registers, and then
simply attach when you are ready. Imagine you want to attach to an existing
program: you want to minimize the time the program is stopped. By allowing
you to prepare the work first and attach afterwards, perfmon offers you a
way to minimize that time.
Another thing you can do is batching: you prepare a bunch of identical
contexts and attach them on the fly whenever a new thread/child process
is created.
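To make the prepare-then-attach and batching ideas concrete, here is a minimal sketch under a simplified in-memory model. It is not the perfmon2 syscall interface: struct ctx, prepare_ctx(), attach_ctx() and the pool helpers are hypothetical names. The point it shows is that all the programming work happens before the target is stopped, and attaching only binds an already-programmed context to a thread.

```c
/* Hypothetical model of decoupled create/load; NOT the real perfmon2 API.
 * All names and fields are illustrative only. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_PMCS  4
#define POOL_SIZE 8

enum ctx_state { CTX_UNLOADED, CTX_LOADED };

struct ctx {
    enum ctx_state state;
    unsigned long pmcs[NUM_PMCS]; /* programmed event selectors */
    unsigned long pmds[NUM_PMCS]; /* initial counter values */
    int target_tid;               /* -1 while not attached */
};

/* Slow path: done up front, before the target is stopped. */
static void prepare_ctx(struct ctx *c, const unsigned long *events)
{
    assert(c != NULL && events != NULL);
    memset(c, 0, sizeof(*c));
    c->state = CTX_UNLOADED;
    c->target_tid = -1;
    memcpy(c->pmcs, events, sizeof(c->pmcs)); /* events has NUM_PMCS entries */
}

/* Fast path: only binds an already-programmed context to a thread. */
static int attach_ctx(struct ctx *c, int tid)
{
    assert(c != NULL);
    if (c->state == CTX_LOADED)
        return -1; /* already attached */
    c->target_tid = tid;
    c->state = CTX_LOADED;
    return 0;
}

/* Batching: a pool of identical, pre-programmed contexts. */
struct ctx_pool {
    struct ctx ctxs[POOL_SIZE];
    size_t next_free;
};

static void pool_prepare(struct ctx_pool *p, const unsigned long *events)
{
    for (size_t i = 0; i < POOL_SIZE; i++)
        prepare_ctx(&p->ctxs[i], events);
    p->next_free = 0;
}

/* Called when a new thread/child appears: grab a ready context and attach. */
static struct ctx *pool_attach(struct ctx_pool *p, int tid)
{
    if (p->next_free == POOL_SIZE)
        return NULL; /* pool exhausted */
    struct ctx *c = &p->ctxs[p->next_free++];
    return attach_ctx(c, tid) == 0 ? c : NULL;
}
```

In this model, a tool would call pool_prepare() once at startup and pool_attach() from its fork/thread-creation hook, so the stopped window covers only the cheap attach step.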
2/ cannot allocate sampling buffer after create
It is likely that you know in advance whether you want to simply count or
collect profiles. That is determined by the metric programmed into your
tool.
When you create the context, you allocate certain system resources, such as
memory. The sampling buffer requires memory, so context creation seems a
good place to allocate and initialize the buffer.
People have raised the issue of the number of system calls in perfmon. I do
not think we could justify yet another call to add a sampling buffer to an
existing context.
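A minimal sketch of the "allocate the buffer at create time" design, again under a hypothetical model rather than the real perfmon2 calls: create_ctx() and close_ctx() here are illustrative names. A size of 0 requests a pure counting context; any other size allocates and zero-initializes the sampling buffer as part of the same call, so there is no separate "add buffer" entry point.

```c
/* Hypothetical model: the sampling buffer is sized and allocated as part of
 * context creation. NOT the perfmon2 interface; names are illustrative. */
#include <assert.h>
#include <stdlib.h>

struct smpl_ctx {
    size_t buf_size; /* 0 means counting-only context */
    void *smpl_buf;  /* NULL for counting-only contexts */
};

/* buf_size == 0 requests a pure counting context. Returns NULL on failure. */
static struct smpl_ctx *create_ctx(size_t buf_size)
{
    struct smpl_ctx *c = calloc(1, sizeof(*c));
    if (c == NULL)
        return NULL;
    if (buf_size > 0) {
        c->smpl_buf = calloc(1, buf_size); /* allocate and zero-initialize */
        if (c->smpl_buf == NULL) {
            free(c);
            return NULL;
        }
        c->buf_size = buf_size;
    }
    return c;
}

static void close_ctx(struct smpl_ctx *c)
{
    assert(c != NULL);
    free(c->smpl_buf); /* free(NULL) is a no-op for counting contexts */
    free(c);
}
```

The trade-off is visible in the signature: the caller must decide "count vs. profile" up front, in exchange for one fewer call.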
3/ cannot create/delete sets once loaded
This one is clearly more debatable, but there are reasons why the
restriction exists today.
set creation: for each set, it is possible to designate an explicit next,
i.e., the set to switch to afterwards. That means that the next set is not
necessarily the next one up based on the set identifier; it is possible to
create sub-lists. When the context is attached, we pre-compute the link from
each set to its follower. The loop is sealed until the context is unloaded.
This makes switching fast because we do not need to walk the list to find
the next set (which may not be the next one up).
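The set-linking scheme can be sketched as follows. This is a hypothetical model, not perfmon's actual data structures: struct evset, load_link_sets() and switch_set() are illustrative names. Sets are kept sorted by id; each may name an explicit follower, otherwise the default is the next one up, wrapping around. The list is walked once at load time, after which a switch just follows a pointer.

```c
/* Hypothetical model of event-set chaining; NOT perfmon's real code.
 * Sets are sorted by ascending id; links are resolved once at load time. */
#include <assert.h>
#include <stddef.h>

#define MAX_SETS 8
#define NO_EXPLICIT_NEXT (-1)

struct evset {
    int id;
    int explicit_next;  /* id of the follower, or NO_EXPLICIT_NEXT */
    struct evset *next; /* resolved at load time */
};

struct set_ctx {
    struct evset sets[MAX_SETS]; /* caller keeps these sorted by id */
    size_t nsets;
    int loaded;
};

static struct evset *find_set(struct set_ctx *c, int id)
{
    for (size_t i = 0; i < c->nsets; i++)
        if (c->sets[i].id == id)
            return &c->sets[i];
    return NULL;
}

/* Walk the list once; returns -1 if an explicit next does not exist. */
static int load_link_sets(struct set_ctx *c)
{
    assert(c->nsets > 0);
    for (size_t i = 0; i < c->nsets; i++) {
        struct evset *s = &c->sets[i];
        if (s->explicit_next != NO_EXPLICIT_NEXT) {
            s->next = find_set(c, s->explicit_next);
            if (s->next == NULL)
                return -1;
        } else {
            /* default: next one up by id, wrapping to the first set */
            s->next = &c->sets[(i + 1) % c->nsets];
        }
    }
    c->loaded = 1; /* the loop is now sealed until unload */
    return 0;
}

/* Switch path: no list walk, just follow the precomputed pointer. */
static struct evset *switch_set(const struct evset *cur)
{
    return cur->next;
}
```

For example, with sets 0, 1, 2 and 5, where set 1 explicitly names 5, the rotation starting at set 0 is 0, 1, 5, 0, ... and set 2 sits on a separate sub-list. Adding a set while loaded would invalidate these precomputed pointers, which is why the loop stays sealed.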
set deletion: the context needs to be unloaded. Otherwise you have to deal
with the case where you are deleting the current set, which is annoying,
especially if monitoring is still on in per-thread or system-wide mode.
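The deletion rule amounts to a simple state check. Here is a minimal sketch under a hypothetical model (struct del_ctx and delete_set() are illustrative, not perfmon's code): deletion is refused with -EBUSY while the context is loaded, so the "delete the active set while monitoring runs" case never has to be handled.

```c
/* Hypothetical model of the "unload before deleting a set" rule;
 * NOT perfmon's real code. Names are illustrative only. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_SETS 8

struct del_ctx {
    int set_ids[MAX_SETS];
    size_t nsets;
    int loaded; /* nonzero while attached to a thread or CPU */
};

static int delete_set(struct del_ctx *c, int id)
{
    assert(c != NULL);
    if (c->loaded)
        return -EBUSY; /* must unload first */
    for (size_t i = 0; i < c->nsets; i++) {
        if (c->set_ids[i] == id) {
            /* close the gap so the remaining ids stay contiguous */
            memmove(&c->set_ids[i], &c->set_ids[i + 1],
                    (c->nsets - i - 1) * sizeof(c->set_ids[0]));
            c->nsets--;
            return 0;
        }
    }
    return -ENOENT;
}
```

Lifting the restriction would mean replacing the -EBUSY check with logic that picks a new current set and re-links its predecessor while monitoring may still be active.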
I do believe those restrictions could be lifted, and you are welcome to take
a look at them if you think that would help your library.
--
-Stephane
_______________________________________________
perfmon mailing list
[email protected]
http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/