Hi Stephane,
Thanks for the explanations. While I understand that the load/create
split is supposed to make things more efficient, my answer is that it
does not in practice, due to the way the counters are used. You could
say that this is a by-product of PAPI design (or the libpfm test suite),
but it just is the way it is.
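
To put a number on that, here is a toy model of the split lifecycle. It
is only a sketch: ToyContext and reconfigure() are hypothetical names
that mirror the operations discussed in this thread, not the real
perfmon2 system-call signatures. It assumes the sampling choice is fixed
at create time, so switching a loaded context from counting to sampling
means tearing it down and rebuilding it.

```python
class ToyContext:
    """Toy state machine: closed -> created -> loaded (hypothetical)."""

    def __init__(self):
        self.calls = []          # every simulated kernel entry, in order
        self.state = "closed"
        self.sampling = False

    def _enter(self, name, allowed, new_state=None):
        # Reject the call if the context is in the wrong state.
        if self.state not in allowed:
            raise RuntimeError(f"{name} not allowed while {self.state}")
        self.calls.append(name)
        if new_state:
            self.state = new_state

    def create(self, sampling=False):
        self._enter("create", {"closed"}, "created")
        self.sampling = sampling  # fixed at create time in this model

    def write_pmcs(self):
        self._enter("write_pmcs", {"created", "loaded"})

    def write_pmds(self):
        self._enter("write_pmds", {"created", "loaded"})

    def load(self):
        self._enter("load", {"created"}, "loaded")

    def unload(self):
        self._enter("unload", {"loaded"}, "created")

    def close(self):
        self._enter("close", {"created"}, "closed")


def reconfigure(ctx, sampling):
    """Change the usage model of a loaded context; return calls used."""
    before = len(ctx.calls)
    ctx.unload()
    ctx.close()
    ctx.create(sampling=sampling)
    ctx.write_pmcs()
    ctx.write_pmds()
    ctx.load()
    return len(ctx.calls) - before


ctx = ToyContext()
ctx.create()
ctx.write_pmcs()
ctx.write_pmds()
ctx.load()
print(reconfigure(ctx, sampling=True))  # 6
```

In this model every change of mind by the user costs the full
unload/close/create/write/write/load sequence.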
As for the sampling buffer, I don't agree that the tool knows whether
and/or how much it is going to sample when the context is created, any
more than a debugger knows what you're going to do when you ptrace_attach.
Same goes for multiplexing. The flexibility present in multiplexing can
be narrowed: sublists, for example, and next-set notification. My
feeling is that these will never be used, and both introduce complexity
and restrictions into the underlying code. This is one of the *very few*
cases where I agree with criticism about too much
flexibility/configurability of perfmon. Just because something is
possible (and because we can think up a way to use it) doesn't mean
that the support should be there. Sublists and next-event-set type
semantics could be emulated in user land if someone really wanted
them. Furthermore, it's unclear that any real or imagined performance
tools out there will use them (IMHO). To this end, I would definitely
not support adding a function call, but rather:
deleting the sample buffer args to context create
removing the event_set entry points
replacing all of the above with perfmon_ioctl() for such things.
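
As a rough illustration of the user-land emulation mentioned above, the
tool itself could keep the explicit next-set links. This is a minimal
sketch under assumptions: UserLandSets is a hypothetical name, and it
assumes the tool gets control at each switch point and can tell the
kernel which set to activate next. A set without an explicit successor
falls back to the next set id in order, wrapping around.

```python
class UserLandSets:
    """Keeps explicit next-set links in the tool, not in the kernel."""

    def __init__(self):
        self.next_of = {}   # set id -> explicit successor, or None

    def add_set(self, set_id, next_id=None):
        # Unlike a sealed kernel list, sets can be added at any time.
        self.next_of[set_id] = next_id

    def successor(self, current):
        explicit = self.next_of[current]
        if explicit is not None:
            return explicit
        # Default: next set id in sorted order, wrapping around.
        ids = sorted(self.next_of)
        return ids[(ids.index(current) + 1) % len(ids)]

    def schedule(self, start, switches):
        """Return the sequence of active sets over `switches` switches."""
        order = [start]
        for _ in range(switches):
            order.append(self.successor(order[-1]))
        return order


sets = UserLandSets()
sets.add_set(0, next_id=2)   # sub-list {0, 2}: 0 -> 2 -> 0 -> ...
sets.add_set(1)
sets.add_set(2, next_id=0)
sets.add_set(3)
print(sets.schedule(0, 4))   # [0, 2, 0, 2, 0]
```

The lookup here walks a sorted list on every default switch, which is
the cost the in-kernel pre-computed links avoid; whether that matters
at multiplexing rates is a separate question.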
But again, I'm happy to get what we can in the kernel. 'Tis easier to
delete than to insert, ce n'est pas vrai?
Phil
On Mon, 2006-11-06 at 08:18 -0800, Stephane Eranian wrote:
> Phil,
>
> On Thu, Nov 02, 2006 at 12:46:34PM +0100, Philip J. Mucci wrote:
> > I think I'm still struggling with semantics of these two and why they
> > need to be separate. That along with the limitation that you can do
> > certain things before and after loading the context, all seem rather
> > unnecessarily limiting to me.
> >
> > To clarify a little, the first change, like being able to declare a
> > sample buffer at any point (as well as multiplexing set etc...) upon
> > create_context time, means that contexts cannot be created until one
> > knows exactly what one is going to do with the PMU.
> >
> > This is not the case; often a library wants to create a context for
> > itself or another process, attach that process, and anything may be done
> > to it, adding sets, sampling, etc. The current model requires that all
> > such changes follow a unload/context/close/create/context/load/context
> > cycle in virtually all paths resulting in changing the usage model of
> > the counters. Consider the user, using a performance tool with an
> > interactive interface to the counters...you don't know that the user is
> > finished 'setting up' things until he or she calls start. That means
> > either you defer creation and loading until start or you go through
> > unloading and closing.
> >
> > On my second point, I really see no circumstances where create runs and
> > load doesn't. These two are always used together...currently it seems
> > that this only provides some sort of external gate that allows fewer
> > things to be changed once loaded. (You can really only read and write
> > PMDs/PMCs after this.) Looking through all of pfmon, libpfm and PAPI
> > seems to indicate that if load_context were implicit in create_context,
> > the code would be simpler and smaller...if you need to declare PMU
> > resources, at create time would be just as useful and require one less
> > API call and one less kernel entry point. I know you have explained the
> > rationale of these to me numerous times, but I still see the same code
> > snippets all through the libraries...create/write/write/load...and I
> > can't help but wonder if that can be reduced to one simple kernel call.
> > The fact that I do this sequence means that if the user changes how he
> > or she uses the counters, I have to do 6 system calls to reuse the
> > counters...unload/close/create/write/write/load.
> >
> > But this is as much as I want to say...really, if they accept Perfmon2,
> > no one will be happier than me. (ok you maybe ;-)
> >
>
> Ok, you pinpoint several issues you think are limitations:
>
> 1/ create/load decoupled
>
> You are right that all examples in libpfm do create followed by load.
> Yet, if you look closer, you will see that task.c (or task_smpl.c)
> does the create, then fork/exec then load the context. You can decouple
> the two operations. I think this is handy because you can PREPARE your
> context, i.e., allocate, program the PMC/PMD, then when you are ready
> you simply attach. Imagine when you want to attach to an existing program,
> you want to minimize the time the program is stopped. By allowing you
> to prepare the work, and then attach, perfmon offers you a way to minimize
> that time.
>
> Another thing you can do is batching. You prepare a bunch of identical
> contexts, and you attach them on the fly when a new thread/child process
> is created.
>
> 2/ cannot allocate sampling buffer after create
>
> It is likely that you know in advance whether you want to simply count or
> collect profiles. That is selected by the metric programmed into your
> tool.
>
> When you create the context, you allocate certain system resources, such
> as memory. The sampling buffer requires memory and thus it seems a good
> place to allocate and initialize the buffer.
>
> People have raised the issue of the number of system calls in perfmon. I
> do not think we could justify yet another call to add a sampling buffer
> to an existing context.
>
> 3/ cannot create/delete set once loaded
>
> I think this is clearly more debatable. There are reasons why the
> restriction exists today.
>
> set creation: for each set, it is possible to designate an explicit
>               next, i.e., the set to go to after. That means that the
>               next set is not necessarily the next up based on the set
>               identification. It is possible to create sub-lists.
>
>               When the context is attached, we pre-compute the link
>               from one set to its follower. The loop is sealed until
>               the context is unloaded. It makes it fast on switch
>               because we do not need to walk the list to find the next
>               set (which may not be the next one up).
>
> set deletion: the context needs to be unloaded. Otherwise you have to
>               deal with the case where you are deleting the current
>               set, which is annoying, especially if monitoring is
>               still on in per-thread or system-wide mode.
>
> I do believe those restrictions could be lifted, and you are welcome
> to take a look at them if you think that would help your library.
>
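
For reference, the pre-computed next-set links described under 3/ above
can be modeled roughly as follows. This is only a sketch of the idea as
stated in the quoted text, not the actual perfmon code: SetList and its
methods are hypothetical names, and the default successor (next set id
in order, wrapping around to close the loop) is an assumption.

```python
class SetList:
    """Event sets whose successor links are sealed while loaded."""

    def __init__(self):
        self.explicit_next = {}  # set id -> explicit successor, or None
        self.link = None         # sealed successor table, built at load

    def create_set(self, set_id, next_id=None):
        if self.link is not None:
            raise RuntimeError("cannot create a set while loaded")
        self.explicit_next[set_id] = next_id

    def delete_set(self, set_id):
        if self.link is not None:
            raise RuntimeError("cannot delete a set while loaded")
        del self.explicit_next[set_id]

    def load(self):
        # Resolve every successor once, so switching needs no list walk.
        ids = sorted(self.explicit_next)
        self.link = {}
        for i, sid in enumerate(ids):
            nxt = self.explicit_next[sid]
            if nxt is None:
                nxt = ids[(i + 1) % len(ids)]  # assumed default order
            self.link[sid] = nxt

    def unload(self):
        self.link = None  # unseal: sets may be created or deleted again

    def switch(self, current):
        # Single table lookup on the switch path.
        return self.link[current]


sets = SetList()
sets.create_set(0)
sets.create_set(1, next_id=0)   # sub-list {0, 1}
sets.create_set(2)
sets.load()
print(sets.switch(0), sets.switch(1), sets.switch(2))  # 1 0 0
```

The restriction under discussion shows up as the two RuntimeError
checks: lifting it would mean recomputing the table (and handling
deletion of the current set) while the context is live.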