Yes, a mixture of the two, where appropriate, sounds good to me. By
cutting the interface down by several syscalls, you might make it more
palatable to some of the reviewers.
- Corey
"stephane eranian" <[EMAIL PROTECTED]> wrote on 07/01/2008 12:59:38
PM:
> Corey,
>
> On Tue, Jul 1, 2008 at 6:58 PM, Corey J Ashford <[EMAIL PROTECTED]> wrote:
> > You might want to add a separate section that details the thinking about
> > why you don't want to use a single, multiplexed syscall. If you add this,
> > it could go after you've detailed the session breakdown, and before you've
> > described the current syscalls. I know this has been an area some of the
> > LKML folks have picked at before.
> >
>
> That's a good point. I will add a paragraph about this.
> But I was thinking that they would probably not oppose a mix of the two:
> several syscalls, including one with multiplexing. For instance, I could
> envision a pfm_controls() call which could be used for start/stop and
> attach/detach, leaving the other calls untouched.
>
> > [EMAIL PROTECTED] wrote on 07/01/2008 09:41:36 AM:
> >
> >> Hello everyone,
> >>
> >> I intend to send the following description to LKML and a few LKML developers
> >> to try and explain the reasoning behind the current syscall interface
> >> for perfmon2.
> >>
> >> I know there have been a lot of doubts and misunderstandings as to why we need
> >> so many syscalls and how they could be extended. I tried to address those
> >> concerns here.
> >>
> >> Please feel free to comment or add to it.
> >>
> >> Thanks.
> >>
> >>
> >> -----------------------------------------------------------------------------
> >>
> >> 1) monitoring session breakdown
> >>
> >> A monitoring session can be decomposed into a sequence of fundamental actions which
> >> are as follows:
> >> - create the session
> >> - program registers
> >> - attach to target thread or CPU
> >> - start monitoring
> >> - stop monitoring
> >> - read results
> >> - detach from thread or CPU
> >> - terminate session
> >>
> >> The order may not necessarily be as shown. For instance, the programming may happen
> >> after the session has been attached. Obviously, the start/stop operations may be
> >> repeated before results are read, and results can be read multiple times.
> >>
> >> In the next sections, we examine each action separately.
> >>
> >> 2) session creation
> >>
> >> Perfmon2 supports two types of sessions: per-thread or per-CPU (so-called system-wide).
> >>
> >> During the creation of the session, certain attributes are set; they remain until the
> >> session is terminated. For instance, the per-CPU attribute cannot be changed.
> >>
> >> During creation, the kernel state to support the session is allocated and initialized.
> >> No PMU hardware is actually accessed. Permissions to create a session may be checked.
> >> Resource limits are also validated, and memory consumption is accounted for.
> >>
> >> The software state of the PMU is initialized, i.e., all configuration registers are
> >> set to a quiescent value. Data registers are initialized to zero whenever possible.
> >>
> >> On success, the kernel returns a unique identifier which is to be used for all
> >> subsequent actions on the session.
> >>
> >> 3) programming the registers
> >>
> >> Programming of the PMU registers can occur at any time during the lifetime of a session;
> >> the session does not need to be attached to a thread or CPU.
> >>
> >> It may be necessary to change the settings, e.g., to monitor another event or to reset
> >> the counts when sampling at the user level. Thus, the writing of the registers MUST be
> >> decoupled from the creation of the session.
> >>
> >> Similarly, writing of configuration and data registers must also be decoupled, as data
> >> registers may be reprogrammed independently of their configuration registers, for
> >> instance when sampling.
> >>
> >> The number of registers varies a lot from one PMU to another. The relationships between
> >> configuration and data registers can be more complex than just one-to-one. On most PMUs,
> >> writing the PMU registers requires running at the most privileged level, i.e., in the
> >> kernel. To amortize the cost of a system call, it is useful to be able to program multiple
> >> registers in one call. Thus, it must be possible to pass vector arguments. Of course, for
> >> security reasons, the system administrator may impose a limit on how big vectors can
> >> actually be. The advantage is that vectors can vary in size, and thus the amount of data
> >> passed between application and kernel can be kept to the minimum needed. This matters
> >> because system call data must be copied into kernel memory before it can be used.
> >>
> >> 4) attachment and detachment
> >>
> >> A session can be attached to a kernel-visible thread or a CPU. If there is attachment,
> >> then it must be possible to detach the session to possibly re-attach it to another thread
> >> or CPU. Detachment should not require destroying the session.
> >>
> >> There are 3 possibilities for attachment:
> >> - when the session is created
> >> - when the monitoring is activated
> >> - with a dedicated call
> >>
> >> If the attachment is done during the creation of the session, then the target (thread
> >> or CPU) needs to exist at that time. For a CPU-wide session, this means that the session
> >> must be created while executing on that CPU. This does not seem unreasonable, especially
> >> on NUMA systems.
> >>
> >> For a per-thread session, however, this is a bit more problematic, as it means it is not
> >> possible to prepare the session and the PMU registers before the thread exists. When
> >> monitoring across fork and pthread_create, it is important to minimize overhead. Creation
> >> of a session can trigger complex memory allocations in the kernel. Thus, it may be useful
> >> to prepare a batch of ready-to-go sessions, which just need to be attached when the fork
> >> or pthread_create notification arrives.
> >>
> >> If the attachment is coupled with the creation of the session, then, by symmetry, the
> >> detachment is coupled with its destruction. Coupling detachment with termination is
> >> problematic for both per-thread and CPU-wide mode. With the former, the termination of a
> >> thread is usually totally asynchronous with the termination of the session by the
> >> monitoring tool. The only case where they are synchronized is for self-monitored threads.
> >> When a tool is monitoring a thread in another process, the termination of that thread will
> >> cause the kernel to detach the session. But the session must not be closed, because the
> >> tool likely wants to read the results and because the session still exists for the tool.
> >> For CPU-wide mode, there is also an issue when a monitored CPU is taken offline
> >> dynamically. The session would be detached by the kernel, yet it would still be live in
> >> the tool, whose controlling thread would have been migrated off that CPU.
> >>
> >> If the attachment is done when monitoring is activated, then the detachment is done when
> >> monitoring is deactivated. The following relationships are therefore enforced:
> >>
> >> attached => activated
> >> stopped => detached
> >>
> >> Start/stop operations are expected to be very frequent for self-monitored workloads. When
> >> used to monitor small sections of critical code, e.g., loop kernels, it is important to
> >> minimize overhead; thus start/stop should be as simple as possible.
> >>
> >> Attaching requires loading the PMU machine state onto the PMU hardware. Conversely,
> >> detaching implies flushing the PMU state to memory so that results can be read even after
> >> the termination of a thread, for instance. Both operations are expensive due to the high
> >> cost of accessing the PMU registers, so coupling them with start/stop would put this cost
> >> on the critical path.
> >>
> >> Furthermore, on certain PMU models, e.g., Intel Itanium, it is possible to let user-level
> >> code start/stop monitoring with a single instruction. To minimize overhead, it is very
> >> important to allow this mechanism for self-monitored programs. Yet the session would have
> >> to be attached/detached somehow. With dedicated attach/detach calls, this can be supported
> >> transparently. One possible work-around with the coupled calls would be to require a
> >> system call to attach the session and do the initial activation; subsequent start/stop
> >> could use the lightweight instruction. The session would then be stopped and detached
> >> with a system call.
> >>
> >> The dedicated attach/detach calls offer maximum flexibility. They let applications create
> >> sessions in advance or on demand. The actions on the session, start/stop and
> >> attach/detach, are perfectly symmetrical. The termination of the monitored target can
> >> cause its detachment, but the session remains accessible. Issuing the detach call on a
> >> session already detached by the kernel is harmless.
> >>
> >> The cost of start/stop is not impacted.
> >>
> >> The following properties are enforced:
> >> upon attachment => monitoring stopped
> >> during detachment => monitoring stopped
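The properties above can be modeled as a tiny state machine. This is a toy user-space model, not kernel code; the sess_* function names are hypothetical, and two rules are assumptions not stated above (re-attaching requires a prior detach, and start requires an attached session). sess_target_vanished() models the kernel detaching the session when the monitored thread exits or the CPU goes offline, after which an explicit detach is still harmless.

```c
#include <errno.h>

/* Toy model of the session state manipulated by the dedicated calls. */
struct session {
    int attached;   /* bound to a thread or CPU */
    int active;     /* monitoring running */
};

static int sess_attach(struct session *s)
{
    if (s->attached)
        return -EBUSY;      /* assumption: detach before re-attaching */
    s->attached = 1;
    s->active = 0;          /* upon attachment, monitoring is stopped */
    return 0;
}

static int sess_detach(struct session *s)
{
    s->active = 0;          /* detachment implies monitoring stopped */
    s->attached = 0;        /* harmless if the kernel already detached it */
    return 0;
}

static int sess_start(struct session *s)
{
    if (!s->attached)
        return -EINVAL;     /* assumption: start needs an attached session */
    s->active = 1;          /* no PMU state load/flush on this path */
    return 0;
}

static int sess_stop(struct session *s)
{
    s->active = 0;
    return 0;
}

/* Kernel-initiated detach (thread exit, CPU offline): the session itself
 * stays alive for the tool. */
static void sess_target_vanished(struct session *s)
{
    s->active = 0;
    s->attached = 0;
}
```

Because start and stop only flip the active bit here, their cost is independent of the expensive attach/detach work, which is the symmetry argued for above.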
> >>
> >> 5) start and stop
> >>
> >> It must be possible for an application to start and stop monitoring at will and at any
> >> moment. Start and stop can be called very frequently, not just at the beginning and end
> >> of a session. This is especially likely for self-monitored threads, where it is customary
> >> to monitor the execution of only one function or loop. Thus those operations can be on the
> >> critical path, and they must therefore be as lightweight as possible. See the discussion
> >> in the section about attachment and detachment.
> >>
> >>
> >> 6) reading the results
> >>
> >> The results are extracted by reading the PMU registers containing data (as opposed to
> >> configuration). The number of registers of interest can vary based on the PMU model, the
> >> type of measurement, and the events measured.
> >>
> >> Reading can occur at regular intervals, e.g., time-based user-level sampling, and can
> >> therefore be on the critical path. Thus it must be as lightweight as possible. Given that
> >> the cost is dominated by the latency of accessing the PMU registers, it is important to
> >> read only the registers that are used. Thus, the call must accept vector arguments, just
> >> like the calls to program the PMU.
> >>
> >> It must be possible to read the registers while the session is detached, but also when it
> >> is attached to a thread or CPU.
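A sketch of why the read call takes a vector: with a simulated register file (hw_pmd, NUM_PMDS, and the hw_accesses counter are all hypothetical, as is the simplified pmd_req structure), the number of expensive register accesses equals the number of requested registers, not the total number of data registers on the PMU.

```c
#include <errno.h>
#include <stdint.h>

#define NUM_PMDS 16   /* hypothetical number of data registers */

/* Simplified request element; the real pfarg_pmd has reserved fields. */
struct pmd_req {
    uint16_t reg_num;
    uint64_t reg_value;
};

/* Stand-in for the hardware data registers; each access is what is
 * expensive on a real PMU, so we count them. */
static uint64_t hw_pmd[NUM_PMDS];
static int hw_accesses;

/* Read only the registers named in the vector. */
static int read_pmds(struct pmd_req *v, int n)
{
    for (int i = 0; i < n; i++) {
        if (v[i].reg_num >= NUM_PMDS)
            return -EINVAL;            /* invalid register number */
        v[i].reg_value = hw_pmd[v[i].reg_num];
        hw_accesses++;                 /* one access per requested register */
    }
    return 0;
}
```

A measurement using two counters out of sixteen thus pays for two register accesses per read.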
> >>
> >> 7) termination
> >>
> >> Termination of a session means all the associated resources are either released to the
> >> free pool or destroyed. After termination, no state remains. Termination implies stopping
> >> monitoring and detaching the session if necessary.
> >>
> >> For the purpose of termination, one has to differentiate between the monitored entity and
> >> the controlling entity. When a tool monitors a thread in another process, all the threads
> >> of the tool are controlling entities, and the monitored thread is the monitored entity.
> >> Any entity can vanish at any time.
> >>
> >> If the monitored entity terminates voluntarily, i.e., normal exit, or involuntarily, e.g.,
> >> core dump, the kernel simply detaches the session, but the session is not destroyed.
> >>
> >> Until the last controlling entity disappears, the session remains accessible.
> >>
> >> There are situations where all the controlling entities disappear before the monitored
> >> entity. In this case, the session becomes useless because results can no longer be
> >> extracted, so the session enters the zombie state. It will eventually be detached and its
> >> resources will be reclaimed by the kernel, i.e., the session will be terminated.
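The lifetime rules above can be sketched with a reference count on controlling entities. This is a simplified model, assuming that each controlling entity holds one reference (for example, an open file descriptor) and that a zombie session is reclaimed as soon as it is detached; the structure and function names are hypothetical.

```c
#include <stdbool.h>

/* Toy model: controlling references are counted; the monitored entity
 * is tracked through the attached flag. */
struct session {
    int ctl_refs;       /* controlling entities still holding the session */
    bool attached;      /* still bound to the monitored thread or CPU */
    bool zombie;        /* no controller left, waiting to be reclaimed */
    bool freed;         /* resources reclaimed: session terminated */
};

/* Monitored entity exits (normal exit or crash): detach only. */
static void monitored_exit(struct session *s)
{
    s->attached = false;
    if (s->zombie)
        s->freed = true;    /* a detached zombie is reclaimed */
}

/* A controlling entity drops its reference. */
static void controller_release(struct session *s)
{
    if (--s->ctl_refs > 0)
        return;             /* still accessible to other controllers */
    if (s->attached)
        s->zombie = true;   /* results can no longer be extracted */
    else
        s->freed = true;
}
```

The two orders of disappearance map onto the two paragraphs above: the monitored entity leaving first only detaches, while the controllers leaving first produces a zombie.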
> >>
> >> 8) extensibility
> >>
> >> There is already a vast diversity among existing PMU models, and this is unlikely to
> >> change. Quite to the contrary, it is envisioned that the PMU will become a true value-add
> >> and that vendors will therefore try to differentiate themselves from one another. Moreover,
> >> the PMU will remain closely tied to the underlying micro-architecture. Therefore, it is
> >> very important to ensure that the monitoring interface can adapt easily to future PMU
> >> models and their extended features, i.e., what is offered beyond counting events.
> >>
> >> It is important to realize that extensibility is not limited to supporting more PMU
> >> registers. It also includes supporting advanced sampling features or socket-level PMUs, as
> >> opposed to just core-level PMUs.
> >>
> >> It may be necessary to extend the system calls with new generic or architecture-specific
> >> parameters, without simply adding new system calls.
> >>
> >> 9) current perfmon2 interface
> >>
> >> The perfmon2 interface design is guided by the principles described in the previous
> >> sections. We now explain each call in detail.
> >>
> >>
> >> a) session creation
> >>
> >> int pfm_create_session(struct pfarg_ctx *ctx, char *smpl_name,
> >> void *smpl_arg, size_t arg_size);
> >>
> >> The function creates the perfmon session and returns a file descriptor used to manipulate
> >> the session thereafter.
> >>
> >> The call takes several parameters, which are as follows:
> >> - ctx: encapsulates all session parameters (see below)
> >> - smpl_name: used when sampling to designate which format to use
> >> - smpl_arg: points to format-specific arguments
> >> - arg_size: size of the structure passed in smpl_arg
> >>
> >> The pfarg_ctx structure is defined as follows:
> >> - flags: generic and arch-specific flags for the session
> >> - reserved: reserved for future extensions
> >>
> >> To provide for future extensions, the pfarg_ctx structure contains reserved fields.
> >> Reserved fields must be zeroed.
> >>
> >> To create a per-CPU session, the value PFM_CTX_SYSTEM_WIDE must be passed in flags.
> >>
> >> When in-kernel sampling is not used, smpl_name and smpl_arg must be NULL and arg_size
> >> must be 0.
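A sketch of how the creation arguments might be prepared and checked. The flag name PFM_CTX_SYSTEM_WIDE comes from the description above, but its value, the number of reserved words, and the init_ctx() and check_create_args() helpers are assumptions. Rejecting nonzero reserved fields (rather than silently ignoring them) is what would let a later version give them a meaning without misinterpreting old binaries that left garbage there.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flag name from the description; the value is a placeholder. */
#define PFM_CTX_SYSTEM_WIDE 0x1u

/* Sketch of pfarg_ctx; the reserved length (6) is an assumption. */
struct pfarg_ctx {
    uint32_t flags;
    uint64_t reserved[6];
};

/* User side: zero everything, including reserved fields, then set flags. */
static void init_ctx(struct pfarg_ctx *ctx, int system_wide)
{
    memset(ctx, 0, sizeof(*ctx));
    if (system_wide)
        ctx->flags |= PFM_CTX_SYSTEM_WIDE;
}

/* Kernel side (sketch): validate the creation arguments. */
static int check_create_args(const struct pfarg_ctx *ctx, const char *smpl_name,
                             const void *smpl_arg, size_t arg_size)
{
    for (size_t i = 0; i < sizeof(ctx->reserved) / sizeof(ctx->reserved[0]); i++)
        if (ctx->reserved[i] != 0)
            return -EINVAL;     /* reserved fields must be zeroed */
    /* without in-kernel sampling, all sampling arguments must be empty */
    if (smpl_name == NULL && (smpl_arg != NULL || arg_size != 0))
        return -EINVAL;
    return 0;
}
```

On the user side, zeroing the whole structure before setting the fields in use is the habit that keeps such a binary compatible with future extensions.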
> >>
> >> b) programming the registers
> >>
> >> int pfm_write_pmcs(int fd, struct pfarg_pmc *pmcs, int n);
> >> int pfm_write_pmds(int fd, struct pfarg_pmd *pmds, int n);
> >>
> >> The calls are provided to program the configuration and data registers, respectively. The
> >> parameters are as follows:
> >> - fd: file descriptor identifying the session
> >> - pmcs: pointer to a vector of pfarg_pmc structures
> >> - pmds: pointer to a vector of pfarg_pmd structures
> >> - n: number of elements in the pmcs or pmds vector
> >>
> >> It is possible to pass vectors of pfarg_pmc or pfarg_pmd structures. The minimal size is
> >> 1; the maximum size is determined by the system administrator.
> >>
> >> The pfarg_pmc structure is defined as follows:
> >> struct pfarg_pmc {
> >> u16 reg_num;
> >> u64 reg_value;
> >> u64 reserved[];
> >> };
> >>
> >> The pfarg_pmd structure is defined as follows:
> >> struct pfarg_pmd {
> >> u16 reg_num;
> >> u64 reg_value;
> >> u64 reserved[];
> >> };
> >>
> >> Although both structures are currently identical, they will differ as more functionality
> >> is added, so it is better to create two versions from the start.
> >>
> >> The reserved field in each structure provides room for extensions.
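One practical detail: as written, both structures end in a flexible array member (reserved[]), and ISO C does not allow arrays of such structures, yet the calls take vectors of them. A real definition therefore needs a fixed-size reserved area. The sketch below uses an arbitrary placeholder length of 4 (not the perfmon2 value) and a hypothetical user-side helper, fill_pmd_reset(), that builds a data-register vector to reset counters without touching their configuration registers, as in the sampling case from section 3.

```c
#include <stdint.h>
#include <string.h>

/* Sketches of the two argument structures; the reserved length (4)
 * is a placeholder. */
struct pfarg_pmc {
    uint16_t reg_num;
    uint64_t reg_value;
    uint64_t reserved[4];
};

struct pfarg_pmd {
    uint16_t reg_num;
    uint64_t reg_value;
    uint64_t reserved[4];
};

/* Build a data-register vector that sets each listed counter to 'value'.
 * Zeroing each element first also clears its reserved words. */
static void fill_pmd_reset(struct pfarg_pmd *v, const uint16_t *regs, int n,
                           uint64_t value)
{
    for (int i = 0; i < n; i++) {
        memset(&v[i], 0, sizeof(v[i]));
        v[i].reg_num = regs[i];
        v[i].reg_value = value;
    }
}
```

The resulting vector would then be passed, with its length, to pfm_write_pmds(), leaving the configuration registers as they were.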
> >>
> >>
> >> c) attachment and detachment
> >>
> >> int pfm_load_context(int fd, struct pfarg_load *ld);
> >> int pfm_unload_context(int fd);
> >>
> >>
> >> The session is identified by the file descriptor, fd.
> >>
> >> To attach, the target thread or CPU must be provided. For extensibility purposes, the
> >> target is passed in a structure, which is defined as follows:
> >> struct pfarg_load {
> >> u32 target;
> >> u64 reserved[];
> >> };
> >> In per-thread mode, the target field must be set to the kernel thread identifier, as
> >> returned by gettid().
> >>
> >> In per-CPU mode, the target field must be set to the logical CPU identifier as seen by
> >> the kernel. Furthermore, the caller must be running on the CPU to monitor; otherwise the
> >> call fails.
> >>
> >> Extensions can be implemented using the reserved field.
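A sketch of filling pfarg_load for per-thread self-monitoring. The reserved length is a placeholder and load_self() is a hypothetical helper; syscall(SYS_gettid) is used because older C libraries lack a gettid() wrapper. For per-CPU mode, a tool would first pin itself to the target CPU, for instance with sched_setaffinity(), before setting target to that CPU number.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Sketch of pfarg_load; the reserved length (2) is a placeholder. */
struct pfarg_load {
    uint32_t target;
    uint64_t reserved[2];
};

/* Per-thread mode: target the calling thread by its kernel thread id.
 * The reserved words are zeroed, as the interface requires. */
static void load_self(struct pfarg_load *ld)
{
    memset(ld, 0, sizeof(*ld));
    ld->target = (uint32_t)syscall(SYS_gettid);
}
```

In a single-threaded process the kernel thread id equals the process id, while each additional thread gets its own id, which is why the description asks for gettid() rather than getpid().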
> >>
> >>
> >> d) start and stop
> >>
> >> int pfm_start(int fd);
> >> int pfm_stop(int fd);
> >>
> >> The session is identified by the file descriptor fd.
> >>
> >> Currently no other parameters are supported for those calls.
> >>
> >>
> >> e) reading results
> >>
> >> int pfm_read_pmds(int fd, struct pfarg_pmd *pmds, int n);
> >>
> >>
> >> The session is identified by the file descriptor fd.
> >>
> >> Just like for programming the registers, it is possible to pass a vector of structures in
> >> pmds. The number of elements is passed in n.
> >>
> >>
> >> f) termination
> >>
> >> int close(int fd);
> >>
> >> To terminate a session, the file descriptor has to be closed. The semantics of file
> >> descriptor sharing apply, so if another reference to the session, i.e., another file
> >> descriptor, exists, the session will only be effectively destroyed once that reference
> >> disappears.
> >>
> >> Of course, the kernel closes all file descriptors on process termination, so the
> >> associated sessions will eventually be destroyed.
> >>
> >> In per-CPU mode, it is not necessary, though recommended, to be on the monitored CPU to
> >> issue this call.
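The file-descriptor sharing semantics that termination relies on can be observed with any kernel object. The sketch below uses a pipe as a stand-in for a perfmon session (an assumption for illustration only; fd_sharing_demo() is a hypothetical name): after dup(), closing one descriptor leaves the object alive, and only closing the last one makes it go away, which the read end observes as end of file.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if the object survives the first close and disappears after
 * the last one, 0 otherwise, -1 on setup error. */
static int fd_sharing_demo(void)
{
    int p[2];
    char c;

    if (pipe(p) != 0)
        return -1;
    int second = dup(p[1]);             /* another reference to the write end */
    if (second < 0 || fcntl(p[0], F_SETFL, O_NONBLOCK) != 0) {
        if (second >= 0)
            close(second);
        close(p[0]);
        close(p[1]);
        return -1;
    }

    close(p[1]);                        /* drop the first reference */
    ssize_t r1 = read(p[0], &c, 1);     /* writer still exists: would block */
    int still_alive = (r1 < 0 && errno == EAGAIN);

    close(second);                      /* drop the last reference */
    ssize_t r2 = read(p[0], &c, 1);     /* no writer left: end of file */
    int gone = (r2 == 0);

    close(p[0]);
    return still_alive && gone;
}
```

The same rule covers descriptors inherited across fork() or passed over a UNIX socket: the session outlives any single close() as long as one descriptor still refers to it.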
> >>
> >>
> >
-------------------------------------------------------------------------
> >> Sponsored by: SourceForge.net Community Choice Awards: VOTE NOW!
> >> Studies have shown that voting for your favorite open source project,
> >> along with a healthy diet, reduces your potential for chronic lameness
> >> and boredom. Vote Now at http://www.sourceforge.net/community/cca08
> >> _______________________________________________
> >> perfmon2-devel mailing list
> >> perfmon2-devel@lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
> >
> >