hi Stephane,
Thanks for the extensive feedback! Your numerous comments cover
twenty sub-topics, so we've tabulated a summary of Peter's and my
replies, to give a quick overview:
-
Topic/question you raised
> I/ General API comments
>
> 1/ System calls
>
> * ioctl()
>
>You have defined 5 ioctls() so far to operate on an existing
>event. I was under the impression that ioctl() should not be
>used except for drivers.
>
> How do you justify your usage of ioctl() in this context?
We can certainly do a separate sys_perf_counter_ctrl() syscall - and
we will do that if people think the extra syscall slot is worth it in
this case.
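For reference, here is a minimal sketch of how those per-event ioctl()s
are driven from user space. It assumes the <linux/perf_counter.h> header
and the __NR_perf_counter_open syscall number from the current patches;
the exact constant names (PERF_COUNTER_IOC_*, PERF_COUNT_HW_CPU_CYCLES)
should be read as assumptions, and error handling is left out:

#include <linux/perf_counter.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	struct perf_counter_attr attr;
	uint64_t count;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.type     = PERF_TYPE_HARDWARE;
	attr.config   = PERF_COUNT_HW_CPU_CYCLES;
	attr.disabled = 1;			/* created stopped */

	fd = syscall(__NR_perf_counter_open, &attr,
		     0 /* self */, -1 /* any cpu */, -1 /* no group */, 0);

	ioctl(fd, PERF_COUNTER_IOC_RESET, 0);	/* zero the count */
	ioctl(fd, PERF_COUNTER_IOC_ENABLE, 0);	/* start counting */
	/* ... monitored work ... */
	ioctl(fd, PERF_COUNTER_IOC_DISABLE, 0);	/* stop counting */

	read(fd, &count, sizeof(count));
	printf("cycles: %llu\n", (unsigned long long)count);
	close(fd);
	return 0;
}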
> 2/ Grouping
>
> By design, an event can only be part of one group at a time.
> Events in a group are guaranteed to be active on the PMU at the
> same time. That means a group cannot have more events than there
> are available counters on the PMU. Tools may want to know the
> number of counters av
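To make the grouping model concrete, here is a rough sketch of how a
two-event group is built: the first counter is opened with group_fd = -1
and becomes the leader, the second one passes the leader's fd. It reuses
the includes and naming assumptions of the ioctl sketch above, so take it
as illustrative only:

/* Open a cycles+instructions group on the calling thread.
 * Returns the leader fd; the member fd is stored in *memberp. */
static int open_group(int *memberp)
{
	struct perf_counter_attr attr;
	int leader;

	memset(&attr, 0, sizeof(attr));
	attr.type   = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	leader = syscall(__NR_perf_counter_open, &attr, 0, -1,
			 -1 /* no group: become a leader */, 0);

	attr.config = PERF_COUNT_HW_INSTRUCTIONS;
	*memberp = syscall(__NR_perf_counter_open, &attr, 0, -1,
			   leader /* join the leader's group */, 0);
	return leader;
}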
> 3/ Multiplexing and system-wide
>
> Multiplexing is time-based and it is hooked into the timer tick.
> At every tick, the kernel tries to schedule another set of event
> groups.
>
> In tickless kernels if a CPU is idle, no timer tick is generated,
> therefore no multiplexing occurs. This is incor
> 4/ Controlling group multiplexing
>
> Although multiplexing is exposed to users via the timing
> information, events may not necessarily be grouped at random by
> tools. Groups may not be ordered at random either.
>
> I know of tools which craft the sequence of groups carefully such
> that relate
> 5/ Mmaped count
>
> It is possible to read counts directly from user space for
> self-monitoring threads. This leverages a HW capability present on
> some processors. On X86, this is possible via RDPMC.
>
> The full 64-bit count is constructed by combining the hardware
> value extracted with an a
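For illustration, the user-space read loop for such a self-monitoring
counter looks roughly like the sketch below. It assumes the
perf_counter_mmap_page layout discussed in this thread (lock/index/offset
fields, index == 0 meaning the counter is not currently on the PMU) and
x86 RDPMC; counter-width sign-extension is left out, so treat it as a
sketch rather than a reference implementation:

#include <linux/perf_counter.h>
#include <stdint.h>

#define barrier()	asm volatile("" ::: "memory")

static inline uint64_t rdpmc(uint32_t idx)
{
	uint32_t lo, hi;

	asm volatile("rdpmc" : "=a" (lo), "=d" (hi) : "c" (idx));
	return (uint64_t)hi << 32 | lo;
}

/* pc points at the first (header) page of the counter's mmap area */
static uint64_t read_self_count(volatile struct perf_counter_mmap_page *pc)
{
	uint32_t seq;
	uint64_t count;

	do {
		seq = pc->lock;
		barrier();
		if (pc->index)		/* counter is live on the PMU */
			count = pc->offset + rdpmc(pc->index - 1);
		else			/* not scheduled: saved count only */
			count = pc->offset;
		barrier();
	} while (pc->lock != seq);	/* retry if it got rescheduled */

	return count;
}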
> 6/ Group scheduling
>
> Looking at the existing code, it seems to me there is a risk of
> starvation for groups, i.e., groups never scheduled on the PMU.
>
> My understanding of the scheduling algorithm is:
>
> - first try to schedule pinned groups. If a pinned group
> fails, put it in error m
> 7/ Group validity checking
>
> At the user level, an application is only concerned with events
> and grouping of those events. The assignment logic is performed by
> the kernel.
>
> For a group to be scheduled, all its events must be compatible
> with each other, otherwise the group will never be
> 8/ Generalized cache events
>
> In recent days, you have added support for what you call
> 'generalized cache events'.
>
> The log defines:
> new event type: PERF_TYPE_HW_CACHE
>
> This is a 3-dimensional space:
> { L1-D, L1-I, L2, ITLB, DTLB, BPU } x
> { load, store, prefetch } x
> { acces
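For concreteness, the three dimensions are packed into attr.config one
byte each (cache in bits 0-7, operation in bits 8-15, result in bits
16-23). A small fragment, using the enum names from the current headers
(exact names are an assumption) and the attr variable from the earlier
sketches:

/* L1-D load misses expressed as a generalized cache event */
attr.type   = PERF_TYPE_HW_CACHE;
attr.config = PERF_COUNT_HW_CACHE_L1D |
	      (PERF_COUNT_HW_CACHE_OP_READ     <<  8) |
	      (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);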
> 9/ Group reading
>
> It is possible to start/stop an event group simply via ioctl() on
> the group leader. However, it is not possible to read all the
> counts with a single read() system call. That seems
> odd. Furthermore, I believe you want reads to be as atomic as
> possible.
If
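For what it is worth, a single-read interface could return all member
counts in one buffer; the layout below is purely illustrative (it mirrors
the PERF_FORMAT_GROUP record that later perf headers define) and is not
part of the interface as it stands in this thread:

/* Hypothetical result of one read() on a group leader: a count of
 * members followed by one value per member, leader first. */
struct group_read {
	uint64_t nr;
	uint64_t value[];	/* nr entries */
};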
> 10/ Event buffer minimal useful size
>
> As it stands, the buffer header occupies the first page, even
> though the buffer header struct is 32 bytes long. That's a lot of
> precious RLIMIT_MEMLOCK memory wasted.
>
> The actual buffer (data) starts at the next page (from builtin-top.c):
>
> static
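To make the layout concrete: the mapping is one header page followed by
2^n data pages, so the smallest useful mapping today is two pages. A
rough sketch (the fd comes from the earlier examples; the struct name is
as in the current headers):

#include <sys/mman.h>
#include <unistd.h>

/* Map one header page plus 2^n data pages for sampling output. */
static void *map_counter_buffer(int fd, unsigned int n)
{
	size_t page_size = sysconf(_SC_PAGESIZE);
	size_t len = (1 + ((size_t)1 << n)) * page_size;
	void *buf;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (buf == MAP_FAILED)
		return NULL;
	/* buf             -> struct perf_counter_mmap_page (header) */
	/* buf + page_size -> first of the 2^n sample data pages     */
	return buf;
}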
> 11/ Missing definitions for generic hardware events
>
> As soon as you define generic events, you need to provide a clear
> and precise definition as to what they measure. This is crucial to
> make them useful. I have not seen such a definition yet.
Do you mean their names aren't clear enough? :
> II/ X86 comments
>
> 1/ Fixed counters on Intel
>
> You cannot simply fall back to generic counters if you cannot find
> a fixed counter. There are model-specific bugs, for instance
> UNHALTED_REFERENCE_CYCLES (0x013c), does not measure the same
> thing on Nehalem when it is used in fixed counte
> 2/ Event knowledge missing
>
> There are constraints on events in Intel processors. Different
> constraints do exist on AMD64 processors, especially with
> uncore-related events.
You raise the issue of uncore events in IV.1, but we will give the main
reply here.
Uncore counters and events seem to
> III/ Requests
>
> 1/ Sampling period randomization
>
> It is our experience (on Itanium, for instance), that for certain
> sampling measurements, it is beneficial to randomize the sampling
> period a bit. This is in particular the case when sampling on an
> event that happens very frequently and
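Purely as an illustration of the request, the jitter can be as simple as
picking the next period as the base value plus or minus a small random
offset at every overflow; the helper below is hypothetical and not part
of any proposed interface:

#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: next sampling period = base +/- up to mask/2, so samples
 * do not lock onto strictly periodic program behaviour.
 * mask is expected to be of the form 2^k - 1 and small relative to base. */
static uint64_t jittered_period(uint64_t base, uint64_t mask)
{
	return base - mask / 2 + ((uint64_t)rand() & mask);
}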
> IV/ Open questions
>
> 1/ Support for model-specific uncore PMU monitoring capabilities
>
> Recent processors have multiple PMUs. Typically one per core and
> but also one at the socket level, e.g., Intel Nehalem. It is
> expected that this API will provide access to these PMU as well.
>
> It see
> 2/ Features impacting all counters
>
> On some PMU models, e.g., Itanium, there are certain features which
> have an influence on all counters that are active. For instance,
> there is a way to restrict monitoring to a contiguous range of
> code or data addresses using both some PMU registers and
> 3/ AMD IBS
>
> How is AMD IBS going to be implemented?
>
> IBS has two separate sets of registers. One to capture fetch
> related data and another one to capture instruction execution
> data. For each, there is one config register but multiple data
> registers. In each mode, there is a specific s
> 4/ Intel PEBS
>
> Since Netburst-based processors, Intel PMUs support a hardware
> sampling buffer mechanism called PEBS.
>
> PEBS really became useful with Nehalem.
>
> Not all events support PEBS. Up until Nehalem, only one counter
> supported PEBS (PMC0). The format of the hardware buffer has
> 5/ Intel Last Branch Record (LBR)
>
> Intel processors since Netburst have a cyclic buffer hosted in
> registers which can record taken branches. Each taken branch is
> stored into a pair of LBR registers (source, destination). Up
> until Nehalem, there were no filtering capabilities for LBR. LBR
On Mon, 2009-06-22 at 14:25 +0200, stephane eranian wrote:
> On Mon, Jun 22, 2009 at 1:52 PM, Ingo Molnar wrote:
> >> 5/ Mmaped count
> >>
> >> It is possible to read counts directly from user space for
> >> self-monitoring threads. This leverages a HW capability present on
> >> some processors. On
On Mon, Jun 22, 2009 at 2:35 PM, Peter Zijlstra wrote:
> On Mon, 2009-06-22 at 14:25 +0200, stephane eranian wrote:
>> On Mon, Jun 22, 2009 at 1:52 PM, Ingo Molnar wrote:
>> >> 5/ Mmaped count
>> >>
>> >> It is possible to read counts directly from user space for
>> >> self-monitoring threads. This
On Mon, Jun 22, 2009 at 1:52 PM, Ingo Molnar wrote:
>> 5/ Mmaped count
>>
>> It is possible to read counts directly from user space for
>> self-monitoring threads. This leverages a HW capability present on
>> some processors. On X86, this is possible via RDPMC.
>>
>> The full 64-bit count is constr
Ingo Molnar wrote:
>> 3/ AMD IBS
>>
>> How is AMD IBS going to be implemented?
>>
>> IBS has two separate sets of registers. One to capture fetch
>> related data and another one to capture instruction execution
>> data. For each, there is one config register but multiple data
>> registers. In eac
On Mon, Jun 22, 2009 at 02:00:52PM +0200, Ingo Molnar wrote:
> Having said that, PEBS is a hardware sampling feature that is
> definitely saner than AMD's IBS. There's two immediate incremental
> uses of it in perfcounters:
>
> - it makes flat sampling lower overhead by avoiding an NMI for all
>
On Mon, Jun 22, 2009 at 1:57 PM, Ingo Molnar wrote:
>> II/ X86 comments
>>
>> 1/ Fixed counters on Intel
>>
>> You cannot simply fall back to generic counters if you cannot find
>> a fixed counter. There are model-specific bugs, for instance
>> UNHALTED_REFERENCE_CYCLES (0x013c), does not measure
On Mon, 2009-06-22 at 14:54 +0200, stephane eranian wrote:
> On Mon, Jun 22, 2009 at 2:35 PM, Peter Zijlstra wrote:
> > On Mon, 2009-06-22 at 14:25 +0200, stephane eranian wrote:
> >> On Mon, Jun 22, 2009 at 1:52 PM, Ingo Molnar wrote:
> >> >> 5/ Mmaped count
> >> >>
> >> >> It is possible to read
On Mon, Jun 22, 2009 at 01:49:31PM +0200, Ingo Molnar wrote:
> > How do you justify your usage of ioctl() in this context?
>
> We can certainly do a separate sys_perf_counter_ctrl() syscall - and
> we will do that if people think the extra syscall slot is worth it
> in this case.
>
> The (mild) c
On Mon, Jun 22, 2009 at 1:56 PM, Ingo Molnar wrote:
>> 11/ Missing definitions for generic hardware events
>>
>> As soon as you define generic events, you need to provide a clear
>> and precise definition as to what they measure. This is crucial to
>> make them useful. I have not seen such a defini
Hello,
I have finally released the perfmon2 kernel patch for kernel v2.6.29. I know
it is off by one kernel version, but a lot of things have been happening
and I have been very busy. As a consequence, 2.6.29 does not include everything
I wanted it to have, especially PEBS for NHM. That will be fo
On Monday 22 June 2009, Ingo Molnar wrote:
> There is another, more theoretical argument in favor of
> sys_perf_counter_chattr(): it is quite conceivable that as usage of
> perfcounters expands we want to change more and more attributes. So
> even though right now the ioctl just about manages to
* Christoph Hellwig wrote:
> On Mon, Jun 22, 2009 at 01:49:31PM +0200, Ingo Molnar wrote:
> > > How do you justify your usage of ioctl() in this context?
> >
> > We can certainly do a separate sys_perf_counter_ctrl() syscall -
> > and we will do that if people think the extra syscall slot is
Rob Fowler wrote on 06/22/2009 09:08:34 AM:
>
>
> Ingo Molnar wrote:
> >> 3/ AMD IBS
> >>
> >> How is AMD IBS going to be implemented?
> >>
> >> IBS has two separate sets of registers. One to capture fetch
> >> related data and another one to capture instruction execution
> >> data. For each, the
On Mon, Jun 22, 2009 at 2:00 PM, Ingo Molnar wrote:
>> 3/ AMD IBS
>>
>> How is AMD IBS going to be implemented?
>>
>> IBS has two separate sets of registers. One to capture fetch
>> related data and another one to capture instruction execution
>> data. For each, there is one config register but mul
On Mon, Jun 22, 2009 at 1:50 PM, Ingo Molnar wrote:
>> 2/ Grouping
>>
>> By design, an event can only be part of one group at a time.
As I read this again, another question came up. Is the statement
above also true for the group leader?
>> Events in a group are guaranteed to be active on the PMU
On Mon, Jun 22, 2009 at 2:01 PM, Ingo Molnar wrote:
>> 5/ Intel Last Branch Record (LBR)
>>
>> Intel processors since Netburst have a cyclic buffer hosted in
>> registers which can record taken branches. Each taken branch is
>> stored into a pair of LBR registers (source, destination). Up
>> until
Ingo Molnar wrote:
>> 2/ Grouping
>>
>> By design, an event can only be part of one group at a time.
>> Events in a group are guaranteed to be active on the PMU at the
>> same time. That means a group cannot have more events than there
>> are available counters on the PMU. Tools may want to know
stephane eranian wrote:
> On Mon, Jun 22, 2009 at 1:50 PM, Ingo Molnar wrote:
>>> 2/ Grouping
>>>
>>> By design, an event can only be part of one group at a time.
>
> As I read this again, another question came up. Is the statement
> above also true for the group leader?
>
>
>>> Events in a grou
stephane eranian writes:
> I don't see where you clear that field on x86.
> Looks like it comes from hwc->idx. I suspect you need
> to do something in x86_pmu_disable() to be symmetrical
> with x86_pmu_enable().
>
> I suspect something similar needs to be done on Power.
power_pmu_disable() alrea
Peter Zijlstra writes:
> I think we would have to add that to the data page... something like the
> below?
>
> Paulus?
>
> ---
> Index: linux-2.6/include/linux/perf_counter.h
> ===
> --- linux-2.6.orig/include/linux/perf_counter.h
>
stephane eranian writes:
> Unless you tell me that pc->index is marked invalid (0) when the
> event is not scheduled. I don't see how you can avoid reading
> the wrong value. I am assuming that if the event is not scheduled,
> lock remains constant.
That's what happens; when pc->index == 0, the ev
Ingo Molnar writes:
> > 2/ Grouping
> >
> > By design, an event can only be part of one group at a time.
To clarify this statement of Stephane's, a _counter_ can only be in
one group. You can have multiple counters counting the same _event_
and those counters can (obviously) be in different grou
On Tue, 2009-06-23 at 10:39 +1000, Paul Mackerras wrote:
> Peter Zijlstra writes:
>
> > I think we would have to add that to the data page... something like the
> > below?
> >
> > Paulus?
> >
> > ---
> > Index: linux-2.6/include/linux/perf_counter.h
> > ==
On Mon, 2009-06-22 at 10:08 -0400, Rob Fowler wrote:
> Ingo Molnar wrote:
> >> 3/ AMD IBS
> >>
> >> How is AMD IBS going to be implemented?
> >>
> >> IBS has two separate sets of registers. One to capture fetch
> >> related data and another one to capture instruction execution
> >> data. For each,