Re: [perfmon2] v2 of comments on Performance Counters for Linux (PCL)

2009-06-22 Thread Ingo Molnar
hi Stephane, Thanks for the extensive feedback! Your numerous comments cover twenty sub-topics, so we've tabulated the summary of Peter's and my replies, to give a quick overview: - Topic/question you raised

Re: [perfmon2] I.1 - System calls - ioctl

2009-06-22 Thread Ingo Molnar
> I/ General API comments > > 1/ System calls > > * ioctl() > >You have defined 5 ioctls() so far to operate on an existing >event. I was under the impression that ioctl() should not be >used except for drivers. > > How do you justify your usage of ioctl() in this context? We can ce

Re: [perfmon2] I.2 - Grouping

2009-06-22 Thread Ingo Molnar
> 2/ Grouping > > By design, an event can only be part of one group at a time. > Events in a group are guaranteed to be active on the PMU at the > same time. That means a group cannot have more events than there > are available counters on the PMU. Tools may want to know the > number of counters av

Re: [perfmon2] I.3 - Multiplexing and system-wide

2009-06-22 Thread Ingo Molnar
> 3/ Multiplexing and system-wide > > Multiplexing is time-based and it is hooked into the timer tick. > At every tick, the kernel tries to schedule another set of event > groups. > > In tickless kernels if a CPU is idle, no timer tick is generated, > therefore no multiplexing occurs. This is incor
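Because multiplexing is driven by the timer tick, each tick rotates the list of event groups so that a different group gets first pick of the counters. The following is a simplified model, not the kernel's code. The struct and helper names are hypothetical. It also shows why a tickless idle CPU stops multiplexing: if nothing calls `rotate_groups()`, the same groups stay on the PMU.

```c
#include <stddef.h>

/* Simplified, hypothetical model: each flexible group needs `size` counters. */
struct group {
    int id;
    int size;
};

/* Called once per timer tick: move the head group to the tail so that a
 * different group gets first pick of the counters on the next pass. */
static void rotate_groups(struct group *g, size_t n)
{
    if (n < 2)
        return;
    struct group head = g[0];
    for (size_t i = 1; i < n; i++)
        g[i - 1] = g[i];
    g[n - 1] = head;
}

/* Schedule groups in list order, all-or-nothing per group, stopping at the
 * first group that no longer fits. Scheduled ids go into `active`. */
static size_t schedule_groups(const struct group *g, size_t n,
                              int nr_counters, int *active)
{
    size_t nr_active = 0;
    for (size_t i = 0; i < n && g[i].size <= nr_counters; i++) {
        nr_counters -= g[i].size;
        active[nr_active++] = g[i].id;
    }
    return nr_active;
}
```

With three 2-counter groups on a 4-counter PMU, successive ticks schedule {1,2}, then {2,3}, then {3,1}.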

Re: [perfmon2] I.4 - Controlling group multiplexing

2009-06-22 Thread Ingo Molnar
> 4/ Controlling group multiplexing > > Although multiplexing is exposed to users via the timing > information, events may not necessarily be grouped at random by > tools. Groups may not be ordered at random either. > > I know of tools which craft the sequence of groups carefully such > that relate
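Multiplexing is exposed through timing information, so a tool can extrapolate a multiplexed count to the full measurement period. The sketch below assumes that PCL reports an "enabled" time and a "running" time alongside each count. The helper name `scale_count` is hypothetical.

```c
#include <stdint.h>

/* Estimate the full-period count of a multiplexed counter from the raw
 * count and the enabled/running times reported with it. */
static uint64_t scale_count(uint64_t raw, uint64_t time_enabled,
                            uint64_t time_running)
{
    if (time_running == 0)
        return 0;                 /* never on the PMU: nothing to extrapolate */
    if (time_running >= time_enabled)
        return raw;               /* counted the whole time: no scaling needed */
    /* Floating point avoids overflowing raw * time_enabled. */
    return (uint64_t)((double)raw * ((double)time_enabled / (double)time_running));
}
```

This linear extrapolation assumes the event rate is uniform over time. That assumption is exactly what carefully ordered groups, as described above, try to avoid relying on.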

Re: [perfmon2] I.5 - Mmaped count

2009-06-22 Thread Ingo Molnar
> 5/ Mmaped count > > It is possible to read counts directly from user space for > self-monitoring threads. This leverages a HW capability present on > some processors. On X86, this is possible via RDPMC. > > The full 64-bit count is constructed by combining the hardware > value extracted with an a
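The replies further down the thread describe a lock/index/offset scheme. In that scheme, a reader retries while the kernel updates the page, and `pc->index == 0` means the counter is not currently scheduled. Below is a minimal user-space sketch of that retry loop. The struct layout, the function-pointer stand-in for RDPMC, and the helper names are assumptions. Hardware counter-width handling is also left out.

```c
#include <stdint.h>

/* Hypothetical mirror of the control-page fields discussed in this thread
 * (lock, index, offset); illustrative only, not the real header. */
struct mmap_page {
    volatile uint32_t lock;    /* changed by the kernel around every update */
    volatile uint32_t index;   /* hardware counter number + 1, or 0 if not scheduled */
    volatile int64_t  offset;  /* value to add to the raw hardware count */
};

/* Stand-in for RDPMC so the sketch runs on any machine. */
typedef uint64_t (*read_pmc_fn)(uint32_t counter);

/* Demo counter source: pretends every hardware counter reads 100. */
static uint64_t fake_read_pmc(uint32_t counter)
{
    (void)counter;
    return 100;
}

static int64_t read_self_count(const struct mmap_page *pc, read_pmc_fn read_pmc)
{
    uint32_t seq, idx;
    int64_t count;

    do {
        seq = pc->lock;
        __sync_synchronize();          /* read lock before the other fields */
        idx = pc->index;
        count = pc->offset;
        if (idx != 0)                  /* 0: counter is not on the PMU now */
            count += (int64_t)read_pmc(idx - 1);
        __sync_synchronize();          /* read fields before re-checking lock */
    } while (pc->lock != seq);         /* kernel updated the page: retry */

    return count;
}
```

A real x86 reader would replace `fake_read_pmc` with the `rdpmc` instruction for counter `idx - 1`.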

Re: [perfmon2] I.6 - Group scheduling

2009-06-22 Thread Ingo Molnar
> 6/ Group scheduling > > Looking at the existing code, it seems to me there is a risk of > starvation for groups, i.e., groups never scheduled on the PMU. > > My understanding of the scheduling algorithm is: > > - first try to schedule pinned groups. If a pinned group > fails, put it in error m
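The two-pass algorithm as Stephane describes it can be modelled as follows. Pinned groups are scheduled first, and a pinned group that cannot fit is put in an error state. Flexible groups then fill whatever counters remain. This is a hypothetical, simplified model of his reading of the code, not the kernel implementation. It illustrates the starvation concern: a flexible group that needs more counters than the pinned groups leave free is never scheduled, no matter how the flexible list is ordered.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical model of one scheduling pass. */
struct sched_group {
    int  size;     /* number of counters the group needs */
    bool pinned;   /* pinned groups must always be on the PMU */
    bool error;    /* set when a pinned group could not be scheduled */
    bool active;   /* scheduled on the PMU for this pass */
};

/* Pass 1: pinned groups. Pass 2: flexible groups in list order.
 * Returns the number of counters left free. */
static int sched_pass(struct sched_group *g, size_t n, int nr_counters)
{
    int free_counters = nr_counters;

    for (size_t i = 0; i < n; i++) {
        g[i].active = false;
        if (!g[i].pinned || g[i].error)
            continue;
        if (g[i].size <= free_counters) {
            free_counters -= g[i].size;
            g[i].active = true;
        } else {
            g[i].error = true;        /* pinned group cannot fit: error state */
        }
    }
    for (size_t i = 0; i < n; i++) {
        if (g[i].pinned)
            continue;
        if (g[i].size <= free_counters) {
            free_counters -= g[i].size;
            g[i].active = true;
        }
    }
    return free_counters;
}
```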

Re: [perfmon2] I.7 - Group validity checking

2009-06-22 Thread Ingo Molnar
> 7/ Group validity checking > > At the user level, an application is only concerned with events > and grouping of those events. The assignment logic is performed by > the kernel. > > For a group to be scheduled, all its events must be compatible > with each other, otherwise the group will never be
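Whether a group's events are "compatible with each other" can be phrased as a matching problem. Each event may only use a subset of the counters, and the group is valid only if every event can be given a distinct allowed counter at the same time. The sketch below assumes each constraint is expressed as a bitmask of allowed counters. The function names are hypothetical. It uses plain backtracking, which is adequate for the handful of events a group can hold.

```c
#include <stdbool.h>
#include <stdint.h>

/* allowed[i] is a bitmask of the counters event i may use, e.g. 0x1 for an
 * event restricted to counter 0 (hypothetical representation). */
static bool assign(const uint32_t *allowed, int n, int i, uint32_t used,
                   int *assignment)
{
    if (i == n)
        return true;                      /* every event got a counter */
    for (int c = 0; c < 32; c++) {
        uint32_t bit = UINT32_C(1) << c;
        if ((allowed[i] & bit) && !(used & bit)) {
            assignment[i] = c;
            if (assign(allowed, n, i + 1, used | bit, assignment))
                return true;
        }
    }
    return false;                         /* no counter left: backtrack */
}

/* A group is schedulable only if all its events fit on the PMU at once. */
static bool group_is_valid(const uint32_t *allowed, int n, int *assignment)
{
    return assign(allowed, n, 0, 0, assignment);
}
```

For masks {0x3, 0x1}, a first-fit assignment would give counter 0 to the first event and then fail. Backtracking finds the valid assignment {1, 0} instead.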

Re: [perfmon2] I.8 - Generalized cache events

2009-06-22 Thread Ingo Molnar
> 8/ Generalized cache events > > In recent days, you have added support for what you call > 'generalized cache events'. > > The log defines: > new event type: PERF_TYPE_HW_CACHE > > This is a 3-dimensional space: > { L1-D, L1-I, L2, ITLB, DTLB, BPU } x > { load, store, prefetch } x > { acces
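One point in the 3-dimensional space above can be packed into a single config value with one byte per dimension. This is the layout that the later perf_event_open(2) documentation gives for PERF_TYPE_HW_CACHE. The enum names are illustrative. The third axis is cut off in the excerpt, so {access, miss} is an assumption here.

```c
#include <stdint.h>

/* Illustrative names for the three dimensions; the third axis is assumed. */
enum cache_id  { C_L1D, C_L1I, C_L2, C_ITLB, C_DTLB, C_BPU, C_MAX };
enum cache_op  { OP_LOAD, OP_STORE, OP_PREFETCH, OP_MAX };
enum cache_res { RES_ACCESS, RES_MISS, RES_MAX };

/* Pack (cache, op, result) into a config word, one byte per dimension. */
static uint64_t cache_config(enum cache_id id, enum cache_op op,
                             enum cache_res res)
{
    return (uint64_t)id | ((uint64_t)op << 8) | ((uint64_t)res << 16);
}
```

For example, L1-D load misses encode as 0x10000 and DTLB store accesses as 0x104.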

Re: [perfmon2] I.9 - Group reading

2009-06-22 Thread Ingo Molnar
> 9/ Group reading > > It is possible to start/stop an event group simply via ioctl() on > the group leader. However, it is not possible to read all the > counts with a single read() system call. That seems > odd. Furthermore, I believe you want reads to be as atomic as > possible. If

Re: [perfmon2] I.10 - Event buffer minimal useful size

2009-06-22 Thread Ingo Molnar
> 10/ Event buffer minimal useful size > > As it stands, the buffer header occupies the first page, even > though the buffer header struct is 32 bytes long. That's a lot of > precious RLIMIT_MEMLOCK memory wasted. > > The actual buffer (data) starts at the next page (from builtin-top.c): > > static
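The mapping size follows from this layout: one full page for the header, followed by the data pages. The sketch below assumes that the data area must be a power-of-two number of pages (so it can be used as a ring buffer) and that zero data pages maps only the header page. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

static bool is_power_of_2(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Bytes to mmap() for a counter buffer: one header page plus `data_pages`
 * data pages. Returns 0 for a data size that is assumed to be rejected. */
static size_t sample_mmap_len(size_t data_pages, size_t page_size)
{
    if (data_pages != 0 && !is_power_of_2(data_pages))
        return 0;
    return (data_pages + 1) * page_size;
}
```

With 4 KiB pages, a 16-page data area needs 17 * 4096 bytes. Almost all of the first page goes unused by the 32-byte header, which is the waste Stephane points out. In practice, `page_size` would come from `sysconf(_SC_PAGESIZE)`.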

Re: [perfmon2] I.11 - Missing definitions for generic events

2009-06-22 Thread Ingo Molnar
> 11/ Missing definitions for generic hardware events > > As soon as you define generic events, you need to provide a clear > and precise definition as to what they measure. This is crucial to > make them useful. I have not seen such a definition yet. Do you mean their names aren't clear enough? :

Re: [perfmon2] II.1 - Fixed counters on Intel

2009-06-22 Thread Ingo Molnar
> II/ X86 comments > > 1/ Fixed counters on Intel > > You cannot simply fall back to generic counters if you cannot find > a fixed counter. There are model-specific bugs; for instance, > UNHALTED_REFERENCE_CYCLES (0x013c) does not measure the same > thing on Nehalem when it is used in fixed counte

Re: [perfmon2] II.2 - Event knowledge missing

2009-06-22 Thread Ingo Molnar
> 2/ Event knowledge missing > > There are constraints on events in Intel processors. Different > constraints do exist on AMD64 processors, especially with > uncore-related events. You raise the issue of uncore events in IV.1, but let us reply here primarily. Un-core counters and events seem to

Re: [perfmon2] III.1 - Sampling period randomization

2009-06-22 Thread Ingo Molnar
> III/ Requests > > 1/ Sampling period randomization > > It is our experience (on Itanium, for instance), that for certain > sampling measurements, it is beneficial to randomize the sampling > period a bit. This is in particular the case when sampling on an > event that happens very frequently and
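One way to randomize the sampling period "a bit" is to add uniform jitter around a base period each time the counter is re-armed. The following is a minimal sketch, not the Itanium or perfmon2 mechanism. The helper names are hypothetical. It assumes `spread < base` and uses a small xorshift generator so that it is self-contained. The modulo bias is negligible for small spreads.

```c
#include <stdint.h>

/* Small xorshift64 generator; the state must be non-zero. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Next sampling period: base plus uniform jitter in [-spread, +spread].
 * Assumes spread < base so the period stays positive. */
static uint64_t next_period(uint64_t base, uint64_t spread, uint64_t *state)
{
    uint64_t r = xorshift64(state) % (2 * spread + 1);   /* 0 .. 2*spread */
    return base - spread + r;
}
```

The jitter breaks any lock-step between a fixed period and a loop whose iterations trigger the event at a similar rate. Without it, every sample could land on the same few instructions.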

Re: [perfmon2] IV.1 - Support for model-specific uncore PMU

2009-06-22 Thread Ingo Molnar
> IV/ Open questions > > 1/ Support for model-specific uncore PMU monitoring capabilities > > Recent processors have multiple PMUs. Typically one per core > but also one at the socket level, e.g., Intel Nehalem. It is > expected that this API will provide access to these PMUs as well. > > It see

Re: [perfmon2] IV.2 - Features impacting all counters

2009-06-22 Thread Ingo Molnar
> 2/ Features impacting all counters > > On some PMU models, e.g., Itanium, there are certain features which > have an influence on all counters that are active. For instance, > there is a way to restrict monitoring to a range of contiguous > code or data addresses using both some PMU registers and

Re: [perfmon2] IV.3 - AMD IBS

2009-06-22 Thread Ingo Molnar
> 3/ AMD IBS > > How is AMD IBS going to be implemented? > > IBS has two separate sets of registers. One to capture fetch > related data and another one to capture instruction execution > data. For each, there is one config register but multiple data > registers. In each mode, there is a specific s

Re: [perfmon2] IV.4 - Intel PEBS

2009-06-22 Thread Ingo Molnar
> 4/ Intel PEBS > > Since Netburst-based processors, Intel PMUs support a hardware > sampling buffer mechanism called PEBS. > > PEBS really became useful with Nehalem. > > Not all events support PEBS. Up until Nehalem, only one counter > supported PEBS (PMC0). The format of the hardware buffer has

Re: [perfmon2] IV.5 - Intel Last Branch Record (LBR)

2009-06-22 Thread Ingo Molnar
> 5/ Intel Last Branch Record (LBR) > > Intel processors since Netburst have a cyclic buffer hosted in > registers which can record taken branches. Each taken branch is > stored into a pair of LBR registers (source, destination). Up > until Nehalem, there were no filtering capabilities for LBR. LBR
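Because the LBR is a cyclic buffer, a reader has to walk it starting from the most recent entry. The sketch below models the registers as an array of (source, destination) pairs. It assumes a top-of-stack index that points at the newest entry. The struct and function names are hypothetical, not the MSR interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one LBR slot: a (source, destination) pair. */
struct lbr_entry {
    uint64_t from;
    uint64_t to;
};

/* Copy the `nr`-entry ring into `out`, newest branch first, assuming
 * `tos` indexes the most recently recorded branch. */
static void lbr_read(const struct lbr_entry *ring, size_t nr, size_t tos,
                     struct lbr_entry *out)
{
    for (size_t i = 0; i < nr; i++)
        out[i] = ring[(tos + nr - i) % nr];
}
```

With four entries and `tos == 1`, the output order is slots 1, 0, 3, 2.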

Re: [perfmon2] I.5 - Mmaped count

2009-06-22 Thread Peter Zijlstra
On Mon, 2009-06-22 at 14:25 +0200, stephane eranian wrote: > On Mon, Jun 22, 2009 at 1:52 PM, Ingo Molnar wrote: > >> 5/ Mmaped count > >> > >> It is possible to read counts directly from user space for > >> self-monitoring threads. This leverages a HW capability present on > >> some processors. On

Re: [perfmon2] I.5 - Mmaped count

2009-06-22 Thread stephane eranian
On Mon, Jun 22, 2009 at 2:35 PM, Peter Zijlstra wrote: > On Mon, 2009-06-22 at 14:25 +0200, stephane eranian wrote: >> On Mon, Jun 22, 2009 at 1:52 PM, Ingo Molnar wrote: >> >> 5/ Mmaped count >> >> >> >> It is possible to read counts directly from user space for >> >> self-monitoring threads. This

Re: [perfmon2] I.5 - Mmaped count

2009-06-22 Thread stephane eranian
On Mon, Jun 22, 2009 at 1:52 PM, Ingo Molnar wrote: >> 5/ Mmaped count >> >> It is possible to read counts directly from user space for >> self-monitoring threads. This leverages a HW capability present on >> some processors. On X86, this is possible via RDPMC. >> >> The full 64-bit count is constr

Re: [perfmon2] IV.3 - AMD IBS

2009-06-22 Thread Rob Fowler
Ingo Molnar wrote: >> 3/ AMD IBS >> >> How is AMD IBS going to be implemented? >> >> IBS has two separate sets of registers. One to capture fetch >> related data and another one to capture instruction execution >> data. For each, there is one config register but multiple data >> registers. In eac

Re: [perfmon2] IV.4 - Intel PEBS

2009-06-22 Thread Andi Kleen
On Mon, Jun 22, 2009 at 02:00:52PM +0200, Ingo Molnar wrote: > Having said that, PEBS is a hardware sampling feature that is > definitely saner than AMD's IBS. There are two immediate incremental > uses of it in perfcounters: > > - it makes flat sampling lower overhead by avoiding an NMI for all >

Re: [perfmon2] II.1 - Fixed counters on Intel

2009-06-22 Thread stephane eranian
On Mon, Jun 22, 2009 at 1:57 PM, Ingo Molnar wrote: >> II/ X86 comments >> >> 1/ Fixed counters on Intel >> >> You cannot simply fall back to generic counters if you cannot find >> a fixed counter. There are model-specific bugs; for instance, >> UNHALTED_REFERENCE_CYCLES (0x013c) does not measure

Re: [perfmon2] I.5 - Mmaped count

2009-06-22 Thread Peter Zijlstra
On Mon, 2009-06-22 at 14:54 +0200, stephane eranian wrote: > On Mon, Jun 22, 2009 at 2:35 PM, Peter Zijlstra wrote: > > On Mon, 2009-06-22 at 14:25 +0200, stephane eranian wrote: > >> On Mon, Jun 22, 2009 at 1:52 PM, Ingo Molnar wrote: > >> >> 5/ Mmaped count > >> >> > >> >> It is possible to read

Re: [perfmon2] I.1 - System calls - ioctl

2009-06-22 Thread Christoph Hellwig
On Mon, Jun 22, 2009 at 01:49:31PM +0200, Ingo Molnar wrote: > > How do you justify your usage of ioctl() in this context? > > We can certainly do a separate sys_perf_counter_ctrl() syscall - and > we will do that if people think the extra syscall slot is worth it > in this case. > > The (mild) c

Re: [perfmon2] I.11 - Missing definitions for generic events

2009-06-22 Thread stephane eranian
On Mon, Jun 22, 2009 at 1:56 PM, Ingo Molnar wrote: >> 11/ Missing definitions for generic hardware events >> >> As soon as you define generic events, you need to provide a clear >> and precise definition as to what they measure. This is crucial to >> make them useful. I have not seen such a defini

[perfmon2] 2.6.29 perfmon2 available

2009-06-22 Thread stephane eranian
Hello, I have finally released the perfmon2 kernel patch for kernel v2.6.29. I know it is off by one kernel version, but a lot of things have been happening and I have been very busy. As a consequence, 2.6.29 does not include everything I wanted it to have, especially PEBS for NHM. That will be fo

Re: [perfmon2] I.1 - System calls - ioctl

2009-06-22 Thread Arnd Bergmann
On Monday 22 June 2009, Ingo Molnar wrote: > There is another, more theoretical argument in favor of > sys_perf_counter_chattr(): it is quite conceivable that as usage of > perfcounters expands we want to change more and more attributes. So > even though right now the ioctl just about manages to

Re: [perfmon2] I.1 - System calls - ioctl

2009-06-22 Thread Ingo Molnar
* Christoph Hellwig wrote: > On Mon, Jun 22, 2009 at 01:49:31PM +0200, Ingo Molnar wrote: > > > How do you justify your usage of ioctl() in this context? > > > > We can certainly do a separate sys_perf_counter_ctrl() syscall - > > and we will do that if people think the extra syscall slot is

Re: [perfmon2] IV.3 - AMD IBS

2009-06-22 Thread Maynard Johnson
Rob Fowler wrote on 06/22/2009 09:08:34 AM: > > > Ingo Molnar wrote: > >> 3/ AMD IBS > >> > >> How is AMD IBS going to be implemented? > >> > >> IBS has two separate sets of registers. One to capture fetch > >> related data and another one to capture instruction execution > >> data. For each, the

Re: [perfmon2] IV.3 - AMD IBS

2009-06-22 Thread stephane eranian
On Mon, Jun 22, 2009 at 2:00 PM, Ingo Molnar wrote: >> 3/ AMD IBS >> >> How is AMD IBS going to be implemented? >> >> IBS has two separate sets of registers. One to capture fetch >> related data and another one to capture instruction execution >> data. For each, there is one config register but mul

Re: [perfmon2] I.2 - Grouping

2009-06-22 Thread stephane eranian
On Mon, Jun 22, 2009 at 1:50 PM, Ingo Molnar wrote: >> 2/ Grouping >> >> By design, an event can only be part of one group at a time. As I read this again, another question came up. Is the statement above also true for the group leader? >> Events in a group are guaranteed to be active on the PMU

Re: [perfmon2] IV.5 - Intel Last Branch Record (LBR)

2009-06-22 Thread stephane eranian
On Mon, Jun 22, 2009 at 2:01 PM, Ingo Molnar wrote: >> 5/ Intel Last Branch Record (LBR) >> >> Intel processors since Netburst have a cyclic buffer hosted in >> registers which can record taken branches. Each taken branch is >> stored into a pair of LBR registers (source, destination). Up >> until

Re: [perfmon2] I.2 - Grouping

2009-06-22 Thread Corey Ashford
Ingo Molnar wrote: >> 2/ Grouping >> >> By design, an event can only be part of one group at a time. >> Events in a group are guaranteed to be active on the PMU at the >> same time. That means a group cannot have more events than there >> are available counters on the PMU. Tools may want to know

Re: [perfmon2] I.2 - Grouping

2009-06-22 Thread Corey Ashford
stephane eranian wrote: > On Mon, Jun 22, 2009 at 1:50 PM, Ingo Molnar wrote: >>> 2/ Grouping >>> >>> By design, an event can only be part of one group at a time. > > As I read this again, another question came up. Is the statement > above also true for the group leader? > > >>> Events in a grou

Re: [perfmon2] I.5 - Mmaped count

2009-06-22 Thread Paul Mackerras
stephane eranian writes: > I don't see where you clear that field on x86. > Looks like it comes from hwc->idx. I suspect you need > to do something in x86_pmu_disable() to be symmetrical > with x86_pmu_enable(). > > I suspect something similar needs to be done on Power. power_pmu_disable() alrea

Re: [perfmon2] I.5 - Mmaped count

2009-06-22 Thread Paul Mackerras
Peter Zijlstra writes: > I think we would have to add that to the data page,.. something like the > below? > > Paulus? > > --- > Index: linux-2.6/include/linux/perf_counter.h > === > --- linux-2.6.orig/include/linux/perf_counter.h >

Re: [perfmon2] I.5 - Mmaped count

2009-06-22 Thread Paul Mackerras
stephane eranian writes: > Unless you tell me that pc->index is marked invalid (0) when the > event is not scheduled, I don't see how you can avoid reading > the wrong value. I am assuming that if the event is not scheduled, > lock remains constant. That's what happens; when pc->index == 0, the ev

Re: [perfmon2] I.2 - Grouping

2009-06-22 Thread Paul Mackerras
Ingo Molnar writes: > > 2/ Grouping > > > > By design, an event can only be part of one group at a time. To clarify this statement of Stephane's, a _counter_ can only be in one group. You can have multiple counters counting the same _event_ and those counters can (obviously) be in different grou

Re: [perfmon2] I.5 - Mmaped count

2009-06-22 Thread Peter Zijlstra
On Tue, 2009-06-23 at 10:39 +1000, Paul Mackerras wrote: > Peter Zijlstra writes: > > > I think we would have to add that to the data page,.. something like the > > below? > > > > Paulus? > > > > --- > > Index: linux-2.6/include/linux/perf_counter.h > > ==

Re: [perfmon2] IV.3 - AMD IBS

2009-06-22 Thread Peter Zijlstra
On Mon, 2009-06-22 at 10:08 -0400, Rob Fowler wrote: > Ingo Molnar wrote: > >> 3/ AMD IBS > >> > >> How is AMD IBS going to be implemented? > >> > >> IBS has two separate sets of registers. One to capture fetch > >> related data and another one to capture instruction execution > >> data. For each,