On Wed, Nov 08, 2006 at 11:22:15AM +0100, Philip J. Mucci wrote:
> Hi folks,
>
> For what it's worth, there are folks here at BSC in Barcelona who also
> are in need of a KAPI for doing adaptive scheduling.
>
Yeah, this is definitely a good example for kernel-level access to
counters. But I believe the setup/teardown can be done in user mode.

>
> On Wed, 2006-11-08 at 01:52 -0800, Stephane Eranian wrote:
> > Will,
> >
> > On Mon, Nov 06, 2006 at 03:30:50PM -0500, William Cohen wrote:
> > > >
> > > >>At the very least there needs to be a mechanism to read the
> > > >>values of the performance monitoring hardware registers in
> > > >>kernel-space. Certainly people have used get_cycles() to see how
> > > >>long certain things take to do within the kernel. Having access
> > > >>to the performance monitoring counters would allow better
> > > >>testing of some hypotheses, e.g. were there fewer or more cache
> > > >>misses with this approach versus another approach. It isn't
> > > >>practical to do the read of the performance counter in
> > > >>user-space. Too bad that the performance hardware designers for
> > > >>most processors took short cuts, so that a simple direct reading
> > > >>of the perfmon hardware data counters won't work.
> > > >
> > > >
> > > >You can read any raw performance counters in kernel space using
> > > >the appropriate assembly instruction. On x86 that would be
> > > >rdmsr/rdpmc. Of course, that would not give you the full 64-bit
> > > >(software virtualized) value. But I suspect that in-kernel you
> > > >are after micro-measurements that are unlikely to run long enough
> > > >to overflow a 32-bit counter (especially if not measuring
> > > >cycles).
> > > >
> > > >I think you are after a small subset of the calls from perfmon2,
> > > >namely start/stop, read counters. I think the setup/tear-down
> > > >could be done at the user level, i.e., you'd have to assume there
> > > >is a session going. If we further assume system-wide ONLY and
> > > >that you can only operate on the cpu where you issue the call,
> > > >then it would not be too difficult to add the 3 calls you need.
> > > >
> > >
> > > Hi Stephane,
> > >
> > > I have been thinking some more about using the counter in the
> > > kernel. The rdmsr/rdpmc certainly give access to the performance
> > > monitoring registers. Having counters set up to be system-wide only
> > > before the module is loaded would be sufficient.
> >
> > Yes, I envision that this would only make sense in a system-wide type
> > of measurement. Then on each CPU, the kernel could have a collector
> > thread reading the counters.
> >
> > >
> > > How is the user space going to communicate to the kernel modules
> > > which registers hold which values? Libpfm could put events in
> > > different counters than the module expects, e.g. watchdog timer off
> > > or on, where register 0 may or may not be used, or a P4 machine
> > > booted in HT or non-HT mode.
> >
> > Yes, for that you would have to invent a dedicated interface, maybe
> > through a device driver. The driver would record in kernel globals
> > which counters to read from for which event.
> >
> > Note that we could also provide a simplified pfm_read_pmds() for
> > kernel callers. You can get to the perfmon context attached to each
> > CPU by reading the per-CPU variable pmu_ctx. To make sense of the
> > counters, you need to know that PMD4 measures CPU_CYCLES, i.e., the
> > event -> counter assignment, no matter what, because, as you point
> > out, there can be more than one assignment possible.
> >
>
> _______________________________________________
> perfmon mailing list
> [email protected]
> http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/

--
-Stephane
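
For illustration only, the event -> counter bookkeeping discussed in this
thread could be sketched roughly as follows. This is plain user-space C
with simulated counters, so it compiles without any kernel headers; in a
kernel module the fake_pmd[] reads would presumably be replaced by
rdpmc/rdmsr. All names here (kev_assign, kev_read, kev_delta, fake_pmd,
NUM_PMD) are hypothetical and are not part of perfmon2 or libpfm, and the
32-bit counter width is simply the one assumed in the message.

```c
#include <stdint.h>

/* Hypothetical event identifiers, invented for this sketch. */
enum kev { KEV_CPU_CYCLES, KEV_CACHE_MISSES, KEV_MAX };

/* Assumed number of counters; real hardware differs per CPU model. */
#define NUM_PMD 8

/* Table a driver could fill in on behalf of user space:
 * index of the counter that holds each event, or -1 if unassigned. */
static int kev_to_pmd[KEV_MAX] = { -1, -1 };

/* Simulated raw 32-bit counters standing in for rdpmc/rdmsr reads. */
static uint32_t fake_pmd[NUM_PMD];

/* Record that event 'ev' is counted by counter 'pmd'.
 * Returns 0 on success, -1 on an out-of-range argument. */
static int kev_assign(int ev, int pmd)
{
    if (ev < 0 || ev >= KEV_MAX || pmd < 0 || pmd >= NUM_PMD)
        return -1;
    kev_to_pmd[ev] = pmd;
    return 0;
}

/* Read the raw value for 'ev' through the recorded assignment.
 * Returns -1 if the event is unknown or has no counter assigned. */
static int kev_read(int ev, uint32_t *val)
{
    if (ev < 0 || ev >= KEV_MAX || kev_to_pmd[ev] < 0)
        return -1;
    *val = fake_pmd[kev_to_pmd[ev]];
    return 0;
}

/* Difference between two raw reads. Unsigned subtraction is modulo
 * 2^32, so a single wrap between the reads still gives the right
 * answer; more than one wrap cannot be detected. */
static uint32_t kev_delta(uint32_t start, uint32_t end)
{
    return end - start;
}
```

With PMD4 assigned to CPU_CYCLES as in the example from the message, a
caller would do kev_assign(KEV_CPU_CYCLES, 4) once, take kev_read() before
and after the code under test, and pass both values to kev_delta().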
