On 25/08/2011 10:19 AM, stephane eranian wrote:
> The current support for mmaped count is broken on perf_event x86.
> It simply does not work. I think it only works on PPC at this point.

So the issue runs deeper than just user-level rdpmc? A quick test showed
that things seem to work right if I create a kernel module whose init
function issues the following:

    smp_call_function((smp_call_func_t) set_in_cr4, (void*) X86_CR4_PCE, 1)

The event counter index set by the kernel seems to point to a relevant
counter... I haven't tried with multiplexing or anything complex, though.

> furthermore, the default security level disallows rdpmc at the user level.
> The kernel would have to be changed (CR4).

The quick-n-dirty way is easy to arrange on my sandboxed experimental
machine (see above), and it seems like perfctr had a workable/secure
implementation of The Right Way to do things. Is there something about
perf events which is incompatible with the perfctr approach?

The only security hole I can see is if some cpu-level counters are also
active and the self-monitoring process doesn't have rights to them. That
case should be easy to identify, though, and fixable by either running as
root or by relaxing the paranoia settings in /proc.

> given that you are interested in profiling, there is no point in trying to use
> mmapped counter. the kernel has a kernel-level buffer, so the cost of
> exporting samples is amortized over a large number of samples.

I'm *not* trying to do sample-based profiling. Oprofile works just fine
for that. I need to count how many cache misses a given function call
triggers when called with different types of inputs. The variable(s) to
aggregate on will depend on the observed trends.

This would be pretty easy to set up with mmap+rdpmc, but making a syscall
at every function of interest introduces a huge confound. I'd try a
microbenchmark, but those are *really* hard to get right when you're
looking at cache misses (can't just repeat the same inputs over and over).
I guess there's cachegrind, but 100x slowdown while processing multi-GB
datasets is kind of unappealing (and I'm not sure if cachegrind even has
an API to "read" its "counters").
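For concreteness, here is a rough sketch of the mmap+rdpmc self-monitoring read I have in mind. It is only a sketch under assumptions: it assumes a kernel and <linux/perf_event.h> recent enough to expose the `cap_user_rdpmc` and `pmc_width` fields in the mmapped page, and that user-level rdpmc is actually permitted. The function names (`open_self_counter`, `read_self_count`, `sign_extend`) are made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* Compiler-only barrier: keeps the reads inside the sequence-lock window. */
#define compiler_barrier() __asm__ volatile("" ::: "memory")

/* Read hardware counter idx with RDPMC. x86 only; the instruction faults
 * in user mode unless the kernel has set CR4.PCE. */
static inline uint64_t rdpmc(uint32_t idx)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t lo, hi;
    __asm__ volatile("rdpmc" : "=a"(lo), "=d"(hi) : "c"(idx));
    return ((uint64_t)hi << 32) | lo;
#else
    (void)idx;
    return 0; /* placeholder on other architectures */
#endif
}

/* Sign-extend a raw counter value that is only `width` bits wide.
 * Relies on arithmetic right shift of signed values (true for gcc/clang). */
static inline int64_t sign_extend(uint64_t raw, unsigned width)
{
    assert(width >= 1 && width <= 64);
    return (int64_t)(raw << (64 - width)) >> (64 - width);
}

/* Hypothetical helper: open a cache-miss counter on the calling thread
 * and map its metadata page. Returns NULL on any failure. */
volatile struct perf_event_mmap_page *open_self_counter(int *fd_out)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_CACHE_MISSES;
    attr.exclude_kernel = 1;

    /* pid = 0, cpu = -1: this thread, whichever CPU it runs on. */
    int fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
    if (fd < 0)
        return NULL;

    void *page = mmap(NULL, (size_t)sysconf(_SC_PAGESIZE), PROT_READ,
                      MAP_SHARED, fd, 0);
    if (page == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return page;
}

/* Read the current count without a syscall. The kernel bumps pc->lock
 * whenever it rewrites the page, so retry until a stable snapshot is seen.
 * If index is 0 the counter is not readable from user space right now and
 * only the kernel-maintained offset is returned; real code would fall back
 * to read() on the fd in that case. */
int64_t read_self_count(const volatile struct perf_event_mmap_page *pc)
{
    uint32_t seq, idx;
    int64_t count;

    do {
        seq = pc->lock;
        compiler_barrier();
        idx = pc->index;
        count = pc->offset;
        if (pc->cap_user_rdpmc && idx)
            count += sign_extend(rdpmc(idx - 1), pc->pmc_width);
        compiler_barrier();
    } while (pc->lock != seq);

    return count;
}
```

The intended use is to call `read_self_count()` immediately before and after the function of interest and aggregate the difference per input class, so the only per-call overhead is a couple of loads and one rdpmc.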
Regards,
Ryan

_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel