On 25/08/2011 10:19 AM, stephane eranian wrote:
> The current support for mmapped counts is broken in perf_event on x86.
> It simply does not work. I think it only works on PPC at this point.
So the issue runs deeper than just enabling user-level rdpmc? A quick test 
suggests that things work correctly if I load a kernel module whose init 
function issues the following:

on_each_cpu((smp_call_func_t) set_in_cr4, (void *) X86_CR4_PCE, 1);
/* on_each_cpu() also runs on the calling CPU; smp_call_function() skips it */

The counter index the kernel publishes in the mmapped page does seem to 
point at the right hardware counter. I haven't tried multiplexing or 
anything more complex, though.
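For concreteness, here is a minimal sketch of that user-level read path: open a counting event on the calling thread with perf_event_open(2), map the event's first page, and add an rdpmc of the published `index` to the kernel-maintained `offset`, retrying if the page's `lock` sequence number changes mid-read. It assumes x86 with GCC/Clang inline assembly, a <linux/perf_event.h> that provides the `cap_user_rdpmc` and `pmc_width` fields (these came with kernels newer than this thread), and that rdpmc is actually permitted at user level (for example via the CR4.PCE module trick above). The helper names `open_self_counter`, `mmap_read_count` and `pmc_sign_extend` are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Compiler barrier: stop the compiler from reordering reads of the page. */
#define barrier() __asm__ volatile("" ::: "memory")

#if defined(__x86_64__) || defined(__i386__)
#define HAVE_RDPMC 1
/* Read hardware performance counter `counter` (faults unless CR4.PCE is set). */
static inline uint64_t rdpmc(uint32_t counter)
{
    uint32_t lo, hi;
    __asm__ volatile("rdpmc" : "=a"(lo), "=d"(hi) : "c"(counter));
    return ((uint64_t)hi << 32) | lo;
}
#else
#define HAVE_RDPMC 0 /* assumption: non-x86 builds always fall back to read(2) */
static inline uint64_t rdpmc(uint32_t counter) { (void)counter; return 0; }
#endif

/* Sign-extend a raw counter value that is `width` bits wide (e.g. 48). */
int64_t pmc_sign_extend(uint64_t raw, unsigned width)
{
    unsigned shift;

    assert(width > 0 && width <= 64);
    shift = 64 - width;
    /* relies on arithmetic right shift of signed values (GCC/Clang behavior) */
    return (int64_t)(raw << shift) >> shift;
}

/* Open a user-only counting event on the calling thread and map its first
 * (control) page. Returns NULL if perf events or the mapping are unavailable. */
struct perf_event_mmap_page *open_self_counter(uint32_t type, uint64_t config,
                                               int *fd_out)
{
    struct perf_event_attr attr;
    void *page;
    int fd;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = type;
    attr.config = config;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    /* glibc has no wrapper; pid = 0, cpu = -1 means "this thread, any CPU" */
    fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    if (fd < 0)
        return NULL;
    page = mmap(NULL, (size_t)sysconf(_SC_PAGESIZE), PROT_READ, MAP_SHARED, fd, 0);
    if (page == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return page;
}

/* Read the current count without a system call. Returns 0 on success, or -1
 * if rdpmc cannot be used (the caller should fall back to read(2) on the fd). */
int mmap_read_count(volatile struct perf_event_mmap_page *pc, uint64_t *out)
{
    uint32_t seq, idx;
    int64_t count;

    do {
        seq = pc->lock;     /* changes whenever the kernel updates the page */
        barrier();
        idx = pc->index;    /* hardware counter number + 1, or 0 if none */
        count = pc->offset;
        if (!HAVE_RDPMC || !pc->cap_user_rdpmc || idx == 0)
            return -1;
        count += pmc_sign_extend(rdpmc(idx - 1), pc->pmc_width);
        barrier();
    } while (pc->lock != seq);
    *out = (uint64_t)count;
    return 0;
}
```

In use, one would call `open_self_counter(PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_MISSES, &fd)` once and then take `mmap_read_count()` before and after each call of interest; the difference is that call's count, obtained without a system call. When it returns -1 (for instance because the event is not currently scheduled on a hardware counter), reading the fd with read(2) remains the fallback.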

> Furthermore, the default security level disallows rdpmc at user level.
> The kernel would have to be changed (CR4).
The quick-n-dirty way is easy to arrange on my sandboxed experimental 
machine (see above), and perfctr seems to have had a workable, secure 
implementation of The Right Way to do this. Is there something about 
perf events that is incompatible with the perfctr approach? The only 
security hole I can see is if some CPU-wide counters are also active 
and the self-monitoring process has no rights to them. That case 
should be easy to detect, though, and could be handled either by running 
as root or by relaxing the paranoia setting in 
/proc/sys/kernel/perf_event_paranoid.
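The paranoia check can also be done programmatically before deciding whether root is needed. A minimal sketch, assuming the documented meaning of /proc/sys/kernel/perf_event_paranoid: -1 allows almost everything, 0 and above restrict raw tracepoint access, 1 and above forbid CPU-wide events to users without CAP_SYS_ADMIN, and 2 and above forbid kernel profiling. Some distributions add stricter levels, and newer kernels also accept CAP_PERFMON. The function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the contents of perf_event_paranoid (e.g. "2\n").
 * Returns 0 and stores the level on success, -1 on malformed input. */
int parse_paranoid_level(const char *text, int *level)
{
    char *end;
    long v;

    assert(text != NULL && level != NULL);
    errno = 0;
    v = strtol(text, &end, 10);
    if (end == text || errno != 0)
        return -1;               /* no digits, or out of range */
    while (*end == '\n' || *end == ' ')
        end++;
    if (*end != '\0')
        return -1;               /* trailing garbage */
    *level = (int)v;
    return 0;
}

/* Levels >= 1 forbid CPU-wide events for unprivileged users; -1 and 0 allow them. */
int unprivileged_cpu_events_allowed(int level)
{
    return level <= 0;
}

/* Read the current level. Returns -1 if the file is missing or unreadable
 * (e.g. a kernel built without perf events). */
int read_paranoid_level(int *level)
{
    char buf[32];
    FILE *f = fopen("/proc/sys/kernel/perf_event_paranoid", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_paranoid_level(buf, level);
}
```

A self-monitoring tool could call `read_paranoid_level()` at startup and, when `unprivileged_cpu_events_allowed()` is false, report that CPU-wide counters cannot be opened without root instead of failing later.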

> Given that you are interested in profiling, there is no point in trying to use
> mmapped counters. The kernel has a kernel-level buffer, so the cost of
> exporting samples is amortized over a large number of samples.
I'm *not* trying to do sample-based profiling; OProfile works just fine 
for that. I need to count how many cache misses a given function call 
triggers for different types of input. Which variable(s) to aggregate 
on will depend on the trends I observe. This would be easy to set up 
with mmap+rdpmc, but making a syscall around every call of interest 
introduces a huge confound. I'd try a microbenchmark, but those are 
*really* hard to get right when you're looking at cache misses (you 
can't just repeat the same inputs over and over).

I guess there's Cachegrind, but a 100x slowdown while processing 
multi-GB datasets is rather unappealing (and I'm not sure Cachegrind 
even has an API to "read" its "counters").

Regards,
Ryan


_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
