Hello, I presented a paper today at the IISWC (IEEE International Symposium on Workload Characterization) conference that might be of interest to those on this list.
Sorry in advance about the alarmist nature of the title; we really only looked at the retired instruction counters on x86.

  Can Hardware Performance Counters be Trusted?
  by Vincent M. Weaver and Sally A. McKee
  http://www.csl.cornell.edu/~vince/papers/iiswc08/

A quick summary:

We investigated the same-machine and cross-machine sources of variation in the retired instruction count for a wide variety of x86 machines. In theory the retired instruction count should be the same across all machines, but it isn't, sometimes by billions of instructions.

We looked at 9 different x86 implementations, from a Pentium Pro up through a Core 2 system, and ran the full SPEC 2000 and 2006 benchmark suites.

We found the following sources of variation:

+ On the Pentium 4, the fldcw instruction counts as two instructions with the instr_retired counter; on all other implementations it counts as one. With a new enough machine you can avoid this by using the instr_completed count instead.
+ The layout of virtual memory can cause non-deterministic counts, because some benchmarks do things such as using pointers as hash-table keys. To work around this:
  * Disable heap/stack randomization on recent 2.6 kernels.
  * Enforce the 3GB compatibility layout when running 32-bit apps on 64-bit machines (otherwise the stack is moved higher to give more room).
  * Make sure the environment variables, command-line arguments, and executable name are the same size on all machines being investigated (these affect the stack offset).
  The first two of the above can be enforced using the "linux32 -3 -R" helper command, at least on Debian systems.
+ System hardware interrupts cause extra retired instruction counts even if you are only measuring userspace code. The excess is often equal to the number of timer interrupts (approximately the program runtime times the HZ value). This does not affect the instr_completed count found on newer P4s, but it does affect all other processors we investigated.
+ Pagefaults can also increase the count (we did not fully investigate this).
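As a rough illustration of what the "linux32 -3 -R" helper does under the hood, here is a minimal Python sketch that sets the same process personality through the personality(2) system call and then execs the target program. The flag values are the ones defined in <linux/personality.h> on x86 Linux (-3 corresponds to ADDR_LIMIT_3GB, -R to ADDR_NO_RANDOMIZE, and linux32 itself to PER_LINUX32). The function names are hypothetical, and this is only a sketch, not a replacement for setarch/linux32.

```python
import ctypes
import ctypes.util
import os

# Flag values from <linux/personality.h> on x86 Linux.
PER_LINUX32 = 0x0008           # report a 32-bit machine (what "linux32" sets)
ADDR_NO_RANDOMIZE = 0x0040000  # disable address space randomization ("-R")
ADDR_LIMIT_3GB = 0x8000000     # keep 32-bit processes in a 3GB address space ("-3")


def linux32_3r_persona():
    """Personality value corresponding to "linux32 -3 -R" (hypothetical helper)."""
    return PER_LINUX32 | ADDR_LIMIT_3GB | ADDR_NO_RANDOMIZE


def exec_with_fixed_layout(argv):
    """Set the persona, then replace this process with argv (hypothetical helper).

    The personality is inherited across execve(), so the new program starts
    with randomization disabled and the 3GB layout limit in effect.
    """
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.personality.argtypes = [ctypes.c_ulong]
    libc.personality.restype = ctypes.c_int
    if libc.personality(linux32_3r_persona()) == -1:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    os.execvp(argv[0], argv)  # does not return on success
```

For example, exec_with_fixed_layout(["./benchmark", "ref.in"]) would run a (hypothetical) benchmark binary with the same layout settings on every machine. Randomization can alternatively be disabled system-wide by writing 0 to /proc/sys/kernel/randomize_va_space.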
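To check the third workaround (equal-sized environment, arguments, and executable name), one can tally the bytes of strings the kernel copies to the top of a new process's stack. The sketch below uses a hypothetical helper and is approximate: it counts each argv entry, each KEY=VALUE environment entry, and the executable path (each with a trailing NUL), plus the entry counts, and ignores the auxiliary vector, platform strings, and alignment padding.

```python
import os


def stack_layout_signature(argv, env, exe_path):
    """Approximate the inputs that shift the initial stack offset.

    Returns (string_bytes, argc, envc). string_bytes counts every argv
    string, every KEY=VALUE environment string, and the executable path,
    each including its terminating NUL byte. Auxiliary-vector data and
    alignment padding are deliberately ignored.
    """
    string_bytes = len(os.fsencode(exe_path)) + 1
    string_bytes += sum(len(os.fsencode(arg)) + 1 for arg in argv)
    string_bytes += sum(len(os.fsencode(f"{key}={value}")) + 1
                        for key, value in env.items())
    return string_bytes, len(argv), len(env)
```

Computing this on each machine with the benchmark's argv, os.environ, and the path passed to exec, then padding an environment variable until the tuples match, is one way to line up the stack offsets.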
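The interrupt effect described above can be turned into a back-of-the-envelope estimate. This sketch assumes, per the observation in the summary, roughly one extra retired instruction per timer tick; the helper names are hypothetical, and subtracting the estimate only removes the timer-tick component, not other interrupts or page faults.

```python
def estimated_timer_overcount(runtime_seconds, hz):
    """Heuristic: about one extra retired instruction per timer interrupt."""
    return round(runtime_seconds * hz)


def subtract_timer_overcount(measured, runtime_seconds, hz):
    """Remove only the timer-tick component from a measured count.

    Other hardware interrupts and page faults are not accounted for,
    so the result is still approximate.
    """
    return measured - estimated_timer_overcount(runtime_seconds, hz)
```

For example, a hypothetical 600-second run on a kernel built with HZ=250 would be expected to pick up on the order of 600 * 250 = 150,000 extra instructions.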