Hello

I presented a paper today at the IISWC (IEEE International Symposium
on Workload Characterization) Conference that might be of interest to 
those on this list.

Sorry in advance about the alarmist title; we really only looked at the
retired instruction counters on x86.

Can Hardware Performance Counters be Trusted?
    by Vincent M. Weaver and Sally A. McKee

    http://www.csl.cornell.edu/~vince/papers/iiswc08/


A quick summary:

   We investigated the same-machine and cross-machine sources of variation
in the retired instruction count for a wide variety of x86 machines.  In 
theory the retired instruction count should be the same across all 
machines, but it isn't; the counts sometimes differ by billions of
instructions.

   We looked at 9 different x86 implementations, from a Pentium Pro up 
through a Core 2 system.  We ran the full SPEC 2000 and 2006 benchmark 
suites.

   We found the following sources of variation:

    + On Pentium 4, the fldcw instruction counts as two instructions with
      the instr_retired counter; on all other implementations it counts
      as one.  With a new enough machine you can avoid this by using the
      instr_completed count instead.

    + The layout of virtual memory can cause non-deterministic counts.
      This is because some benchmarks do things like use pointers as
      hash-table keys, so the amount of work done can depend on where
      data lands in memory.

      To work around this:
       * Disable heap/stack randomization on recent 2.6 kernels
       * Enforce 3GB compatibility layout when running 32-bit apps on
         64-bit machines (otherwise the stack is moved higher to give
         more room)
       * Make sure the environment variables, command line args, and
         executable name are the same size on all machines being
         investigated (these affect stack offset)

         The first two of the above can be enforced using the
         "linux32 -3 -R" helper command, at least on Debian systems.
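
A minimal sketch of how one might check the third point above, assuming
that the byte count of the NUL-terminated argument and "KEY=VALUE"
environment strings is a good enough proxy for what shifts the stack.
The function names are hypothetical and not from the paper; pointer
arrays, the auxiliary vector and alignment padding are ignored.

```python
import os
import sys


def stack_string_bytes(argv, env):
    """Bytes used by the NUL-terminated argv and "KEY=VALUE" strings.

    Assumption: only the string payload is counted, not the pointer
    arrays, auxiliary vector, or alignment padding the kernel adds.
    """
    arg_bytes = sum(len(os.fsencode(a)) + 1 for a in argv)
    env_bytes = sum(
        len(os.fsencode(k)) + len(os.fsencode(v)) + 2  # '=' and NUL
        for k, v in env.items()
    )
    return arg_bytes + env_bytes


def wrap_fixed_layout(cmd):
    """Prefix a command with the "linux32 -3 -R" helper mentioned above."""
    return ["linux32", "-3", "-R"] + list(cmd)


if __name__ == "__main__":
    # Reports the figure for this Python process itself; compare the
    # number printed on each machine before taking measurements.
    print(stack_string_bytes(sys.argv, dict(os.environ)))
```

If the totals differ between two machines, padding a dummy environment
variable on one of them until they match is one way to line them up.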

    + System hardware interrupts cause extra retired instruction counts
      even if you are only measuring userspace code.  This is often
      visible as an excess roughly equal to the number of timer
      interrupts (approximately the program runtime times the HZ value).

      This does not affect the instr_completed count found on newer P4s
      but does affect all other processors we investigated.
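
As a rough illustration of the runtime-times-HZ estimate, here is a
small sketch.  It assumes each timer interrupt adds exactly one extra
count, which is a simplification; the function names and the numbers in
the example are made up for illustration and are not measurements.

```python
def expected_timer_interrupts(runtime_seconds, hz):
    """Approximate number of timer interrupts during a run."""
    return round(runtime_seconds * hz)


def unexplained_excess(measured, reference, runtime_seconds, hz):
    """Excess count left over after subtracting the timer estimate.

    Assumption: one extra retired instruction per timer interrupt.
    """
    excess = measured - reference
    return excess - expected_timer_interrupts(runtime_seconds, hz)


if __name__ == "__main__":
    # Hypothetical run: 100 seconds on a kernel built with HZ=250.
    print(expected_timer_interrupts(100.0, 250))
    print(unexplained_excess(1_000_025_010, 1_000_000_000, 100.0, 250))
```

A small leftover suggests the timer explains most of the difference; a
large one points at another source, such as other device interrupts.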

    + Page faults can also increase the count (we did not fully
      investigate this).

