On Thu, Mar 18, 2010 at 10:11 AM, Vince Weaver <vweav...@eecs.utk.edu> wrote:
>
> > I wrote a simple Fibonacci and counted the # of instructions (inst_retired)
> > using both pin and performance counter.
> > As you can see, it seems like perf_counter undercounts the # of instructions
> > and the result is non-deterministic (sometimes 94730 but sometimes 94729).
> > Any reason for this?
>
> Did you dynamically link your code? The dynamic linker under Linux does a
> lot of stuff which might not be deterministic.
>
> I compiled your code on an AMD Phenom machine.
>
> Dynamically linked, compiled with gcc 4.4 and -O2 it varied from
> 86558 to 86561.
>
> But if I statically linked, that is compile with "-static" I got
> a very consistent "7749" instructions each time.
>
>
I performed the same test, but the statically linked build didn't show any
more consistency than the dynamically linked one.
fib(10), static link
$ ./task -e "inst_retired" ./fib
7389 inst_retired
...
7390 inst_retired <-- 5th try
fib(30), static link
36157851 inst_retired
36157851 inst_retired
36157851 inst_retired
36157850 inst_retired
36157855 inst_retired
36157849 inst_retired
> If you happen to have valgrind installed, version 3.5 or later, you can
> also count instructions using something like:
> valgrind --tool=exp-bbv --instr-count-only=yes -- ./fib
> although on your fib code the results are too high by a factor of two...
> weird. I need to find out why that happens.
>
>
I see an overcount as well: valgrind reports about 1700 more instructions
than pin or the performance counter. The following is for fib(10),
statically linked.
==24764== exp-bbv, a SimPoint basic block vector generator
==24764== NOTE: This is an Experimental-Class Valgrind Tool
==24764== Copyright (C) 2006-2009 Vince Weaver
==24764== Using Valgrind-3.5.0-Debian and LibVEX; rerun with -h for copyright info
==24764== Command: ./fib
==24764== # Thread 1
==24764== # Total intervals: 0 (Interval Size 100000000)
==24764== # Total instructions: *9085*
==24764== # Total reps: 25
==24764== # Unique reps: 9
==24764== # Total fldcw instructions: 1
> One thing to watch out for when using pin... newer versions of Pin (my
> notes say pinkit more recent than 29972) change the way that rep-prefixed
> string instructions are counted. Before that Pin matches what hardware
> does (each rep-prefixed instruction counts as "1") whereas current Pin
> counts each rep as a separate instruction. It is quite possible the
> overcount you see with Pin is due to this; the valgrind tool I mention
> above will tell you how many rep-prefixed instructions were executed in
> the code.
>
>
I used the latest version: /pin-2.6-27887-gcc.4.0.0-ia32_intel64-linux/
> One note on your test code... you should probably use the result of the
> "fib" calculation, as a printf or such. Otherwise the C compiler could
> optimize out the whole routine, as the result is unused.
>
> > main()
> > {
> > fib(10)
> > }
> >
> >
> >
> >
>
I didn't want to use printf because it may introduce non-determinism: it
ends up calling the write system call, even though we disable kernel-mode
counting. Instead, I now use the following:
int main()
{
    return fib(10);
}
Best
Heechul
> Vince
> vweav...@eecs.utk.edu
_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel