I tried the below (the toggle version, translated to c) on x86, with gcc 4.9.2.
Looking at the printf call, I see: 0x080483e2 <+178>: push $0x7a314 0x080483e7 <+183>: push $0x80485d0 0x080483ec <+188>: call 0x80482f0 <printf@plt> So, you see that 0x7a314 (so 500500 in decimal) is pushed. In other words, gcc optimiser has detected what you are doing and has computed the result at compilation time :). The loop is then only executed to do the CALLGRIND start/stop. This is somewhat confirmed by gdb: (gdb) list 10,10 10 n += i; (gdb) info line 10 Line 10 of "toggle.c" is at address 0x8048391 <main+97> but contains no code. (gdb) I guess then that the difference between toggle and start/stop version depends on the very fine details of what/when exactly the start/stop and/or the toggle client requests are taking effect. E.g. I think that stop instrumentation implies to flush the translation cache. But the currently executed block will not be really flushed till the block is exited. I think that at least START/STOP is not designed/not usable to measure ultra small pieces of code, as the code is only re-instrumented when 'exiting from the current executing block'. (for more info, see the long comment in coregrind/m_transtab.c around line 224). Toggle collect on the other hand might work better as this does not have to flush the translation cache : the JITted code is instrumented, the callgrind 'helpers' will be called by the JITted code but will just do nothing if collect is off. Then why is the nr of instruction 'big' ? Well, as you have seen, the addition itself is 0 instruction :). So, what is measured is not very clear, maybe/probably measuring a part of the TOGGLE call. At least, when using --dump-instr=yes and then using kcachegrind, the executed instructions are associated with the TOGGLE macro. In summary: do not use STOP/START for very small pieces of code such as loop bodies, very unlikely to work precisely/correctly. TOGGLE collect for very small pieces of code in a loop are likely to measure (partially) the TOGGLE call (in particular if gcc optimiser has just removed the 'real' loop code). Philippe On Thu, 2015-08-27 at 06:42 +0000, galexand...@nc.rr.com wrote: > I would like to count instructions for a specific part of my code. > > I found a valgrind-users thread, "A great trick for counting cycles using > callgrind - but what about massif?" > (http://sourceforge.net/p/valgrind/mailman/message/33476105/), that gives a > way to use Callgrind to do this using the CALLGRIND_START_INSTRUMENTATION and > CALLGRIND_STOP_INSTRUMENTATION macros. This seemed to work but was taking > way too long on "real world tests" to be of practical use. > > So I did some more searching and found a Stack Overflow question, "Callgrind: > Profile a specific part of my code" > (http://stackoverflow.com/questions/13688185/callgrind-profile-a-specific-part-of-my-code), > that addresses the performance issue. It suggests using the > CALLGRIND_TOGGLE_COLLECT macro instead the CALLGRIND_START_INSTRUMENTATION / > CALLGRIND_STOP_INSTRUMENTATION macro. This worked in that performance was > much better. However, the two approaches give significantly different > instruction counts, which leads me to wonder if either is reporting a > "correct" instruction count. > > I created two simple test programs illustrating the difference. Here's a > simple program using the CALLGRIND_START_INSTRUMENTATION / > CALLGRIND_STOP_INSTRUMENTATION approach: > > #include <cstdlib> > #include <iostream> > #include <string> > #include <valgrind/callgrind.h> > > int main(int argc, char* argv[]) { > int n = 0; > for (int i = 1; i <= 1000; ++i) { > CALLGRIND_START_INSTRUMENTATION; > n += i; > CALLGRIND_STOP_INSTRUMENTATION; > } > std::cout << "n: " << n << std::endl; > CALLGRIND_DUMP_STATS; > exit(0); > } > > I compiled this using g++ 4.8.2 on openSUSE 13.2 x86-64 as follows: > > g++ -O3 -o Test1 Test1.cpp > > I then ran Callgrind as follows: > > valgrind --tool=callgrind --instr-atstart=no Test1 > > which gave > > ==7371== Callgrind, a call-graph generating cache profiler > ==7371== Copyright (C) 2002-2013, and GNU GPL'd, by Josef Weidendorfer et > al. > ==7371== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info > ==7371== Command: Test1 > ==7371== > ==7371== For interactive control, run 'callgrind_control -h'. > n: 500500 > ==7371== > ==7371== Events : Ir > ==7371== Collected : 0 > ==7371== > ==7371== I refs: 0 > > The question is why is the instruction count 0? (I do get a nonzero > instruction count when profiling my real code using the > CALLGRIND_START_INSTRUMENTATION / CALLGRIND_STOP_INSTRUMENTATION approach, > but the reported instruction count is significantly less than the instruction > count reported using the CALLGRIND_TOGGLE_COLLECT approach.) > > Here's the program using the CALLGRIND_TOGGLE_COLLECT approach: > > #include <cstdlib> > #include <iostream> > #include <string> > #include <valgrind/callgrind.h> > > int main(int argc, char* argv[]) { > int n = 0; > for (int i = 1; i <= 10000; ++i) { > CALLGRIND_TOGGLE_COLLECT; > n += i; > CALLGRIND_TOGGLE_COLLECT; > } > std::cout << "n: " << n << std::endl; > CALLGRIND_DUMP_STATS; > exit(0); > } > > I compiled the program as follows: > > g++ -O3 -o Test2 Test2.cpp > > and ran Callgrind as follows: > > valgrind --tool=callgrind --collect-atstart=no Test2 > > which gave > > ==7378== Callgrind, a call-graph generating cache profiler > ==7378== Copyright (C) 2002-2013, and GNU GPL'd, by Josef Weidendorfer et > al. > ==7378== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info > ==7378== Command: Test2 > ==7378== > ==7378== For interactive control, run 'callgrind_control -h'. > n: 50005000 > ==7378== > ==7378== Events : Ir > ==7378== Collected : 90008 > ==7378== > ==7378== I refs: 90,008 > > The reported instruction count is 90008, which amounts to an amortized > instruction count of approximately nine instructions for each "n += i;". > Nine instructions seems high for a single integer addition even on an x86-64 > CPU. Is the reported instruction count really a "correct" measure of the "n > += i;" statement? > > Is there a better way to profile instruction count for a specific part of > code? > > Thanks, > Geoff Alexander > > ------------------------------------------------------------------------------ > _______________________________________________ > Valgrind-users mailing list > Valgrind-users@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/valgrind-users ------------------------------------------------------------------------------ _______________________________________________ Valgrind-users mailing list Valgrind-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/valgrind-users