I would like to count instructions for a specific part of my code. I found a valgrind-users thread, "A great trick for counting cycles using callgrind - but what about massif?" (http://sourceforge.net/p/valgrind/mailman/message/33476105/), that gives a way to use Callgrind to do this using the CALLGRIND_START_INSTRUMENTATION and CALLGRIND_STOP_INSTRUMENTATION macros. This seemed to work but was taking way too long on "real world tests" to be of practical use.
So I did some more searching and found a Stack Overflow question, "Callgrind: Profile a specific part of my code" (http://stackoverflow.com/questions/13688185/callgrind-profile-a-specific-part-of-my-code), that addresses the performance issue. It suggests using the CALLGRIND_TOGGLE_COLLECT macro instead the CALLGRIND_START_INSTRUMENTATION / CALLGRIND_STOP_INSTRUMENTATION macro. This worked in that performance was much better. However, the two approaches give significantly different instruction counts, which leads me to wonder if either is reporting a "correct" instruction count. I created two simple test programs illustrating the difference. Here's a simple program using the CALLGRIND_START_INSTRUMENTATION / CALLGRIND_STOP_INSTRUMENTATION approach: #include <cstdlib> #include <iostream> #include <string> #include <valgrind/callgrind.h> int main(int argc, char* argv[]) { int n = 0; for (int i = 1; i <= 1000; ++i) { CALLGRIND_START_INSTRUMENTATION; n += i; CALLGRIND_STOP_INSTRUMENTATION; } std::cout << "n: " << n << std::endl; CALLGRIND_DUMP_STATS; exit(0); } I compiled this using g++ 4.8.2 on openSUSE 13.2 x86-64 as follows: g++ -O3 -o Test1 Test1.cpp I then ran Callgrind as follows: valgrind --tool=callgrind --instr-atstart=no Test1 which gave ==7371== Callgrind, a call-graph generating cache profiler ==7371== Copyright (C) 2002-2013, and GNU GPL'd, by Josef Weidendorfer et al. ==7371== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info ==7371== Command: Test1 ==7371== ==7371== For interactive control, run 'callgrind_control -h'. n: 500500 ==7371== ==7371== Events : Ir ==7371== Collected : 0 ==7371== ==7371== I refs: 0 The question is why is the instruction count 0? (I do get a nonzero instruction count when profiling my real code using the CALLGRIND_START_INSTRUMENTATION / CALLGRIND_STOP_INSTRUMENTATION approach, but the reported instruction count is significantly less than the instruction count reported using the CALLGRIND_TOGGLE_COLLECT approach.) Here's the program using the CALLGRIND_TOGGLE_COLLECT approach: #include <cstdlib> #include <iostream> #include <string> #include <valgrind/callgrind.h> int main(int argc, char* argv[]) { int n = 0; for (int i = 1; i <= 10000; ++i) { CALLGRIND_TOGGLE_COLLECT; n += i; CALLGRIND_TOGGLE_COLLECT; } std::cout << "n: " << n << std::endl; CALLGRIND_DUMP_STATS; exit(0); } I compiled the program as follows: g++ -O3 -o Test2 Test2.cpp and ran Callgrind as follows: valgrind --tool=callgrind --collect-atstart=no Test2 which gave ==7378== Callgrind, a call-graph generating cache profiler ==7378== Copyright (C) 2002-2013, and GNU GPL'd, by Josef Weidendorfer et al. ==7378== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info ==7378== Command: Test2 ==7378== ==7378== For interactive control, run 'callgrind_control -h'. n: 50005000 ==7378== ==7378== Events : Ir ==7378== Collected : 90008 ==7378== ==7378== I refs: 90,008 The reported instruction count is 90008, which amounts to an amortized instruction count of approximately nine instructions for each "n += i;". Nine instructions seems high for a single integer addition even on an x86-64 CPU. Is the reported instruction count really a "correct" measure of the "n += i;" statement? Is there a better way to profile instruction count for a specific part of code? Thanks, Geoff Alexander ------------------------------------------------------------------------------ _______________________________________________ Valgrind-users mailing list Valgrind-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/valgrind-users