On 07/08/2011 05:45 AM, Josef Weidendorfer wrote: > On Thursday 07 July 2011, John Reiser wrote: >>> valgrind --tool=callgrind --ct-verbose=1 ... >> >> When I run callgrind and callgrind_annotate, then I don't understand >> the output. For instance, the connection between "73" and the number >> of actual dynamic calls to exit() is mysterious to me: >> ===== >> $ valgrind --tool=callgrind /bin/date >> ==3790== Using Valgrind-3.6.0 and LibVEX; rerun with -h for copyright info >> $ callgrind_annotate callgrind.out* >> Ir file:function >> 73 /usr/src/debug/glibc-2.13/stdlib/exit.c:exit [/lib64/libc-2.13.so] > > The default for callgrind_annotate is to show self cost of the functions > for the given event types collected. > Thus, the "73" in the "Ir" column is the number of executed client instruction > inside of exit. It has nothing to do with the number of calls to exit.
What I see is an instance of EPIC FAIL for Usability. The legend paragraph at the beginning of the output does not define "Ir", the output displayed by the command-line parameter "--help" does not define "Ir", and the default output of a tool whose name begins with "call" has nothing to do with the number of calls to each subroutine. The legend paragraph contains these instances of "Ir": Events recorded: Ir Events shown: Ir Event sort order: Ir The output of "callgrind_annotate --help" does not contain "Ir". The MINIMUM I expect is for BOTH places to contain an additional annotation such as "Possible events: Ir (Instruction Read)". The only explanation of "Ir" is in the callgrind manual share/doc/valgrind/html/cl-manual.html and the cachegrind manual share/doc/valgrind/html/cg-manual.html Neither manual explains why "Ir" (Instruction Read) is used instead of something like "Ix" (Instruction eXecuted). Until some years ago most x86 chips read [and decode] many more instructions than they execute: in particular, some [effectively] non-executed instructions which reside after every taken branch. Some newer x86 chips read-decode-translate instruction bytes into microcode, often blurring the boundaries between architectural instructions, and execute only the translated microcode. The translations are cached; a loop of up to several dozen instructions can be executed many times after the first translation. So the relationship between Instruction Read and Instruction eXecute is murky. > > Actually, callgrind_annotate only prints the number of calls for call arcs, > which are displayed e.g. with "--tree=caller": > >> callgrind_annotate --tree=caller callgrind.out* > ... > 7,060 < ???:0x0000000000401d60 (1x) [/bin/date] > 73 * /build/buildd/eglibc-2.13/stdlib/exit.c:exit > [/lib/x86_64-linux-gnu/libc-2.13.so] > ... > > This excerpt means that a function at "0x0000000000401d60" called > "exit" one time. It probably would be good to add a total call count number > for every function. Yes. Total incoming calls by function is the zero-level statistic which is useful and expected by both novice and experienced users. > > >> For another instance, using the exact suggestion: >> $ valgrind --tool=callgrind --ct-verbose=1 /bin/date >> gives output such as: >> ----- >> . . . > check_match.10789(0x3d, 0x5b, ...) [ld-2.13.so / >> 0x3d25a08a20] >> . . . .> strcmp(0x3d, 0x5b, ...) [ld-2.13.so / 0x3d25a16ac0] >> . . . . > strcmp(0x3d, 0x5b, ...) [ld-2.13.so / 0x3d25a16ac0] >> ----- >> where the indicated parameters are nonsense, > > Sure, it would be a nice feature! > But "--ct-verbose=..." is actually only for internal debugging, and > the output is not documented. If something such as "strcmp(0x3d, 0x5b, ...)" is _not_ an indication of parameters, then this is another poor choice, especially as a recommendation to a new user. The values show the stack content > (interpreted as unsigned ints, as far as I remember) when entering the > function - which was enough at the time I needed the debug output. On x86_64, the first 6 [integer and pointer] parameters are passed in registers, and not on the stack. > >> and a function whose body >> is a loop displays as a recursion. > > Please tell me a way how to distinguish loops from tail recursion optimization > at machine code level, when the loop jumps back to the first instruction > of a function. After entry to a function and before the corresponding return, if there has been no write to the register or memory location which is designated to hold the return address, and if the target is within the same function (as determined by the available symbol table and perhaps by implementation details such as choice of opcode and instruction encoding), then it's a loop and not a recursion. Sure, at the _lowest_ level a branch back to the beginning [or even after the prologue] cannot be distinguished between a loop and a recursion. However, programmers who use explicit tail recursion to the same function _know_ that it is equivalent to a loop, and won't be surprised by such a display; whereas the coder of a loop might be confused by seeing a recursion. Also, callgrind should take advantage of having an event horizon that is larger than one instruction. Adopting the "subroutine outlook" also favors a loop over recursion. The "subroutine outlook" knows only what is [scheduled] to be done in the future; what has happened in the past is not relevant and cannot be known. Thus the current recursion level always equals the number of pending return addresses. The disadvantage is that traceback "through" a tail recursion elides some names, in much the same way that visible state at a debugger breakpoint shows only the _next_ instruction to be executed, and not the previous instruction pointer. > > Often quite some meta information is thrown away, and there > is no way to exactly reconstruct what has happened at the source level. > So callgrind must use heuristics which sometimes can go wrong. For the > high-optimized code of strcmp above, this heuristic obviously goes wrong. > But without source code, your claim of existance of a loop in strcmp > could actually be wrong. It could have been coded as recursive function, > resulting in the same machine code. > >> It is hard for me to trust such output. > > Please tell me the reason why you started with the assumption that the > discovered output must be buggy. I started with the assumption that a tool with a name such as "callgrind" would tell me how many times each subroutine was called. The output was not so. The output legend had no explanation. The --help output had no explanation. The output used notation "strcmp(0x3d, 0x5b, ...)" for something other than displaying [correctly] a function call with parameters. After all that experience, then I decided that I should be wary of callgrind. -- ------------------------------------------------------------------------------ All of the data generated in your IT infrastructure is seriously valuable. Why? It contains a definitive record of application performance, security threats, fraudulent activity, and more. Splunk takes this data and makes sense of it. IT sense. And common sense. http://p.sf.net/sfu/splunk-d2d-c2 _______________________________________________ Valgrind-users mailing list Valgrind-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/valgrind-users