Re: [Valgrind-users] binary debugging and generate source of which that running

John Reiser Fri, 08 Jul 2011 08:20:31 -0700

On 07/08/2011 05:45 AM, Josef Weidendorfer wrote:
> On Thursday 07 July 2011, John Reiser wrote:
>>> valgrind --tool=callgrind --ct-verbose=1 ...
>>
>> When I run callgrind and callgrind_annotate, then I don't understand
>> the output.  For instance, the connection between "73" and the number
>> of actual dynamic calls to exit() is mysterious to me:
>> =====
>> $ valgrind --tool=callgrind /bin/date
>> ==3790== Using Valgrind-3.6.0 and LibVEX; rerun with -h for copyright info
>> $ callgrind_annotate callgrind.out*
>>     Ir  file:function
>>     73  /usr/src/debug/glibc-2.13/stdlib/exit.c:exit [/lib64/libc-2.13.so]
> 
> The default for callgrind_annotate is to show self cost of the functions
> for the given event types collected.
> Thus, the "73" in the "Ir" column is the number of executed client instruction
> inside of exit. It has nothing to do with the number of calls to exit.


What I see is an instance of EPIC FAIL for Usability.  The legend paragraph
at the beginning of the output does not define "Ir", the output displayed
by the command-line parameter "--help" does not define "Ir", and the default
output of a tool whose name begins with "call" has nothing to do with
the number of calls to each subroutine.

The legend paragraph contains these instances of "Ir":
   Events recorded:  Ir
   Events shown:     Ir
   Event sort order: Ir
The output of "callgrind_annotate --help" does not contain "Ir".
The MINIMUM I expect is for BOTH places to contain an additional annotation
such as "Possible events:  Ir (Instruction Read)".

The only explanation of "Ir" is in the callgrind manual
   share/doc/valgrind/html/cl-manual.html
and the cachegrind manual
   share/doc/valgrind/html/cg-manual.html
Neither manual explains why "Ir" (Instruction Read) is used instead of
something like "Ix" (Instruction eXecuted).  Until some years ago most
x86 chips read [and decode] many more instructions than they execute:
in particular, some [effectively] non-executed instructions which reside
after every taken branch.  Some newer x86 chips read-decode-translate
instruction bytes into microcode, often blurring the boundaries between
architectural instructions, and execute only the translated microcode.
The translations are cached; a loop of up to several dozen instructions
can be executed many times after the first translation.  So the
relationship between Instruction Read and Instruction eXecute is murky.

> 
> Actually, callgrind_annotate only prints the number of calls for call arcs,
> which are displayed e.g. with "--tree=caller":
> 
>> callgrind_annotate --tree=caller callgrind.out*
> ...
>  7,060  < ???:0x0000000000401d60 (1x) [/bin/date]
>     73  *  /build/buildd/eglibc-2.13/stdlib/exit.c:exit 
> [/lib/x86_64-linux-gnu/libc-2.13.so]
> ...
> 
> This excerpt means that a function at "0x0000000000401d60" called
> "exit" one time. It probably would be good to add a total call count number
> for every function.

Yes.  Total incoming calls by function is the zero-level statistic which is
useful and expected by both novice and experienced users.

> 
> 
>> For another instance, using the exact suggestion:
>> $ valgrind --tool=callgrind --ct-verbose=1 /bin/date
>> gives output such as:
>> -----
>>    .   .   .   > check_match.10789(0x3d, 0x5b, ...) [ld-2.13.so / 
>> 0x3d25a08a20]
>>    .   .   .   .> strcmp(0x3d, 0x5b, ...) [ld-2.13.so / 0x3d25a16ac0]
>>    .   .   .   . > strcmp(0x3d, 0x5b, ...) [ld-2.13.so / 0x3d25a16ac0]
>> -----
>> where the indicated parameters are nonsense,
> 
> Sure, it would be a nice feature!
> But "--ct-verbose=..." is actually only for internal debugging, and
> the output is not documented. 

If something such as "strcmp(0x3d, 0x5b, ...)" is _not_ an indication
of parameters, then this is another poor choice, especially as a
recommendation to a new user.

The values show the stack content
> (interpreted as unsigned ints, as far as I remember) when entering the
> function - which was enough at the time I needed the debug output.

On x86_64, the first 6 [integer and pointer] parameters are passed in registers,
and not on the stack.

> 
>> and a function whose body
>> is a loop displays as a recursion.
> 
> Please tell me a way how to distinguish loops from tail recursion optimization
> at machine code level, when the loop jumps back to the first instruction
> of a function.

After entry to a function and before the corresponding return, if there has
been no write to the register or memory location which is designated to hold
the return address, and if the target is within the same function (as determined
by the available symbol table and perhaps by implementation details such as
choice of opcode and instruction encoding), then it's a loop and not
a recursion.

Sure, at the _lowest_ level a branch back to the beginning [or even after the
prologue] cannot be distinguished between a loop and a recursion.  However,
programmers who use explicit tail recursion to the same function _know_
that it is equivalent to a loop, and won't be surprised by such a display;
whereas the coder of a loop might be confused by seeing a recursion.  Also,
callgrind should take advantage of having an event horizon that is larger
than one instruction.

Adopting the "subroutine outlook" also favors a loop over recursion.
The "subroutine outlook" knows only what is [scheduled] to be done
in the future; what has happened in the past is not relevant and
cannot be known.  Thus the current recursion level always equals
the number of pending return addresses.  The disadvantage is that
traceback "through" a tail recursion elides some names, in much the same
way that visible state at a debugger breakpoint shows only the _next_
instruction to be executed, and not the previous instruction pointer.

> 
> Often quite some meta information is thrown away, and there
> is no way to exactly reconstruct what has happened at the source level.
> So callgrind must use heuristics which sometimes can go wrong. For the
> high-optimized code of strcmp above, this heuristic obviously goes wrong.
> But without source code, your claim of existance of a loop in strcmp
> could actually be wrong. It could have been coded as recursive function,
> resulting in the same machine code.
> 
>> It is hard for me to trust such output.
> 
> Please tell me the reason why you started with the assumption that the
> discovered output must be buggy.

I started with the assumption that a tool with a name such as "callgrind"
would tell me how many times each subroutine was called.  The output was
not so.  The output legend had no explanation.  The --help output had
no explanation.  The output used notation "strcmp(0x3d, 0x5b, ...)"
for something other than displaying [correctly] a function call with parameters.
After all that experience, then I decided that I should be wary of callgrind.

-- 

------------------------------------------------------------------------------
All of the data generated in your IT infrastructure is seriously valuable.
Why? It contains a definitive record of application performance, security 
threats, fraudulent activity, and more. Splunk takes this data and makes 
sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-d2d-c2
_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users

Re: [Valgrind-users] binary debugging and generate source of which that running

Reply via email to