Re: [Valgrind-users] Usability of callgrind_annotate

John Reiser Sun, 10 Jul 2011 22:38:19 -0700

On 07/08/2011 10:54 AM, Josef Weidendorfer wrote:
> On Friday 08 July 2011, John Reiser wrote:
>> <<snip>>


> The first thing I must note here is that "callgrind_annotate" is meant
> as fallback solution for people who can/do not want to use a GUI. 
> KCachegrind (the GUI for callgrind output)
> - shows "Instruction Fetch" as explanation for "Ir",
> - shows a call count column in the function list.

A newbie can discover KCachegrind only from a detailed reading of the
manual.  Please mention KCachegrind in the header output from callgrind,
and in the header output from callgrind_annotate, and on the manual pages.
Also mention that KCachegrind is part of the kdesdk package;
this is _very_ non-obvious!  For example:
    For interactive control, run 'callgrind_control -h'.
    For GUI report and visualization, see KCachegrind, which is
    part of the kdesdk package.

Installing kdesdk on my system (Fedora 14 with Gnome and no previous KDE)
requires 156MB of new file space, which seems like a lot: more than 1/2
of valgrind itself plus all tools.


<<snip>>
>> Sure, at the _lowest_ level a branch back to the beginning [or even after the
>> prologue] cannot be distinguished between a loop and a recursion.
> 
> Right. Unfortunately, callgrind only can see the lowest level.

Really?  It seems to me that any valgrind tool can look at the whole
program before execution starts, and also can see every dynamically-
loaded library during dlopen() before the execution of any instruction
from that library.  Thus callgrind could detect in advance all leaf
subroutines, and all [over-]writes of any return address.

>  Do you know any debug information that can help here?

The DWARF-3 info certainly pinpoints every write to the return-address
pseudo-register, and every adjustment to the stack pointer.  If both
are empty, then it is safe to consider every jump to be a loop
rather than a tail recursion to the same routine.  [More cases apply,
too.]


<<snip>>
>> callgrind should take advantage of having an event horizon that is larger
>> than one instruction.
> 
> Please explain.

A tool can detect first entry to a subroutine.  Analyze the static properties
of the whole subroutine, then cache the results; invalidate for dlclose()
and self-modifying code.  .text can be analyzed at several megabytes
per second, so the "extra work" is small.

> 
>> Adopting the "subroutine outlook" also favors a loop over recursion.
>> The "subroutine outlook" knows only what is [scheduled] to be done
>> in the future; what has happened in the past is not relevant and
>> cannot be known.  Thus the current recursion level always equals
>> the number of pending return addresses.  The disadvantage is that
>> traceback "through" a tail recursion elides some names, in much the same
>> way that visible state at a debugger breakpoint shows only the _next_
>> instruction to be executed, and not the previous instruction pointer.
> 
> Sorry, I don't get your point here.

The point is: if the number of pending return addresses does not change
then the recursion depth does not change, so don't change the indentation
when printing the nesting.  In particular, if consecutive lines of the
nesting display would designate the same subroutine with the same number
of return addresses pending on the stack, then it's a loop and not
a [full] recursion.

> Do you suggest for callgrind to better always play dumb, and do not even
> try to reconstruct a potential tail recursion into a real recursion?

Forget the details of any call which has returned.  Report only what
can be deduced from the current $pc and the stack.

Consider the source-code call chain  A ==> B ==> C  where  B ==> C  is
a tail recursion which the compiler recognizes and implements.  During
execution of C, then the actual dynamic _return_ chain is  C ==> A.
B has been erased by the tail recursion.  The stack contains no record
of B ever having been invoked.  So the effective dynamic call chain
[backtrace] at this point is  A ==> C.  This may surprise the programmer
who never wrote any call from A to C, but this is much the same as
stopping at a breakpoint immediately after a taken branch.  You know
where you are [$pc], and where you're going (the backtrace of pending
_return_s), but not every detail of how you got here (taken branches
or tail recursions are not known.)

> 
> There is a similar scenario: machine code can have jumps between
> functions (e.g. as result of a tail recursion optimization, or hand crafted
> assembler). Callgrind output is about call relationship. There is no way
> to reason about jumps between functions. Thus, callgrind has to map a
> jump between functions either to a "call" (with an additional return when
> returning from the function jumped to) or a "return/call" pair.

It is better to think of this as a _continuation_, for which
some programmers might use the idea of "return+call" as a crutch
until they become familiar with continuations.  In many cases
it is important to realize that a principle side effect during
execution is the dynamic writing of the program which constitutes
the continuation from the current subroutine.  Of course, this
"program of continuations" is the stack.

-- 

------------------------------------------------------------------------------
All of the data generated in your IT infrastructure is seriously valuable.
Why? It contains a definitive record of application performance, security 
threats, fraudulent activity, and more. Splunk takes this data and makes 
sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-d2d-c2
_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users

Re: [Valgrind-users] Usability of callgrind_annotate

Reply via email to