Re: [Valgrind-users] RFC: proposal to remove user annotation from `cg_annotate`

2023-04-03 Thread Nicholas Nethercote
There were no objections, and I have now removed user annotations from
`cg_annotate`.

Nick

On Wed, 29 Mar 2023 at 09:03, Nicholas Nethercote wrote:

> Hi,
>
> I recently rewrote `cg_annotate`, `cg_diff`, and `cg_merge` in Python. The
> old versions were written in Perl, Perl, and C, respectively. The new
> versions are much nicer and easier to modify, and I have various ideas for
> improving `cg_annotate`. This email is about one of those ideas.
>
> A typical way to invoke `cg_annotate` is like this:
>
> > cg_annotate cachegrind.out.12345
>
> This implies `--auto=yes`, which requests line-by-line "auto-annotation"
> of source files. I.e. `cg_annotate` will automatically annotate all files
> in the profile that meet the significance threshold.
>
> It's also possible to do something like this:
>
> > cg_annotate --auto=no cachegrind.out.12345 a.c b.c
>
> Which instead requests "user annotation" of the files `a.c` and `b.c`.
>
> My thesis is that auto-annotation suffices in practice for all reasonable
> use cases, and that user annotation is unnecessary and can be removed.
>
> When I first wrote `cg_annotate` in 2002, only user annotation was
> implemented. Shortly after, I added the `--auto={yes,no}` option. Since
> then I've never used user annotation, and I suspect nobody else has either.
> User annotation is ok when dealing with tiny programs, but as soon as you
> are profiling a program with more than a handful of source files it becomes
> impractical.
>
> The only possible use cases I can think of for user annotation are as
> follows.
>
>    - If you want to see particular files annotated but you don't want
>      to see any others, then you can use user annotation in combination with
>      `--auto=no`. But it's trivial to search through the output for the
>      particular file, so this doesn't seem important.
>    - If the path to a file is somehow really messed up in the debug info,
>      it might be possible that auto-annotation would fail to find it, but user
>      annotation could find it, possibly in combination with `-I`. But this seems
>      unlikely. Some basic testing shows that gcc, clang and rustc all default to
>      using full paths in debug info. gcc supports `-fdebug-prefix-map` but that
>      seems to mostly be used for changing full paths to relative paths, which
>      will still work fine.
>
> Removing user annotation would (a) simplify the code and docs, and (b)
> enable the possibility of moving the merge functionality from `cg_merge`
> into `cg_annotate`, by allowing the user to specify multiple cachegrind.out
> files as input.
>
> So: is anybody using user annotation? Does anybody see any problems with
> this proposal?
>
> Thanks.
>
> Nick
>
___
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users


Re: [Valgrind-users] RFC: changing Cachegrind default to `--cache-sim=no`

2023-04-03 Thread Nicholas Nethercote
On Mon, 3 Apr 2023 at 21:36, David Faure wrote:

>
> But then, what's the difference between `cachegrind --cache-sim=no`
> and `callgrind`?
>
> https://accu.org/journals/overload/20/111/floyd_1886/ says
> "The main differences are that Callgrind has more information about the
> callstack whilst cachegrind gives more information about cache hit rates."
>
> Wouldn't one want callstacks? (if this means stack traces).
> I know I must be missing something, thanks for enlightening me.
>

Callgrind is a forked and extended version of Cachegrind. It also simulates
a cache, with a slightly different simulation to Cachegrind's. The fact
that both tools exist is due to historical reasons; if starting from
scratch today you wouldn't deliberately split them.

Call stacks are often useful (I regularly use Callgrind as well as
Cachegrind) but they aren't always necessary. Without them, Cachegrind runs
faster than Callgrind and produces smaller data files. Cachegrind also
supports diffing and merging different files, while Callgrind does not.
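
As a rough illustration of what diffing and merging mean here, the sketch
below treats a profile as a mapping from a location to an instruction count.
It is not the code of `cg_diff` or `cg_merge`; the function names, the
"file:function" keys, and the sample numbers are hypothetical, and the
direction of the subtraction (new minus old) is an assumption.

```python
def merge_counts(a, b):
    """Add two profiles' counts key by key; a missing key counts as zero."""
    return {k: a.get(k, 0) + b.get(k, 0) for k in sorted(a.keys() | b.keys())}


def diff_counts(old, new):
    """Subtract the old profile from the new one (assumed direction)."""
    return {k: new.get(k, 0) - old.get(k, 0) for k in sorted(old.keys() | new.keys())}


# Hypothetical per-function Ir counts from two runs, for illustration only.
run1 = {"a.c:main": 500, "a.c:helper": 200}
run2 = {"a.c:main": 450, "b.c:extra": 30}

merged = merge_counts(run1, run2)
delta = diff_counts(run1, run2)

for key in merged:
    print(f"{key}: merged={merged[key]} delta={delta[key]}")
```

A negative delta means the location executed fewer instructions in the
second run than in the first.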

Nick


Re: [Valgrind-users] RFC: changing Cachegrind default to `--cache-sim=no`

2023-04-03 Thread David Faure
[removing valgrind-developers, since I guess I can't post there]

On Monday 3 April 2023 11:29:25 CEST Nicholas Nethercote wrote:
> I have been using `--cache-sim=no` almost exclusively for a long time. The
> cache simulation done by Valgrind is an approximation of the memory
> hierarchy of a 2002 AMD Athlon processor. Its accuracy for a modern memory
> hierarchy with three levels of cache, prefetching, non-LRU replacement, and
> who-knows-what-else is likely to be low. If you want to accurately know
> about cache behaviour you'd be much better off using hardware counters via
> `perf` or some other profiler.
> 
> But `--cache-sim=no` is still very useful because instruction execution
> counts are still very useful.
> 
> Therefore, I propose changing the default to `--cache-sim=no`. Does anyone
> have any objections to this?

I agree that simulating a cache from 2002 isn't very useful.

But then, what's the difference between `cachegrind --cache-sim=no`
and `callgrind`?

https://accu.org/journals/overload/20/111/floyd_1886/ says
"The main differences are that Callgrind has more information about the 
callstack whilst cachegrind gives more information about cache hit rates."

Wouldn't one want callstacks? (if this means stack traces).
I know I must be missing something, thanks for enlightening me.

-- 
David Faure, fa...@kde.org, http://www.davidfaure.fr
Working on KDE Frameworks 5


[Valgrind-users] RFC: changing Cachegrind default to `--cache-sim=no`

2023-04-03 Thread Nicholas Nethercote
Hi,

Cachegrind has an option `--cache-sim`.

If you run with `--cache-sim=yes` (the default) it tells Cachegrind to
do a full cache simulation with lots of events: Ir, I1mr, ILmr, Dr, D1mr,
DLmr, Dw, D1mw, DLmw.

If you run with `--cache-sim=no` then the cache simulation is disabled and
you just get one event: Ir. (This is "instruction cache reads", which is
equivalent to "instructions executed".)
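
To make the difference concrete, here is a minimal Python sketch that pulls
the event names and totals out of a Cachegrind output file. It assumes the
file contains an `events:` line and a `summary:` line with one total per
event; the function name `parse_summary` and the sample contents (including
all the numbers) are made up for illustration and are not taken from a real
run.

```python
def parse_summary(text):
    """Return {event_name: total} from the text of a Cachegrind output file."""
    events, totals = [], []
    for line in text.splitlines():
        if line.startswith("events:"):
            # Event names follow the "events:" tag, separated by spaces.
            events = line.split()[1:]
        elif line.startswith("summary:"):
            # One total per event, in the same order as the events line.
            totals = [int(n) for n in line.split()[1:]]
    return dict(zip(events, totals))


# Hypothetical file contents, for illustration only.
with_sim = (
    "events: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw\n"
    "summary: 1000 10 5 300 8 2 150 4 1\n"
)
without_sim = "events: Ir\nsummary: 1000\n"

print(parse_summary(with_sim))
print(parse_summary(without_sim))
```

With the cache simulation disabled, the only key left in the result is Ir.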

I have been using `--cache-sim=no` almost exclusively for a long time. The
cache simulation done by Valgrind is an approximation of the memory
hierarchy of a 2002 AMD Athlon processor. Its accuracy for a modern memory
hierarchy with three levels of cache, prefetching, non-LRU replacement, and
who-knows-what-else is likely to be low. If you want to accurately know
about cache behaviour you'd be much better off using hardware counters via
`perf` or some other profiler.

But `--cache-sim=no` is still very useful because instruction execution
counts are still very useful.

Therefore, I propose changing the default to `--cache-sim=no`. Does anyone
have any objections to this?

Thanks.

Nick