Re: [Valgrind-users] How to make use of callgrind --cacheuse=yes ?

Josef Weidendorfer Sun, 01 Feb 2015 07:47:00 -0800

Hi Steve,

> I'm looking for a way to answer the question 'how much of a cache line did I 
> use before discarding it?'.
> 
> ...
> 
>  g++ matrix.cpp -O0 -g


For performance analysis, you should always work with optimized code,
i.e. "-O2 -g".
(Actually, that doesn't change the data layout, so AcCost/SpLoss will be
the same.)
But it may show different source lines because of inlining.

I just tried it out: with "-O0", all of AcCostX and SpLossX is shown in
"Data::operator int() const", which makes sense, as this accesses the
matrix.

With "-O2", everything is in main, and the source code panel shows that
everything is attributed to line 59 with "sum += vec[j]". This is also
what you would expect, right?

>  valgrind --tool=callgrind --cacheuse=yes --instr-atstart=no ./a.out
> 
> The output shows a large value for AcCost1 because of the padding in the 
> Data struct,

Actually, I think it makes more sense to compare the "spatial loss metrics"
with the amount of data loaded into a given cache. For example, for L2,
compare 64*DLmr (this is the amount of data loaded from main memory, as
there are DLmr read misses and one miss accounts for 64 bytes to load) with
SpLoss2 (should be SpLossL), which shows "how many bytes were never used
before being evicted".

In kcachegrind / qcachegrind, you can add a new derived event type "DLmr64"
defined as "64 DLmr", by using the context menu in the event type panel
("add new event type").

In your case, "64 * DLmr" is 10 242 624, and SpLoss2 is 9 508 008. That is,
of the data loaded, about 93% is never accessed. Which obviously is bad...
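
To make the arithmetic explicit, here is a small sketch of that comparison.
The function names are made up for illustration, and DLmr = 160 041 is not
stated in this mail; it is back-computed from 10 242 624 / 64.

```cpp
#include <cassert>
#include <cstdint>

// Bytes per cache line, as assumed in the text above.
constexpr std::uint64_t kLineSize = 64;

// Bytes brought into the cache by `read_misses` misses (one line per miss).
constexpr std::uint64_t bytes_loaded(std::uint64_t read_misses) {
    return read_misses * kLineSize;
}

// Fraction of the loaded bytes that were evicted without ever being accessed.
inline double unused_fraction(std::uint64_t sp_loss, std::uint64_t read_misses) {
    assert(read_misses > 0);  // avoid dividing by zero
    return static_cast<double>(sp_loss) /
           static_cast<double>(bytes_loaded(read_misses));
}
```

Calling unused_fraction(9508008, 160041) gives a value a little below 0.93,
which is the ratio quoted above.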

> but kcachegrind doesn't point me to the loop as the cause of
> loading that struct and using so little of it.

That is actually a missing feature, and has to do with the complexity of
implementing "inclusive" aggregation up the call stack for AcCost/SpLoss.
As you see, the flat profile has only "exclusive" costs. And then of course,
the call graph visualization does not exist, as it uses inclusive cost as
the basis for the graph edges.

With "-O2", this aggregation is not needed, and the profile points to the loop :-)
With "-O0", it would work if implemented...
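
For readers unfamiliar with the terms, this is a rough sketch of what
"exclusive" versus "inclusive" cost means. It uses a simple call tree
instead of a real call graph; the struct and function names are
hypothetical and have nothing to do with callgrind's internals.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One node of a simplified call tree (a real profile is a graph, not a tree).
struct Function {
    std::string name;
    std::uint64_t exclusive;        // cost charged to this function's own code
    std::vector<Function> callees;  // functions called from here
};

// Inclusive cost = own exclusive cost plus the inclusive cost of all callees.
inline std::uint64_t inclusive(const Function& f) {
    std::uint64_t total = f.exclusive;
    for (const Function& callee : f.callees) {
        const std::uint64_t sub = inclusive(callee);
        assert(total + sub >= total);  // guard against counter wrap-around
        total += sub;
    }
    return total;
}
```

With "-O0", main would have an exclusive SpLoss of 0 and
"Data::operator int() const" would carry all of it. An inclusive view
would show the same total on main, which is what would point to the loop.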

The complexity comes from the fact that the simulator can detect
AcCost/SpLoss only at eviction time of the cacheline, but this cost gets
related to the access which loaded this line into the cache (which makes
sense in my opinion).
Currently, when a cacheline is loaded, I only remember the corresponding
counters which need to be updated afterwards, but for aggregation I need to
also remember the call stack, and update counters up the call stack at
cacheline eviction time. No problem per se, but not implemented.
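
As an illustration of "detect at eviction, charge the loader", here is a
toy direct-mapped cache. This is only a sketch under my own assumptions
and not callgrind's actual code: ToyCache, the "site" index and the byte
mask are all invented for the example.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kLineSize = 64;  // bytes per cache line

struct Line {
    bool valid = false;
    std::uint64_t tag = 0;
    std::size_t loader = 0;          // counter to charge once the line is evicted
    std::bitset<kLineSize> touched;  // bytes accessed while the line was resident
};

class ToyCache {
public:
    ToyCache(std::size_t num_lines, std::size_t num_sites)
        : lines_(num_lines), sp_loss_(num_sites, 0) {
        assert(num_lines > 0 && num_sites > 0);
    }

    // Simulate an access of `size` bytes at `addr` issued by code location `site`.
    // Accesses crossing a line boundary are truncated in this toy model.
    void access(std::uint64_t addr, std::size_t size, std::size_t site) {
        assert(site < sp_loss_.size());
        const std::uint64_t line_addr = addr / kLineSize;
        Line& line = lines_[static_cast<std::size_t>(line_addr % lines_.size())];
        if (!line.valid || line.tag != line_addr) {
            evict(line);         // the loss of the old line becomes known only now
            line.valid = true;
            line.tag = line_addr;
            line.loader = site;  // remember whom to charge later
            line.touched.reset();
        }
        const std::size_t offset = static_cast<std::size_t>(addr % kLineSize);
        for (std::size_t i = 0; i < size && offset + i < kLineSize; ++i) {
            line.touched.set(offset + i);
        }
    }

    // Evict everything, e.g. at program end, so all losses are accounted for.
    void flush() {
        for (Line& line : lines_) evict(line);
    }

    std::uint64_t sp_loss(std::size_t site) const { return sp_loss_.at(site); }

private:
    // Charge the never-touched bytes to the site that loaded the line.
    void evict(Line& line) {
        if (!line.valid) return;
        sp_loss_[line.loader] += kLineSize - line.touched.count();
        line.valid = false;
    }

    std::vector<Line> lines_;
    std::vector<std::uint64_t> sp_loss_;  // exclusive spatial loss per site
};
```

For inclusive costs, "loader" would have to become the whole call stack at
load time, and evict() would have to update a counter for every frame on it.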

Actually, what I wanted to add at some point is miss/hit counters per data
structure, and this would also give better insight for AcCost/SpLoss.

> Is there something I'm missing here? Is the workflow of making use of the 
> option --cacheuse=yes different than I assume?

No. I hope the above explains your observations...

Josef

PS: a padding of "64-sizeof(int)" is enough to see the same effect.
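
A minimal sketch of such a padded struct follows. The original matrix.cpp
is not quoted in this mail, so the layout below is an assumption; only the
names Data, vec and "sum += vec[j]" come from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kLineSize = 64;  // assumed cache line size in bytes

// One int of payload, padded so that every element fills a whole cache line.
struct Data {
    int value;
    char pad[kLineSize - sizeof(int)];
    operator int() const { return value; }
};
static_assert(sizeof(Data) == kLineSize, "Data should span one cache line");

// Reads only `value` from each element, so most of every loaded line is unused.
// Note: the vector's storage is not necessarily 64-byte aligned.
inline long sum_all(const std::vector<Data>& vec) {
    long sum = 0;
    for (std::size_t j = 0; j < vec.size(); ++j) {
        sum += vec[j];  // goes through Data::operator int() const
    }
    return sum;
}

// Share of each element that the loop actually reads.
constexpr double used_fraction() {
    return static_cast<double>(sizeof(int)) / static_cast<double>(kLineSize);
}
```

With a 4-byte int, the loop reads 4 of every 64 bytes, so roughly 94% of
each line goes unused, the same ballpark as the SpLoss ratio above.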

> 
> Thanks,
> 
> Steve.
> 
> 
> 

------------------------------------------------------------------------------
Dive into the World of Parallel Programming. The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users
