Hi Steve,

> I'm looking for a way to answer the question 'how much of a cache line did I
> use before discarding it?'.
>
> ...
>
> g++ matrix.cpp -O0 -g

For performance analysis, you should always work with optimized code, i.e. "-O2 -g". (Actually, that doesn't change the data layout, so AcCost/SpLoss will be the same.) But it may show different source lines because of inlining.

I just tried it out: with "-O0", all of AcCostX and SpLossX is shown in "Data::operator int() const", which makes sense, as this accesses the matrix. With "-O2", everything is in main, and the source code panel shows that everything is attributed to line 59 with "sum += vec[j]". That is what you would expect, right?

> valgrind --tool=callgrind --cacheuse=yes --instr-atstart=no ./a.out
>
> The output shows a large value for AcCost1 because of the padding in the
> Data struct,

Actually, I think it makes more sense to compare the "spatial loss" metrics with the amount of data loaded into a given cache. For example, for L2, compare 64*DLmr with SpLoss2 (should be SpLossL). 64*DLmr is the amount of data loaded from main memory, as there are DLmr read misses and each miss loads 64 bytes. SpLoss2 shows how many bytes were never used before being evicted.

In kcachegrind / qcachegrind, you can add a new derived event type "DLmr64", defined as "64 DLmr", via the context menu in the event type panel ("add new event type").

In your case, "64 * DLmr" is 10 242 624, and SpLoss2 is 9 508 008. That is, 92% of the data loaded is never accessed. Which obviously is bad...

> but kcachegrind doesn't point me to the loop as the cause of
> loading that struct and using so little of it.

That is actually a missing feature, and has to do with the complexity of implementing "inclusive" aggregation up the call stack for AcCost/SpLoss. As you see, the flat profile has only "exclusive" costs. And of course the call graph visualization is then not available, as it uses inclusive cost as the basis for the graph edges. With "-O2", this aggregation is not needed, and it points to the loop :-) With "-O0", it would work if implemented...

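As a quick check of the comparison between loaded bytes and SpLoss2 above, here is a minimal C++ sketch of the arithmetic. The function names are made up for illustration and are not part of callgrind or kcachegrind. The 64-byte line size and the two counter values are the ones quoted in this mail. The DLmr value of 160041 is simply 10 242 624 divided by 64.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bytes loaded into the cache for a given number of
// read misses, assuming every miss loads one full cache line.
std::uint64_t loaded_bytes(std::uint64_t read_misses,
                           std::uint64_t line_size = 64) {
    assert(line_size != 0);
    return read_misses * line_size;
}

// Hypothetical helper: fraction of the loaded bytes that were evicted
// without ever being accessed (the spatial loss divided by the bytes loaded).
double unused_fraction(std::uint64_t read_misses,
                       std::uint64_t sp_loss_bytes,
                       std::uint64_t line_size = 64) {
    const std::uint64_t loaded = loaded_bytes(read_misses, line_size);
    if (loaded == 0) {
        return 0.0;  // nothing was loaded, so nothing was wasted
    }
    return static_cast<double>(sp_loss_bytes) / static_cast<double>(loaded);
}
```

Calling unused_fraction(160041, 9508008) gives a value between 0.92 and 0.93, which matches the roughly 92% mentioned above.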
The complexity comes from the fact that the simulator can detect AcCost/SpLoss only at eviction time of the cache line, but this cost gets attributed to the access which loaded the line into the cache (which makes sense in my opinion). Currently, when a cache line is loaded, I only remember the corresponding counters which need to be updated afterwards. For aggregation, I would also need to remember the call stack, and update counters up the call stack at cache line eviction time. No problem per se, but not implemented.

Actually, what I wanted to add at some point is miss/hit counters per data structure, which would also give better insight into AcCost/SpLoss.

> Is there something I'm missing here? Is the workflow of making use of the
> option --cacheuse=yes different than I assume?

No. I hope the above explains your observations...

Josef

PS: a padding of "64-sizeof(int)" is enough to see the same effect.

> Thanks,
>
> Steve.

------------------------------------------------------------------------------
Dive into the World of Parallel Programming. The Go Parallel Website, sponsored by Intel and developed in partnership with Slashdot Media, is your hub for all things parallel software development, from weekly thought leadership blogs to news, videos, case studies, tutorials and more. Take a look and join the conversation now.
http://goparallel.sourceforge.net/
_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users