On 5/27/20 Paul FLOYD wrote:
Well, no real surprises. This is with a testcase that runs standalone in about
5 seconds and under DHAT in about 200 seconds (so a reasonable slowdown of 40x).
# Overhead  Command          Shared Object      Symbol
# ........  ...............  .................  ..............................
#
    29.11%  dhat-amd64-linu  dhat-amd64-linux   [.] interval_tree_Cmp
    21.13%  dhat-amd64-linu  perf-26905.map     [.] 0x00000010057a25f8
    13.32%  dhat-amd64-linu  dhat-amd64-linux   [.] vgPlain_lookupFM
     9.56%  dhat-amd64-linu  dhat-amd64-linux   [.] dh_handle_read
     8.83%  dhat-amd64-linu  dhat-amd64-linux   [.] vgPlain_nextIterFM
     4.66%  dhat-amd64-linu  dhat-amd64-linux   [.] check_for_peak
     1.85%  dhat-amd64-linu  dhat-amd64-linux   [.] vgPlain_disp_cp_xindir
     1.32%  dhat-amd64-linu  [kernel.kallsyms]  [k] 0xffffffff8103ec0a
     1.00%  dhat-amd64-linu  dhat-amd64-linux   [.] dh_handle_write
To me this suggests two things:

1) Investigate the coding of the 4 or 5 highest-use subroutines
   (interval_tree_Cmp, vgPlain_lookupFM, dh_handle_read, vgPlain_nextIterFM).

2) See whether DHAT might recognize and use higher-level abstractions than
   MemoryRead and MemoryWrite of individual addresses. Just as memcheck
   intercepts and analyzes strlen (etc.) as a complete concept instead of as
   its individual reads and writes, perhaps DHAT could intercept (and/or
   recognize) vector linear search, vector addition, vector partial sum, other
   BLAS routines, etc., and then analyze each algorithm as a whole.
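For item 1, it helps to picture what a comparator like interval_tree_Cmp has to do on every memory access: decide which heap block, if any, contains the address. The following is a minimal sketch, not DHAT's code. It assumes blocks are non-overlapping half-open address ranges kept in a sorted array (standing in for a balanced-tree map such as the one behind vgPlain_lookupFM), and all names (Block, interval_cmp, find_block, last_hit) are hypothetical. It illustrates one cheap idea worth checking for in the real code: a one-entry cache of the last block found, so that a loop walking through one array pays for the full search only once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block descriptor: a heap block covering [start, start + size). */
typedef struct {
    uintptr_t start;
    size_t    size;
} Block;

static unsigned long n_cmps = 0;      /* comparisons performed, for illustration */
static const Block *last_hit = NULL;  /* hypothetical one-entry lookup cache */

/* Order two intervals, treating any overlap as "equal".  A 1-byte probe
   therefore compares equal to the block that contains its address. */
static int interval_cmp(const Block *a, const Block *b)
{
    assert(a->size > 0 && b->size > 0);
    n_cmps++;
    if (a->start + a->size <= b->start) return -1;  /* a lies entirely below b */
    if (b->start + b->size <= a->start) return  1;  /* a lies entirely above b */
    return 0;
}

/* Find the block containing addr in a sorted, non-overlapping array,
   trying the previously found block before doing a binary search. */
static const Block *find_block(const Block *blocks, size_t n, uintptr_t addr)
{
    Block probe = { addr, 1 };
    if (last_hit && interval_cmp(&probe, last_hit) == 0)
        return last_hit;                 /* fast path: same block as last time */

    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = interval_cmp(&probe, &blocks[mid]);
        if (c == 0) {
            last_hit = &blocks[mid];     /* remember the hit for the next call */
            return last_hit;
        }
        if (c < 0) hi = mid; else lo = mid + 1;
    }
    return NULL;                         /* address is not in any block */
}
```

With 1024 adjacent 64-byte blocks, walking every byte of block 500 costs 8 comparisons for the first byte and one per byte after that (71 in total), instead of a full binary search of up to about 10 comparisons per byte. Whether such a cache pays off in DHAT depends on how local the real access pattern is.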
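Item 2 amounts to trading many small events for one large one. The sketch below makes that concrete under stated assumptions: the block layout, the per-byte read counters, and the on_read / on_read_range handlers are all hypothetical and are not DHAT's API. If a routine such as a vector linear search were intercepted as a whole, the tool could resolve the address-to-block mapping once per call instead of once per element. That mapping step is exactly where the lookup and comparison functions at the top of the profile spend their time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heap block with a per-byte read counter (layout assumed for
   illustration; assumes blocks of at most 64 bytes). */
typedef struct {
    uintptr_t start;
    size_t    size;
    unsigned  reads[64];
} HBlock;

static unsigned long n_lookups = 0;   /* how often the expensive lookup runs */

/* Stand-in for the costly address-to-block lookup (a tree search in a real
   tool); here it simply scans a small array. */
static HBlock *lookup(HBlock *blocks, size_t n, uintptr_t addr)
{
    n_lookups++;
    for (size_t i = 0; i < n; i++)
        if (addr >= blocks[i].start && addr < blocks[i].start + blocks[i].size)
            return &blocks[i];
    return NULL;
}

/* Per-access instrumentation: one lookup for every byte read. */
static void on_read(HBlock *blocks, size_t n, uintptr_t addr)
{
    HBlock *b = lookup(blocks, n, addr);
    if (b)
        b->reads[addr - b->start]++;
}

/* Range-level handler, as an intercepted routine might report its whole
   access pattern: one lookup, then plain counter updates.  Assumes the
   range lies entirely inside a single block. */
static void on_read_range(HBlock *blocks, size_t n, uintptr_t addr, size_t len)
{
    HBlock *b = lookup(blocks, n, addr);
    if (!b)
        return;
    assert(addr + len <= b->start + b->size);
    size_t first = addr - b->start;
    for (size_t off = first; off < first + len; off++)
        b->reads[off]++;
}
```

Reading 32 bytes one at a time costs 32 lookups; reporting the same 32 bytes as one range costs a single lookup and adds the same per-byte counts. A real interception would also have to handle ranges that span several blocks, which this sketch rules out with an assertion.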
_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users