On 5/27/20 Paul FLOYD wrote:

Well, no real surprises. This is with a testcase that runs standalone in about 
5 seconds and under DHAT in about 200 seconds (so a reasonable slowdown of 40x).

# Overhead  Command          Shared Object       Symbol
# ........  ...............  ..................  ......
#
    29.11%  dhat-amd64-linu  dhat-amd64-linux    [.] interval_tree_Cmp
    21.13%  dhat-amd64-linu  perf-26905.map      [.] 0x00000010057a25f8
    13.32%  dhat-amd64-linu  dhat-amd64-linux    [.] vgPlain_lookupFM
     9.56%  dhat-amd64-linu  dhat-amd64-linux    [.] dh_handle_read
     8.83%  dhat-amd64-linu  dhat-amd64-linux    [.] vgPlain_nextIterFM
     4.66%  dhat-amd64-linu  dhat-amd64-linux    [.] check_for_peak
     1.85%  dhat-amd64-linu  dhat-amd64-linux    [.] vgPlain_disp_cp_xindir
     1.32%  dhat-amd64-linu  [kernel.kallsyms]   [k] 0xffffffff8103ec0a
     1.00%  dhat-amd64-linu  dhat-amd64-linux    [.] dh_handle_write
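
Going only by the symbol names, interval_tree_Cmp plus vgPlain_lookupFM
(about 42% of the samples together) look like an ordered-map lookup that runs
a comparator on every instrumented access. The sketch below is not DHAT's
code; it is a minimal stand-alone illustration of that pattern, using a sorted
array and the standard bsearch() in place of Valgrind's map. Block,
addr_vs_block_cmp and find_block are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record for one live heap block: [start, start + size). */
typedef struct {
    uintptr_t start;
    size_t    size;
} Block;

/* Comparator for bsearch(): the key is a single address, the element is a
   block. Returns 0 when the address falls inside the block. */
static int addr_vs_block_cmp(const void *key, const void *elem)
{
    uintptr_t    addr = *(const uintptr_t *)key;
    const Block *b    = (const Block *)elem;

    if (addr < b->start)            return -1;
    if (addr - b->start >= b->size) return  1;
    return 0;
}

/* Find the block containing addr, or NULL if there is none. blocks[] must be
   sorted by start and must not overlap. Every call costs O(log n) comparator
   invocations, which adds up when it runs once per memory access. */
static const Block *find_block(const Block *blocks, size_t n, uintptr_t addr)
{
    assert(blocks != NULL || n == 0);
    return bsearch(&addr, blocks, n, sizeof blocks[0], addr_vs_block_cmp);
}
```

If the real code has this shape, the cost per access is fixed by the number of
comparator calls per lookup, so that is probably the first thing to measure.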

To me this suggests two things:

1) investigate the coding of the 4 or 5 highest-use subroutines
(interval_tree_Cmp, vgPlain_lookupFM, dh_handle_read, vgPlain_nextIterFM)

2) see whether DHAT might recognize and use higher-level abstractions
than MemoryRead and MemoryWrite of individual addresses.  Memcheck intercepts
and analyzes strlen (etc.) as a complete concept instead of as its individual
Reads and Writes. Similarly, perhaps DHAT could intercept (and/or recognize)
vector linear search, vector addition, vector partial sum, other BLAS
routines, etc., and then analyze the algorithm as a whole.
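
To make point 2 concrete, here is a rough sketch of the difference between
accounting for a read one address at a time and accounting for it as a single
range event. It is not based on DHAT's internals: Block, block_contains,
record_read_bytewise and record_read_range are hypothetical names, there is
only one block, and the "lookups" counter exists only to show how many
containment checks each approach performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block bookkeeping. */
typedef struct {
    uintptr_t start;
    size_t    size;
    uint64_t  bytes_read;
    uint64_t  lookups;     /* instrumentation for this sketch only */
} Block;

/* Does [addr, addr + len) lie entirely inside the block? Stands in for the
   (much more expensive) lookup in a map of all live blocks. */
static int block_contains(Block *b, uintptr_t addr, size_t len)
{
    b->lookups++;
    return addr >= b->start
        && len <= b->size
        && addr - b->start <= b->size - len;
}

/* One check per byte: what a tool sees if it only knows about individual
   reads. */
static void record_read_bytewise(Block *b, uintptr_t addr, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (block_contains(b, addr + i, 1))
            b->bytes_read++;
}

/* One check for the whole range: possible only if the tool recognizes the
   operation (strlen, a vector sum, ...) as a unit. A range that straddles a
   block boundary is simply not counted here. */
static void record_read_range(Block *b, uintptr_t addr, size_t len)
{
    assert(len > 0);
    if (block_contains(b, addr, len))
        b->bytes_read += len;
}
```

For a 64-byte read inside one block both functions end up with the same
bytes_read, but the bytewise version performs 64 checks and the range version
performs one.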


_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users
