On 09/11/2014 03:30 AM, Krzysztof Czarnowski wrote:
> I've already posted this question to StackOverflow as
> http://stackoverflow.com/questions/25780834/tool-to-identify-apps-data-code-most-susceptible-to-memory-performance
> (copied below at the end of the message for convenience).
> 
> But since developing a specialized Valgrind tool looks to me like
> the best way to go to get an "ultimate" solution, I decided to ask
> here. Obviously the amount of required effort is a concern...
> 
> Any advice and any pointers to existing, even partial solutions, good
> examples, starting points, whatever, very welcome.
> 
> Regards,
> Krzysztof
> 
> 
> ----------------------------
> http://stackoverflow.com/questions/25780834/tool-to-identify-apps-data-code-most-susceptible-to-memory-performance
> 
> Context:
> -- embedded platform running Linux with some static RAM which is
> declared about 3 times faster than the rest of RAM (dynamic). The
> amount of this fast memory is 512kB and the official name is eSRAM.
> (Details not important for this post: Galileo board, information on
> eSRAM and relevant kernel API:
> https://communities.intel.com/servlet/JiveServlet/previewBody/22488-102-1-26046/Quark_SWDevManLx_330235_001.pdf)
> -- eSRAM can be used by an application with some support from the
> kernel---a simple driver that allocates kernel memory on its behalf,
> overlays the memory with eSRAM (this is done in physical space) and
> mmaps it to app's virtual memory space. This was tested and confirmed
> to work as expected.
> 
> Problem:
> Identify which sections of app's data (and possibly code) to map into
> eSRAM to achieve optimum performance gain. A suitable analysis tool is
> required.
> 
> After some search I'm not sure if any existing tool is actually suited
> to this task. Currently my best bet is to develop a specialized
> Valgrind tool. But maybe there is already something in the ecosystem
> to start with. Any advice/information is welcome even if, for
> instance, a tool is kind of partially suited etc.
I suggest trying the 'perf record' tool to collect a profile of data cache
events, and use the "--data" option to also collect the sample data addresses.
Then use 'perf report --raw' to generate a report that shows both the
instruction address where each sample was taken and the data address
being accessed. Collecting both data cache refs and misses could be useful:
the number of samples for those events in a given range of memory could be
used to calculate a miss score for that segment of memory. Parsing the raw
output of perf report may be quite painful, though. Alternatively, you could
write a simple profiling tool of your own that uses the perf_event_open kernel
API to record exactly the information you need to a file, and then write a
post-processing program to analyze that recorded data.
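
To illustrate the post-processing step, here is a minimal Python sketch of a
per-region miss score. It makes several assumptions that are not taken from
perf itself: the samples have already been reduced to one "event address"
pair per line (a hypothetical intermediate format, not perf's native output),
the event names are 'cache-references' and 'cache-misses', regions are
4 KiB pages, and the "score" is simply the miss sample count plus a
miss/ref ratio. The function names are made up for this example.

```python
from collections import Counter

REGION_SIZE = 4096  # assumption: score memory at 4 KiB page granularity


def parse_samples(lines):
    """Yield (event, address) from lines like 'cache-misses 0x7f001234'.

    The two-column format is hypothetical; lines that do not match it
    are skipped.
    """
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue
        event, addr = fields
        yield event, int(addr, 16)


def rank_regions(samples, region_size=REGION_SIZE, top=10):
    """Return up to `top` rows of (region_base, refs, misses, ratio).

    Rows are sorted by miss sample count, highest first; ties are
    broken by region base address. ratio is None when a region has
    miss samples but no reference samples.
    """
    refs = Counter()
    misses = Counter()
    for event, addr in samples:
        base = addr - (addr % region_size)
        if event == "cache-references":
            refs[base] += 1
        elif event == "cache-misses":
            misses[base] += 1

    rows = []
    for base in set(refs) | set(misses):
        r, m = refs[base], misses[base]
        rows.append((base, r, m, (m / r) if r else None))
    rows.sort(key=lambda row: (-row[2], row[0]))
    return rows[:top]


if __name__ == "__main__":
    import sys

    # Usage: python rank_regions.py < samples.txt
    for base, r, m, ratio in rank_regions(parse_samples(sys.stdin)):
        shown = "n/a" if ratio is None else f"{ratio:.2f}"
        print(f"{base:#014x}  refs={r:6d}  misses={m:6d}  ratio={shown}")
```

The regions at the top of the list would be the first candidates to try
mapping into eSRAM. Since these are sample counts rather than exact counts,
the ranking is only a rough guide and should be confirmed by measurement.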

-Maynard
> 
> P.S.
> Full analysis should probably take a lot of factors into account, like:
> -- memory access patterns (cache performance)
> -- changes over time (one could consider eSRAM paging)
> ...
> 
> ------------------------------------------------------------------------------
> Want excitement?
> Manually upgrade your production database.
> When you want reliability, choose Perforce
> Perforce version control. Predictably reliable.
> http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
> _______________________________________________
> Valgrind-users mailing list
> Valgrind-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/valgrind-users
> 

