Hi all,

I've run a few OpenMP-based multi-threaded applications with
Valgrind/Callgrind on an ARMv8-based server and found that increasing the
number of threads increased the per-thread Ir (instructions executed),
memory operations, and integer operations, while floating-point operations
scaled as expected (according to the statistics Callgrind reports). Natively,
the server behaves the opposite way: as the number of threads increases, the
number of memory operations and instructions retired per thread decreases,
which I've confirmed with the host machine's hardware counters.
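A minimal sketch of the kind of kernel involved (hypothetical: the function
name and sizes are illustrative, not taken from the applications I ran). The
total work is fixed, and each pass is a worksharing loop that ends in an
implicit OpenMP barrier, so per-thread work should shrink as threads are
added while the number of barriers stays the same. It builds with or without
-fopenmp; without it the pragmas are ignored and the loop runs serially.

```c
#include <stddef.h>

/* Hypothetical kernel: sums the array `iters` times. Each pass is an
   "omp for" with static scheduling, so each thread handles about
   n / nthreads elements per pass, and every pass ends in an implicit
   barrier. The result is iters * (a[0] + ... + a[n-1]). */
double barrier_heavy_sum(const double *a, size_t n, int iters)
{
    double total = 0.0;
    #pragma omp parallel
    {
        for (int it = 0; it < iters; it++) {
            #pragma omp for schedule(static) reduction(+:total)
            for (long i = 0; i < (long)n; i++)
                total += a[i];
            /* implicit barrier at the end of the omp for */
        }
    }
    return total;
}
```

Built with `gcc -O2 -fopenmp` and run under
`valgrind --tool=callgrind --separate-threads=yes` with `OMP_NUM_THREADS` set
to 1, 2, 4, ..., this gives one profile per thread, so the per-thread Ir of
the loop can be compared against the hardware counters.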

In the Helgrind documentation, I read that Linux futexes may cause some
quirky runtime behavior under Valgrind, so I rebuilt GCC with Linux futexes
disabled. The per-thread Ir, etc., did indeed decrease, but a large number of
mutex locks were taken for every barrier.
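For illustration only, here is a generic centralized barrier built solely
from a pthread mutex and condition variable. It is not libgomp's code; the
type and function names, and the lock counter, are hypothetical. It shows why
a lock-based barrier generates lock traffic on every barrier: each arriving
thread must take the mutex.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical generation-counting barrier (a generic pattern, not
   libgomp's implementation). Lock traffic grows with threads x barriers. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int nthreads;                    /* number of participating threads */
    int waiting;                     /* arrivals in the current generation */
    unsigned long generation;        /* bumped each time the barrier opens */
    unsigned long lock_acquisitions; /* explicit pthread_mutex_lock calls */
} simple_barrier;

void simple_barrier_init(simple_barrier *b, int nthreads)
{
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->cond, NULL);
    b->nthreads = nthreads;
    b->waiting = 0;
    b->generation = 0;
    b->lock_acquisitions = 0;
}

void simple_barrier_wait(simple_barrier *b)
{
    pthread_mutex_lock(&b->lock);
    b->lock_acquisitions++;          /* updated while holding the lock */
    unsigned long gen = b->generation;
    if (++b->waiting == b->nthreads) {
        /* Last arrival: reset the count, open the barrier, wake everyone. */
        b->waiting = 0;
        b->generation++;
        pthread_cond_broadcast(&b->cond);
    } else {
        /* Wait until the generation changes; the loop also handles
           spurious wakeups. pthread_cond_wait re-acquires the mutex
           internally, which the counter above does not include. */
        while (gen == b->generation)
            pthread_cond_wait(&b->cond, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

void simple_barrier_destroy(simple_barrier *b)
{
    pthread_cond_destroy(&b->cond);
    pthread_mutex_destroy(&b->lock);
}
```

With N threads passing through one barrier, the counter rises by N, so a
program with many barriers per run accumulates lock operations in proportion
to both the thread count and the barrier count.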

Does anyone know whether this is normal behavior? Is there a way to use the
native GNU OpenMP runtime support library (libgomp) when running under
Valgrind-based tools?

For reference, I tested this with GCC 4.9.2 and Valgrind 3.10.1.

Thanks,

Karthik
_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users
