Hi Brock,
On Monday 26 October 2009 03:23:42 pm Brock Palen wrote:
> Is there a large overhead for --enable-debug --enable-memchecker?
> 
> reading:
> http://www.open-mpi.org/faq/?category=debugging
> 
> It sounds like there is and there isn't, what should I expect if we
> build all of our mpi libraries with those options, when we run normally:
> 
> mpirun ./myexe
> 
> vs using a library that was not built with those options?
This may be too verbose an answer ,-)

--enable-debug adds quite a bit of overhead, because various internal runtime 
checks are introduced into the code path: for every opal-object, a magic ID is 
checked (is this really a proper object?), the reference counter is checked, 
and the source file and line number of the constructor are kept.
How "bad" --enable-debug is really depends on your communication pattern and 
the setup; shared-memory communication latency suffers most.
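
To illustrate the kind of per-object checks meant here, below is a minimal 
sketch in C. It is not OMPI's actual code: the names (demo_object_t, DEMO_NEW, 
DEMO_MAGIC, DEMO_DEBUG) and the magic value are made up for illustration. It 
only shows why a debug build pays a little extra on every object operation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical switch; remove this line to compile the checks out. */
#define DEMO_DEBUG 1

/* Hypothetical magic value, not the constant Open MPI uses. */
#define DEMO_MAGIC UINT64_C(0xfeedfacedeadbeef)

typedef struct demo_object {
#ifdef DEMO_DEBUG
    uint64_t    magic;      /* marks this memory as a constructed object */
    const char *ctor_file;  /* source file of the constructor call */
    int         ctor_line;  /* line number of the constructor call */
#endif
    int refcount;
} demo_object_t;

/* Allocate an object; a debug build also records where it was created. */
static demo_object_t *demo_new(const char *file, int line)
{
    demo_object_t *obj = malloc(sizeof(*obj));
    if (obj == NULL) {
        return NULL;
    }
    obj->refcount = 1;
#ifdef DEMO_DEBUG
    obj->magic = DEMO_MAGIC;
    obj->ctor_file = file;
    obj->ctor_line = line;
#else
    (void)file;
    (void)line;
#endif
    return obj;
}

#define DEMO_NEW() demo_new(__FILE__, __LINE__)

/* Take a reference; a debug build validates the object first. */
static void demo_retain(demo_object_t *obj)
{
#ifdef DEMO_DEBUG
    assert(obj->magic == DEMO_MAGIC);  /* really a proper object? */
    assert(obj->refcount > 0);         /* not already released? */
#endif
    obj->refcount++;
}

/* Drop a reference and return how many remain; frees the object at zero. */
static int demo_release(demo_object_t *obj)
{
#ifdef DEMO_DEBUG
    assert(obj->magic == DEMO_MAGIC);
    assert(obj->refcount > 0);
#endif
    int remaining = --obj->refcount;
    if (remaining == 0) {
#ifdef DEMO_DEBUG
        obj->magic = 0;  /* poison, so a stale pointer fails the magic check */
#endif
        free(obj);
    }
    return remaining;
}
```

Each retain/release costs a couple of extra loads and compares in the debug 
build, which is negligible for large messages but shows up on short-message 
latency, where only a few hundred instructions are executed per message.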


To make use of memchecker and get the best out of valgrind, you don't actually 
need --enable-debug, depending on your setup:
 - For debugging user apps (checking whether buffers given to MPI are 
initialized, whether data returned by MPI may be accessed, etc.), memchecker 
alone is enough. The user app should of course be compiled with debugging on 
("-g").

 - To get valgrind output for OMPI-internal data structures, including the 
source location of undefined memory, you'd want to compile OMPI with 
--enable-debug (or at least with -g and without optimization) and furthermore 
define OMPI_WANT_MEMCHECKER_MPI_OBJECTS in ompi/include/ompi/memchecker to 
check the initialization of OMPI's MPI_Comm, datatypes and others. This, 
however, is mostly for OMPI developers.


Regarding overhead:
- Running an application with a libmpi compiled with memchecker, when _not_ 
running under valgrind, increases latency by 3-6% (measured over IB-DDR using 
IMB), while bandwidth is hardly influenced.
- When doing the OMPI-internal MPI-object checking, it _does_ become very 
costly due to the many client-requests issued using valgrind's API (but as 
noted this is for OMPI-developers, anyway).
Please see http://www.open-mpi.org/papers/parco-2007/ for more information.

With the NPB benchmark, we did not find any performance implications with the 
instrumentation added when not run under valgrind.


Now, when running the application under valgrind, the expected slow-down of 
valgrind's memcheck comes into effect...


So the most flexible way is to provide two versions (one built with these 
options, one without) and let users decide per modulefile, with a verbose 
proc ModulesHelp...


With best regards,
Rainer
-- 
------------------------------------------------------------------------
Rainer Keller, PhD                  Tel: +1 (865) 241-6293
Oak Ridge National Lab          Fax: +1 (865) 241-4811
PO Box 2008 MS 6164           Email: kel...@ornl.gov
Oak Ridge, TN 37831-2008    AIM/Skype: rusraink
