> I'm also getting TONS of "uninitialized value" errors with HP-MPI that I
> never got before (and some of which I have carefully tracked down, and they
> are bogus, the values are clearly initialized), but that is another
> issue....

Debugging MPI apps with Valgrind is a bit tricky, but it's certainly doable.

What you describe could have any of three causes.  You'll have to
investigate which one applies.

1. MPI implementations like to map the network card(s) into user space,
   and push data through them, bypassing the kernel.  In this situation
   Memcheck hasn't got a clue what's going on, especially for data 
   arriving at a node, and you get flooded with false errors.

2. For similar reasons, if two processes on the same node communicate
   via a shared memory region, the results will be bad.

3. It may be that OpenMPI is providing its own implementations of
   malloc, free, new, delete, etc., that Memcheck doesn't know about,
   which will also cause chaos.

Re (1) and (2), it helps if you can get OpenMPI to make all processes
(even those on the same node) communicate via standard TCP/IP networking,
so that Memcheck can see data going into and out of each process correctly.
Some time ago I was told by an OpenMPI developer that this can be done
by passing --mca btl tcp,self to mpirun.
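
As a rough sketch of what such an invocation might look like: the
process count (4) and the application name (./my_mpi_app) are
placeholders, not taken from the original report, and the exact mpirun
options may differ between OpenMPI versions.  The snippet only builds
and prints the command line so it can be inspected before running it.

```shell
# Placeholders (assumptions): 4 processes and an application called ./my_mpi_app.
NP=4
APP=./my_mpi_app

# --mca btl tcp,self restricts OpenMPI to the TCP transport plus the
# "self" loopback transport, so no shared-memory or NIC-bypass path is used.
# Each MPI process is started under Valgrind's Memcheck tool.
CMD="mpirun -np $NP --mca btl tcp,self valgrind --tool=memcheck $APP"

# Print the command; paste it into a shell (or use eval) to actually run it.
echo "$CMD"
```

Expect this to run noticeably slower than the default transports, since
all traffic goes through the kernel's TCP stack.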

Re (3), that's more difficult to ascertain.
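
One rough way to get a first indication, offered as a sketch and not as
a definitive test, is to list the dynamic symbols the MPI library
defines and look for allocator entry points.  The library path below is
a hypothetical placeholder, and the helper function name is made up for
this example; a library can also hook allocation in ways this will not
reveal.

```shell
# Hypothetical path (assumption): point this at the MPI library your app links against.
MPI_LIB=${MPI_LIB:-/usr/lib/libmpi.so}

# Keep only lines of nm output whose symbol name is a standard allocator
# entry point, optionally followed by a symbol-version suffix such as @@VERS.
allocator_symbols() {
    grep -E ' (malloc|calloc|realloc|free|memalign)(@.*)?$'
}

# List the dynamic symbols the library itself defines and filter them.
if [ -r "$MPI_LIB" ]; then
    nm -D --defined-only "$MPI_LIB" | allocator_symbols \
        || echo "no allocator symbols defined in $MPI_LIB"
fi
```

Any hits suggest the library supplies its own allocator, which is worth
raising with its developers.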

I'd also suggest asking the OpenMPI developers.  I've found them in the
past to be knowledgeable and helpful, and I believe they are long-time
users of Valgrind/Memcheck.

I believe you should be able to get to essentially zero false errors
with a suitable OpenMPI configuration.  I managed that in my testing with
OpenMPI a couple of years back, although I should say that was very
limited testing.

Since you are upgrading from Valgrind 2.2, once you achieve a clean run,
you might want to consider using Memcheck's MPI-checking wrapper library
for extra validation at the PMPI_* function interface level, if you haven't
already discovered that.  See 
http://www.valgrind.org/docs/manual/mc-manual.html#mc-manual.mpiwrap
for details.  I wouldn't recommend this before you get a clean run,
though; the results will be confusing.
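
For orientation, the manual section linked above describes preloading
the wrapper library into the MPI job.  A sketch of that, assuming
Valgrind is installed under /usr/local and built for amd64-linux (both
assumptions; the library's name and location have varied between
Valgrind versions, so check your installation), with ./my_mpi_app as a
placeholder application:

```shell
# Assumptions: install prefix, platform tag, process count and application
# name are all placeholders; adjust them to your system.
PREFIX=/usr/local
PLATFORM=amd64-linux
NP=4
APP=./my_mpi_app

# The wrapper library built alongside Valgrind when an MPI compiler was
# found at configure time.
WRAP_LIB="$PREFIX/lib/valgrind/libmpiwrap-$PLATFORM.so"

# LD_PRELOAD makes the wrappers intercept the PMPI_* entry points in each
# process, which itself runs under Valgrind.
CMD="LD_PRELOAD=$WRAP_LIB mpirun -np $NP --mca btl tcp,self $PREFIX/bin/valgrind $APP"

# Print the command for inspection; run it via eval or by pasting it.
echo "$CMD"
```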

J
