> I'm also getting TONS of "uninitialized value" errors with HP-MPI that I
> never got before (and some of which I have carefully tracked down, and they
> are bogus, the values are clearly initialized), but that is another
> issue....
Debugging MPI apps with Valgrind is a bit tricky, but it's certainly doable. What you describe could have three possible causes, and you'll have to investigate which of them applies.

1. MPI implementations like to map the network card(s) into user space and push data through them, bypassing the kernel. In this situation Memcheck hasn't got a clue what's going on, especially for data arriving at a node, and you get flooded with false errors.

2. For similar reasons, if two processes on the same node communicate via a shared memory region, the results will be bad.

3. OpenMPI may be providing its own implementations of malloc, free, new, delete, etc., which Memcheck doesn't know about. That will also cause chaos.

Re (1) and (2), it helps if you can get OpenMPI to make all processes (even those on the same node) communicate via standard TCP/IP networking, so that Memcheck can correctly see data going into and out of each process. Some time ago an OpenMPI developer told me that this can be done by passing "--mca btl tcp,self" to mpirun.

Re (3), that's more difficult to ascertain. I'd also suggest asking the OpenMPI developers. I've found them in the past to be knowledgeable and helpful, and I believe they are long-time users of Valgrind/Memcheck.

I believe you should be able to get to essentially zero false errors with a suitable OpenMPI configuration. I managed that in my testing with OpenMPI a couple of years back, although I should say that the testing was very limited.

Since you are upgrading from Valgrind 2.2, once you achieve a clean run you might want to consider using Memcheck's MPI-checking wrapper library for extra validation at the PMPI_* function interface level, if you haven't already discovered it. See http://www.valgrind.org/docs/manual/mc-manual.html#mc-manual.mpiwrap for details. I wouldn't recommend this before you get a clean run, though; the results will be confusing.
J

_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users