Hi Julian,

Thank you for your response. Here is some additional information. I apologize for the long message; I'm trying to provide everything relevant to the problem.

I think I'm following the instructions as documented in the MPI section on the valgrind web page. However, the fact that I don't see the valgrind banner when I run an MPI application suggests that I'm missing something. The only messages I get are those from the MPI wrappers from valgrind.

Detailed responses to your questions are in line below. Thank you very much for any information you can provide!

On 01/21/2013 03:49 PM, Raghu Reddy wrote:
>> I was wondering if anyone has successfully used valgrind with MPI
>> applications on SGI systems with MPT?
>
> I don't know about on SGI w/ MPT (whatever MPT is). But for sure in general on MPI, it works.

The SGI implementation of MPI is called MPT (message passing toolkit). So it is simply another implementation of MPI and conforms to the MPI2 standard.

>> Using a non-MPI program (the simple example from the valgrind website
>> tutorial) works exactly as documented. However, an MPI hello world
>> example with the same error does not point out the error, even though
>> there are messages from the MPI wrappers.
>
> Does your MPI hello world test work as expected (with -DBUG) if you remove the MPI specifics and just run it as an
> ordinary executable?

Without Valgrind:
=============

If I compile my MPI example code with -DBUG (without linking with valgrind) and launch it with the MPI launcher (on the SGI systems it is not possible to run an MPI program without using the MPI launcher), the program runs to completion even though it has a bug (the complete code was included in the original message; I wasn't sure if it was appropriate to include it again for completeness):

r31i2n2% mpicc -DBUG -g -O0 -o hello_mpi_c hello_mpi_c.c
r31i2n2% mpiexec_mpt -np 4 ./hello_mpi_c
Hello from rank 0 out of 4; procname = r31i2n2
Print something 0
Hello from rank 1 out of 4; procname = r31i2n2
Print something 0
Hello from rank 2 out of 4; procname = r31i2n2
Print something 0
Hello from rank 3 out of 4; procname = r31i2n2
Print something 0
r31i2n2%

If I make it a serial program by stripping out all MPI, it also runs to completion (even though there is a bug):

r31i2n2% m mem-bug.c
#include <stdlib.h>

void f(void)
{
    int* x = malloc(10 * sizeof(int));
    x[10] = 0;        // problem 1: heap block overrun
}                     // problem 2: memory leak -- x not freed

int main(void)
{
    f();
    return 0;
}
r31i2n2%
r31i2n2% icc -o mem-bug -debug mem-bug.c
r31i2n2%
r31i2n2% ./mem-bug
r31i2n2%

With valgrind:
==========

When the serial program (no MPI) is launched with valgrind, it does point to the error, so valgrind is working as expected:

r31i2n2% /contrib/valgrind/valgrind-3.8.1/bin/valgrind ./mem-bug
==9806== Memcheck, a memory error detector
==9806== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==9806== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==9806== Command: ./mem-bug
==9806==
==9806== Invalid write of size 4
==9806==    at 0x40051E: f (mem-bug.c:6)
==9806==    by 0x40052E: main (mem-bug.c:11)
==9806==  Address 0x5a70068 is 0 bytes after a block of size 40 alloc'd
==9806==    at 0x4C278FE: malloc (vg_replace_malloc.c:270)
==9806==    by 0x400508: f (mem-bug.c:5)
==9806==    by 0x40052E: main (mem-bug.c:11)
==9806==
==9806==
==9806== HEAP SUMMARY:
==9806==     in use at exit: 40 bytes in 1 blocks
==9806==   total heap usage: 1 allocs, 0 frees, 40 bytes allocated
==9806==
==9806== LEAK SUMMARY:
==9806==    definitely lost: 40 bytes in 1 blocks
==9806==    indirectly lost: 0 bytes in 0 blocks
==9806==      possibly lost: 0 bytes in 0 blocks
==9806==    still reachable: 0 bytes in 0 blocks
==9806==         suppressed: 0 bytes in 0 blocks
==9806== Rerun with --leak-check=full to see details of leaked memory
==9806==
==9806== For counts of detected and suppressed errors, rerun with: -v
==9806== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 4)
r31i2n2%

But the problem is I am unable to get valgrind to point out the problem in the MPI code.
The output from that run is included below (if it is all right, I will include the source code also):

r31i2n2% m hello_mpi_c.c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
  int ierr, myid, npes;
  int len;
  char name[MPI_MAX_PROCESSOR_NAME];

  ierr = MPI_Init(&argc, &argv);
#ifdef MACROTEST
#define MACROTEST 10
#endif
  ierr = MPI_Comm_rank(MPI_COMM_WORLD, &myid);
  ierr = MPI_Comm_size(MPI_COMM_WORLD, &npes);
  ierr = MPI_Get_processor_name( name, &len );

  printf("Hello from rank %d out of %d; procname = %s\n", myid, npes, name);

#ifdef MACROTEST
  printf("Test Macro: %d\n", MACROTEST);
#endif

#ifdef BUG
  {
    int* x = (int*)malloc(10 * sizeof(int));
    x[10] = 0;        // problem 1: heap block overrun
    printf("Print something %d\n",x[10]);
  }                   // problem 2: memory leak -- x not freed
#endif

  ierr = MPI_Finalize();
}
r31i2n2% mpicc -DBUG -g -O0 -o hello_mpi_c hello_mpi_c.c /contrib/valgrind/valgrind-3.8.1/lib/valgrind/libmpiwrap-amd64-linux.so
r31i2n2%
r31i2n2% env MPIWRAP_DEBUG=verbose mpiexec_mpt -np 1 /contrib/valgrind/valgrind-3.8.1/bin/valgrind ./hello_mpi_c
valgrind MPI wrappers 9993: Active for pid 9993
valgrind MPI wrappers 9993: Try MPIWRAP_DEBUG=help for possible options
valgrind MPI wrappers 9993: enter PMPI_Init
valgrind MPI wrappers 9993: enter PMPI_Init_thread
valgrind MPI wrappers 9993: exit PMPI_Init (err = 0)
valgrind MPI wrappers 9993: enter PMPI_Comm_rank
valgrind MPI wrappers 9993: exit PMPI_Comm_rank (err = 0)
valgrind MPI wrappers 9993: enter PMPI_Comm_size
valgrind MPI wrappers 9993: exit PMPI_Comm_size (err = 0)
valgrind MPI wrappers 9993: enter PMPI_Get_processor_name
Hello from rank 0 out of 1; procname = r31i2n2
Print something 0
valgrind MPI wrappers 9993: enter PMPI_Finalize
valgrind MPI wrappers 9993: exit PMPI_Finalize (err = 0)
r31i2n2%

-----Original Message-----
From: Julian Seward [mailto:jsew...@acm.org]
Sent: Tuesday, January 29, 2013 4:33 AM
To: Raghu Reddy
Cc: Valgrind-users@lists.sourceforge.net
Subject: Re: [Valgrind-users] Is it possible to use valgrind with MPI applications (with SGI MPT)?

On 01/21/2013 03:49 PM, Raghu Reddy wrote:
> I was wondering if anyone has successfully used valgrind with MPI
> applications on SGI systems with MPT?

I don't know about on SGI w/ MPT (whatever MPT is). But for sure in general on MPI, it works.

> Using a non-MPI program (the simple example from the valgrind website
> tutorial) works exactly as documented. However, an MPI hello world
> example with the same error does not point out the error, even though
> there are messages from the MPI wrappers.

Does your MPI hello world test work as expected (with -DBUG) if you remove the MPI specifics and just run it as an ordinary executable?

J