users-boun...@open-mpi.org wrote on 08/05/2008 05:51:51 PM:

> Jan,
>
> I'm using valgrind with Open MPI on a [very] regular basis and I never
> had any problems. I usually want to know the execution path on the MPI
> applications. For this I use:
> mpirun -np XX valgrind --tool=callgrind -q --log-file=some_file ./my_app
>
> I just run your example:
> mpirun -np 2 -bynode --mca btl tcp,self valgrind --tool=massif -q ./NPmpi -u 4
> and I got 2 non empty files in the current directory:
> bosilca@dancer:~/NetPIPE_3.6.2$ ls -l massif.out.*
> -rw------- 1 bosilca bosilca 140451 2008-08-05 11:57 massif.out.21197
> -rw------- 1 bosilca bosilca 131471 2008-08-05 11:57 massif.out.21210

George,

Thanks for the info. Which versions of OpenMPI, the compiler, and valgrind did you try with? I checked on two different clusters with OpenMPI 1.2.4, compiled with two different versions of the PGI compiler, and valgrind 3.3.1, with the same bad result.

I also noticed that the MPI processes, despite producing the expected output, do not terminate cleanly. I can see in the stderr log (for each process):

==7909== Warning: client syscall munmap tried to modify addresses 0xD1968F92A19A72D1-0x34324E6F
==7909==
==7909== Process terminating with default action of signal 11 (SIGSEGV)
==7909==  Access not within mapped region at address 0x8053D8000
==7909==    at 0x5284996: _int_free (in /opt/openmpi-1.2.4/lib/libopen-pal.so.0.0.0)
==7909==    by 0x52837A7: free (in /opt/openmpi-1.2.4/lib/libopen-pal.so.0.0.0)
==7909==    by 0x593C76A: free_mem (in /lib64/libc-2.4.so)
==7909==    by 0x593C3E1: __libc_freeres (in /lib64/libc-2.4.so)
==7909==    by 0x491D31C: _vgnU_freeres (vg_preloaded.c:60)
==7909==    by 0x587D1C4: exit (in /lib64/libc-2.4.so)
==7909==    by 0x586815A: (below main) (in /lib64/libc-2.4.so)

That probably explains why my massif.out.* files are empty (<200 bytes long), but why do the processes crash? The same program runs fine with valgrind+MVAPICH, or with OpenMPI without valgrind, on their respective clusters. I experience this both with a simple test program and with a real application (WRF).

Regards,
Jan Ploski