Do you get a corefile?

It looks like you're calling MPI_RECV in Fortran and it then segfaults.  This 
is *likely* because you're either passing a bad parameter or your receive 
buffer isn't big enough for the incoming message.  Can you double-check all 
your parameters?
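
One way to rule out the "buffer isn't big enough" case is to ask MPI how large 
the incoming message is before receiving it.  The following is only a minimal 
sketch, not your actual code: the program name RCVCHK, the buffer size BUFSIZ, 
the tag 99, and the DOUBLE PRECISION payload are all made up for illustration. 
It uses MPI_PROBE and MPI_GET_COUNT to compare the message length against the 
buffer before calling MPI_RECV.

```fortran
      PROGRAM RCVCHK
C     Sketch only: check the incoming message size before MPI_RECV.
C     BUFSIZ, tag 99 and the payload type are hypothetical choices.
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER BUFSIZ
      PARAMETER (BUFSIZ = 1000)
      DOUBLE PRECISION BUF(BUFSIZ)
      INTEGER STATUS(MPI_STATUS_SIZE)
      INTEGER IERR, RANK, NPROC, NRECV, I
C
      CALL MPI_INIT(IERR)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, IERR)
      CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPROC, IERR)
      IF (NPROC .LT. 2) THEN
         IF (RANK .EQ. 0) WRITE(*,*) 'Run with at least 2 processes'
         CALL MPI_FINALIZE(IERR)
         STOP
      END IF
C
      IF (RANK .EQ. 0) THEN
C        Rank 0 fills the buffer and sends it to rank 1.
         DO 10 I = 1, BUFSIZ
            BUF(I) = DBLE(I)
   10    CONTINUE
         CALL MPI_SEND(BUF, BUFSIZ, MPI_DOUBLE_PRECISION, 1, 99,
     &                 MPI_COMM_WORLD, IERR)
      ELSE IF (RANK .EQ. 1) THEN
C        Wait for the message without receiving it, then ask how
C        many elements it carries.
         CALL MPI_PROBE(0, 99, MPI_COMM_WORLD, STATUS, IERR)
         CALL MPI_GET_COUNT(STATUS, MPI_DOUBLE_PRECISION,
     &                      NRECV, IERR)
         IF (NRECV .GT. BUFSIZ) THEN
            WRITE(*,*) 'Message too large: ', NRECV, ' > ', BUFSIZ
            CALL MPI_ABORT(MPI_COMM_WORLD, 1, IERR)
         END IF
C        The buffer is known to be large enough at this point.
         CALL MPI_RECV(BUF, NRECV, MPI_DOUBLE_PRECISION, 0, 99,
     &                 MPI_COMM_WORLD, STATUS, IERR)
         WRITE(*,*) 'Received ', NRECV, ' elements'
      END IF
C
      CALL MPI_FINALIZE(IERR)
      END
```

The same idea applies to the other arguments: the count, datatype, source, 
tag, communicator, status array (dimensioned MPI_STATUS_SIZE), and error 
argument all have to match what the MPI library expects.  In particular, it 
may be worth verifying that the INTEGER size your application is compiled 
with is the one the MPI Fortran bindings were built for.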

Unfortunately, there are no line numbers in the stack trace, so it's not 
possible to tell exactly where in the ob1 PML it's dying (i.e., we can't see 
exactly what it's doing to cause the segv).



On Dec 2, 2010, at 9:36 AM, Benjamin Toueg wrote:

> Hi,
> 
> I am using DRAGON, a neutronic simulation code in FORTRAN 77 that has its 
> own data structures. I added a module to send these data structures with 
> MPI_SEND / MPI_RECV, and everything worked perfectly for a while.
> 
> Then I had to raise the number of data structures to be sent, up to a point 
> where my cluster hits this bug:
> *** Process received signal ***
> Signal: Segmentation fault (11)
> Signal code: Address not mapped (1)
> Failing at address: 0x2c2579fc0
> [ 0] /lib/libpthread.so.0 [0x7f52d2930410]
> [ 1] /home/toueg/openmpi/lib/openmpi/mca_pml_ob1.so [0x7f52d153fe03]
> [ 2] /home/toueg/openmpi/lib/libmpi.so.0(PMPI_Recv+0x2d2) [0x7f52d3504a1e]
> [ 3] /home/toueg/openmpi/lib/libmpi_f77.so.0(pmpi_recv_+0x10e) 
> [0x7f52d36cf9c6]
> 
> How can I make this error more explicit?
> 
> I use the following configuration of openmpi-1.4.3:
> ./configure --enable-debug --prefix=/home/toueg/openmpi CXX=g++ CC=gcc 
> F77=gfortran FC=gfortran FLAGS="-m64 -fdefault-integer-8 -fdefault-real-8 
> -fdefault-double-8" FCFLAGS="-m64 -fdefault-integer-8 -fdefault-real-8 
> -fdefault-double-8" --disable-mpi-f90
> 
> Here is the output of mpif77 -v:
> mpif77 for 1.2.7 (release) of : 2005/11/04 11:54:51
> Driving: f77 -L/usr/lib/mpich-mpd/lib -v -lmpich-p4mpd -lpthread -lrt 
> -lfrtbegin -lg2c -lm -shared-libgcc
> Reading specs from 
> /usr/lib/gcc/x86_64-linux-gnu/3.4.6/specs
> Configured with: ../src/configure -v --enable-languages=c,c++,f77,pascal 
> --prefix=/usr --libexecdir=/usr/lib 
> --with-gxx-include-dir=/usr/include/c++/3.4 --enable-shared 
> --with-system-zlib --enable-nls --without-included-gettext 
> --program-suffix=-3.4 --enable-__cxa_atexit --enable-clocale=gnu 
> --enable-libstdcxx-debug x86_64-linux-gnu
> Thread model: posix
> gcc version 3.4.6 (Debian 3.4.6-5)
>  /usr/lib/gcc/x86_64-linux-gnu/3.4.6/collect2 --eh-frame-hdr -m elf_x86_64 
> -dynamic-linker /lib64/ld-linux-x86-64.so.2 
> /usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../../../lib/crt1.o 
> /usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../../../lib/crti.o 
> /usr/lib/gcc/x86_64-linux-gnu/3.4.6/crtbegin.o -L/usr/lib/mpich-mpd/lib 
> -L/usr/lib/gcc/x86_64-linux-gnu/3.4.6 -L/usr/lib/gcc/x86_64-linux-gnu/3.4.6 
> -L/usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../../../lib 
> -L/usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../.. -L/lib/../lib 
> -L/usr/lib/../lib -lmpich-p4mpd -lpthread -lrt -lfrtbegin -lg2c -lm -lgcc_s 
> -lgcc -lc -lgcc_s -lgcc /usr/lib/gcc/x86_64-linux-gnu/3.4.6/crtend.o 
> /usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../../../lib/crtn.o
> /usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../../../lib/libfrtbegin.a(frtbegin.o):
>  In function `main':
> (.text+0x1e): undefined reference to `MAIN__'
> collect2: ld returned 1 exit status
> 
> Thanks,
> Benjamin
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

