Hello.
It looks like you allocate memory in every loop iteration on process #0
and never free it, so malloc may fail on some iteration.
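
A minimal sketch of what I mean (only an illustration, assuming n, nprocs and
me are declared and set up as in your attached hetero.c, and that n is an int):
allocate the buffer once, check the result of malloc, and free it at the end.
If malloc ever returns NULL in your current loop, MPI_Recv writes through an
invalid pointer, which could show up as the "Address not mapped" error in your
trace.

======== code ================
    /* Sketch only: assumes the usual MPI_Init/MPI_Finalize and the
     * declarations of n, nprocs and me from the attached hetero.c. */
    MPI_Status status;
    double *d = malloc(sizeof(double) * n);   /* one allocation, reused */
    if (d == NULL) {
        fprintf(stderr, "malloc of %d doubles failed\n", n);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    if (me == 0) {
        for (int pe = 1; pe < nprocs; pe++) {
            printf("Receiving from proc %d : ", pe); fflush(stdout);
            MPI_Recv(d, n, MPI_DOUBLE, pe, 999, MPI_COMM_WORLD, &status);
            printf("OK\n"); fflush(stdout);
        }
        printf("All done.\n");
    } else {
        MPI_Send(d, n, MPI_DOUBLE, 0, 999, MPI_COMM_WORLD);
    }
    free(d);
======== code ================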

On Sun, 28/02/2010 at 19:22 +0100, TRINH Minh Hieu wrote:
> Hello,
> 
> I have some problems running MPI on my heterogeneous cluster. More
> precisely, I get a segmentation fault when sending a large array (about
> 10000 elements) of double from an i686 machine to an x86_64 machine. It
> does not happen with a small array. Here is the send/recv source code
> (the complete source is in the attached file):
> ========code ================
>     if (me == 0) {
>         for (int pe = 1; pe < nprocs; pe++) {
>             printf("Receiving from proc %d : ", pe); fflush(stdout);
>             d = (double *)malloc(sizeof(double) * n);
>             MPI_Recv(d, n, MPI_DOUBLE, pe, 999, MPI_COMM_WORLD, &status);
>             printf("OK\n"); fflush(stdout);
>         }
>         printf("All done.\n");
>     }
>     else {
>         d = (double *)malloc(sizeof(double) * n);
>         MPI_Send(d, n, MPI_DOUBLE, 0, 999, MPI_COMM_WORLD);
>     }
> ======== code ================
> 
> I get a segmentation fault with n=10000 but no error with n=1000.
> I have 2 machines :
> sbtn155 : Intel Xeon,         x86_64
> sbtn211 : Intel Pentium 4, i686
> 
> The code is compiled on both the x86_64 and the i686 machine using Open MPI
> 1.4.1, installed in /tmp/openmpi :
> [mhtrinh@sbtn211 heterogenous]$ make hetero
> gcc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include -c hetero.c -o hetero.i686.o
> /tmp/openmpi/bin/mpicc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include hetero.i686.o -o hetero.i686 -lm
> 
> [mhtrinh@sbtn155 heterogenous]$ make hetero
> gcc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include -c hetero.c -o hetero.x86_64.o
> /tmp/openmpi/bin/mpicc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include hetero.x86_64.o -o hetero.x86_64 -lm
> 
> I run the code using an appfile and get these errors:
> $ cat appfile
> --host sbtn155 -np 1 hetero.x86_64
> --host sbtn155 -np 1 hetero.x86_64
> --host sbtn211 -np 1 hetero.i686
> 
> $ mpirun -hetero --app appfile
> Input array length :
> 10000
> Receiving from proc 1 : OK
> Receiving from proc 2 : [sbtn155:26386] *** Process received signal ***
> [sbtn155:26386] Signal: Segmentation fault (11)
> [sbtn155:26386] Signal code: Address not mapped (1)
> [sbtn155:26386] Failing at address: 0x200627bd8
> [sbtn155:26386] [ 0] /lib64/libpthread.so.0 [0x3fa4e0e540]
> [sbtn155:26386] [ 1] /tmp/openmpi/lib/openmpi/mca_pml_ob1.so [0x2aaaad8d7908]
> [sbtn155:26386] [ 2] /tmp/openmpi/lib/openmpi/mca_btl_tcp.so [0x2aaaae2fc6e3]
> [sbtn155:26386] [ 3] /tmp/openmpi/lib/libopen-pal.so.0 [0x2aaaaafe39db]
> [sbtn155:26386] [ 4]
> /tmp/openmpi/lib/libopen-pal.so.0(opal_progress+0x9e) [0x2aaaaafd8b9e]
> [sbtn155:26386] [ 5] /tmp/openmpi/lib/openmpi/mca_pml_ob1.so [0x2aaaad8d4b25]
> [sbtn155:26386] [ 6] /tmp/openmpi/lib/libmpi.so.0(MPI_Recv+0x13b)
> [0x2aaaaab30f9b]
> [sbtn155:26386] [ 7] hetero.x86_64(main+0xde) [0x400cbe]
> [sbtn155:26386] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3fa421e074]
> [sbtn155:26386] [ 9] hetero.x86_64 [0x400b29]
> [sbtn155:26386] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 0 with PID 26386 on node sbtn155
> exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> 
> Am I missing an option needed to run on a heterogeneous cluster?
> Do MPI_Send/Recv have a limit on array size when using a heterogeneous cluster?
> Thanks for your help. Regards
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Kind regards,
Timur Magomedov
Senior C++ Developer
DevelopOnBox LLC / Zodiac Interactive
http://www.zodiac.tv/
