Hi Gus Correa,

First of all, thanks for your suggestions.

1) The malloc call does return a non-NULL pointer; I checked that before blaming MPI.
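For what it's worth, the guard I have in mind is only a sketch along the lines you suggested (print a message, then MPI_Abort). It assumes the malloc is moved after MPI_Init/MPI_Comm_rank so MPI_Abort is legal, and it reuses the Gsize and localID variables from my test program; the message text and exit code are arbitrary:

    /* after MPI_Init and MPI_Comm_rank, so MPI_Abort may be called */
    char *g = (char *)malloc(Gsize);
    if (g == NULL) {
        /* allocation failed: report it and kill the whole job,
           so the problem is clearly not inside MPI */
        fprintf(stderr, "rank %d: malloc of %zu bytes failed\n",
                localID, Gsize);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }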
2) I haven't tried MPI_Isend/MPI_Irecv yet. The function I really need is MPI_Allgatherv(). When I used it, I found it fails once the data reaches 2GB or more; after debugging it I found that this collective eventually calls MPI_Send underneath.

3) I have a large amount of data to train on, so transferring messages of 2GB or more is necessary. I could split the data into smaller pieces, but I suspect the efficiency would drop. A rough sketch of the splitting I would have to do is below.
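If I do go the chunking route, I imagine something like the following. This is only a sketch, not what my real code does; the names CHUNK_BYTES, send_large and recv_large are made up here. The idea is simply that each individual message stays well below the 2GB / INT_MAX count limit:

    #include <mpi.h>
    #include <stddef.h>

    #define CHUNK_BYTES (1 << 30)   /* 1 GiB per message, safely below INT_MAX */

    /* Send a buffer of arbitrary size as a series of <= 1 GiB messages. */
    static void send_large(const char *buf, size_t nbytes, int dest, int tag,
                           MPI_Comm comm)
    {
        size_t done = 0;
        while (done < nbytes) {
            size_t n = nbytes - done;
            if (n > CHUNK_BYTES) n = CHUNK_BYTES;
            MPI_Send((void *)(buf + done), (int)n, MPI_BYTE, dest, tag, comm);
            done += n;
        }
    }

    /* Matching receive: the same chunking on the other side. */
    static void recv_large(char *buf, size_t nbytes, int src, int tag,
                           MPI_Comm comm)
    {
        size_t done = 0;
        while (done < nbytes) {
            size_t n = nbytes - done;
            if (n > CHUNK_BYTES) n = CHUNK_BYTES;
            MPI_Recv(buf + done, (int)n, MPI_BYTE, src, tag, comm,
                     MPI_STATUS_IGNORE);
            done += n;
        }
    }

My worry is exactly this extra bookkeeping: for MPI_Allgatherv I would also have to split the counts and displacements arrays by hand, and I suspect many small collectives will be slower than one large one.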
Regards,
Xianjun Meng

2010/12/7 Gus Correa <g...@ldeo.columbia.edu>

> Hi Xianjun
>
> Suggestions/Questions:
>
> 1) Did you check if malloc returns a non-NULL pointer?
> Your program is assuming this, but it may not be true,
> and in this case the problem is not with MPI.
> You can print a message and call MPI_Abort if it doesn't.
>
> 2) Have you tried MPI_Isend/MPI_Irecv?
> Or perhaps the buffered cousin MPI_Ibsend?
>
> 3) Why do you want to send these huge messages?
> Wouldn't it be less of a trouble to send several
> smaller messages?
>
> I hope it helps,
> Gus Correa
>
> Xianjun wrote:
>>
>> Hi
>>
>> Are you running on two processes (mpiexec -n 2)?
>> Yes.
>>
>> Have you tried to print Gsize?
>> Yes, I have checked my code several times, and I think the error comes
>> from Open MPI. :)
>>
>> The command line I used:
>> "mpirun -hostfile ./Serverlist -np 2 ./test". The "Serverlist" file
>> includes several computers in my network.
>>
>> The command line I used to build openmpi-1.4.1:
>> ./configure --enable-debug --prefix=/usr/work/openmpi ; make all install;
>>
>> What interconnect do you use?
>> It is a normal TCP/IP interconnect with a 1Gb network card. When I
>> debugged my code (and the Open MPI code), I found that Open MPI does call
>> the mca_pml_ob1_send_request_start_rdma(...) function, but I was not quite
>> sure which protocol was used to transfer the 2GB of data. Do you have any
>> opinion? Thanks.
>>
>> Best Regards
>> Xianjun Meng
>>
>> 2010/12/7 Gus Correa <g...@ldeo.columbia.edu>
>>
>> Hi Xianjun
>>
>> Are you running on two processes (mpiexec -n 2)?
>> I think this code will deadlock for more than two processes.
>> The MPI_Recv won't have a matching send for rank > 1.
>>
>> Also, this is C, not MPI,
>> but you may be wrapping into the negative numbers.
>> Have you tried to print Gsize?
>> It is probably -2147483648 on 32-bit and 64-bit machines.
>>
>> My two cents.
>> Gus Correa
>>
>> Mike Dubman wrote:
>>
>> Hi,
>> What interconnect and command line do you use? For the InfiniBand
>> openib component there is a known issue with large transfers (2GB):
>>
>> https://svn.open-mpi.org/trac/ompi/ticket/2623
>>
>> Try disabling memory pinning:
>>
>> http://www.open-mpi.org/faq/?category=openfabrics#large-message-leave-pinned
>>
>> regards
>> M
>>
>> 2010/12/6 <xjun.m...@gmail.com>
>>
>> hi,
>>
>> On my computers (x86-64), sizeof(int) = 4, but
>> sizeof(long) = sizeof(double) = sizeof(size_t) = 8. When I checked my
>> mpi.h file, I found that the definition related to sizeof(int) is
>> correct. Meanwhile, I think the mpi.h file was generated according
>> to my compute environment when I compiled Open MPI. So my code
>> still doesn't work. :(
>>
>> Further, I found that the collective routines (such as
>> MPI_Allgatherv(...)) which are implemented on top of point-to-point
>> don't work either when the data > 2GB.
>>
>> Thanks
>> Xianjun
>>
>> 2010/12/6 Tim Prince <n...@aol.com>
>>
>> On 12/5/2010 7:13 PM, Xianjun wrote:
>>
>> hi,
>>
>> I met a problem recently when I tested the MPI_Send and MPI_Recv
>> functions. When I ran the following code, the processes hung and I
>> found there was no data transmission in my network at all.
>>
>> BTW: I ran this test on two x86-64 computers with 16GB of memory,
>> running Linux.
>>
>> #include <stdio.h>
>> #include <mpi.h>
>> #include <stdlib.h>
>> #include <unistd.h>
>>
>> int main(int argc, char** argv)
>> {
>>     int localID;
>>     int numOfPros;
>>     size_t Gsize = (size_t)2 * 1024 * 1024 * 1024;
>>
>>     char* g = (char*)malloc(Gsize);
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_size(MPI_COMM_WORLD, &numOfPros);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &localID);
>>
>>     MPI_Datatype MPI_Type_lkchar;
>>     MPI_Type_contiguous(2048, MPI_BYTE, &MPI_Type_lkchar);
>>     MPI_Type_commit(&MPI_Type_lkchar);
>>
>>     if (localID == 0)
>>     {
>>         MPI_Send(g, 1024*1024, MPI_Type_lkchar, 1, 1, MPI_COMM_WORLD);
>>     }
>>
>>     if (localID != 0)
>>     {
>>         MPI_Status status;
>>         MPI_Recv(g, 1024*1024, MPI_Type_lkchar, 0, 1,
>>                  MPI_COMM_WORLD, &status);
>>     }
>>
>>     MPI_Finalize();
>>
>>     return 0;
>> }
>>
>> You supplied all your constants as 32-bit signed data, so, even
>> if the count for MPI_Send() and MPI_Recv() were a larger data
>> type, you would see this limit. Did you look at your <mpi.h>?
>>
>> -- Tim Prince
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users