Edgar, I merged the changes you made in -r17848:17849 on the trunk into Open MPI 1.2.6rc2, together with George's patch, and my small examples now work.

Martin
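Edgar's note below mentions verifying the fix with MPI_Intercomm_create and unequal group sizes (2 vs. 4 processes). For readers who want a self-contained reproducer in that style, here is a minimal sketch; it is a hypothetical reconstruction of that kind of test, not Edgar's actual code:

    /* intercomm_test.c -- hypothetical reconstruction of an
     * MPI_Intercomm_create-based allgather test with unequal groups,
     * e.g. 2 vs. 4 processes when launched as: mpiexec -n 6 ... */
    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, ii;
        MPI_Comm local_comm, intercomm;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size < 3) {
            MPI_Abort(MPI_COMM_WORLD, 1);   /* need two non-empty groups */
        }

        /* World ranks 0-1 form one group, the rest the other. */
        int color = (rank < 2) ? 0 : 1;
        MPI_Comm_split(MPI_COMM_WORLD, color, rank, &local_comm);

        /* Local leaders are world ranks 0 and 2; bridge over MPI_COMM_WORLD. */
        int remote_leader = (color == 0) ? 2 : 0;
        MPI_Intercomm_create(local_comm, 0, MPI_COMM_WORLD, remote_leader,
                             123, &intercomm);

        /* Same collective as in the Connect/Accept programs quoted below:
         * recvcount = 1, receive buffer sized by the *remote* group. */
        int remote_size;
        MPI_Comm_remote_size(intercomm, &remote_size);
        int *tbl = malloc(remote_size * sizeof(*tbl));
        MPI_Allgather(&rank, 1, MPI_INT, tbl, 1, MPI_INT, intercomm);

        printf("world rank %d received:", rank);
        for (ii = 0; ii < remote_size; ii++) {
            printf(" %d", tbl[ii]);
        }
        printf("\n");

        free(tbl);
        MPI_Comm_free(&intercomm);
        MPI_Comm_free(&local_comm);
        MPI_Finalize();
        return 0;
    }

Per Edgar's note, this configuration passes with the fix applied and failed with the previous version.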
________________________________________
From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of Edgar Gabriel [gabr...@cs.uh.edu]
Sent: March 17, 2008 15:59
To: Open MPI Users
Subject: Re: [OMPI users] RE : MPI_Comm_connect() fails

already working on it, together with a move_request....

Thanks
Edgar

Jeff Squyres wrote:
> Edgar --
>
> Can you make a patch for the 1.2 series?
>
> On Mar 17, 2008, at 3:45 PM, Edgar Gabriel wrote:
>
>> Martin,
>>
>> I found the problem in the inter-allgather and fixed it in patch
>> 17849. The same test, using MPI_Intercomm_create (just to simplify
>> my life compared to Connect/Accept) with 2 vs. 4 processes in the
>> two groups, passes for me -- and did fail with the previous version.
>>
>> Thanks
>> Edgar
>>
>> Audet, Martin wrote:
>>> Hi Jeff,
>>>
>>> As I said in my last message (see below), the patch (or at least
>>> the patch I got) doesn't fix the problem for me. Whether I apply it
>>> over Open MPI 1.2.5 or 1.2.6rc2, I still get the same problem:
>>>
>>> the client aborts with a truncation error message while the server
>>> freezes when, for example, the server is started with 3 processes
>>> and the client with 2.
>>>
>>> Feel free to try the two small client and server programs I posted
>>> in my first message yourself.
>>>
>>> Thanks,
>>>
>>> Martin
>>>
>>>
>>> Subject: [OMPI users] RE : users Digest, Vol 841, Issue 3
>>> From: Audet, Martin (Martin.Audet_at_[hidden])
>>> Date: 2008-03-13 17:04:25
>>>
>>> Hi George,
>>>
>>> Thanks for your patch, but I'm not sure I got it correctly. The
>>> patch I got modifies a few arguments passed to isend()/irecv()/recv()
>>> in coll_basic_allgather.c. Here is the patch I applied:
>>>
>>> Index: ompi/mca/coll/basic/coll_basic_allgather.c
>>> ===================================================================
>>> --- ompi/mca/coll/basic/coll_basic_allgather.c (revision 17814)
>>> +++ ompi/mca/coll/basic/coll_basic_allgather.c (working copy)
>>> @@ -149,7 +149,7 @@
>>>      }
>>>
>>>      /* Do a send-recv between the two root procs. to avoid deadlock */
>>> -    err = MCA_PML_CALL(isend(sbuf, scount, sdtype, 0,
>>> +    err = MCA_PML_CALL(isend(sbuf, scount, sdtype, root,
>>>                               MCA_COLL_BASE_TAG_ALLGATHER,
>>>                               MCA_PML_BASE_SEND_STANDARD,
>>>                               comm, &reqs[rsize]));
>>> @@ -157,7 +157,7 @@
>>>          return err;
>>>      }
>>>
>>> -    err = MCA_PML_CALL(irecv(rbuf, rcount, rdtype, 0,
>>> +    err = MCA_PML_CALL(irecv(rbuf, rcount, rdtype, root,
>>>                               MCA_COLL_BASE_TAG_ALLGATHER, comm,
>>>                               &reqs[0]));
>>>      if (OMPI_SUCCESS != err) {
>>> @@ -186,14 +186,14 @@
>>>          return err;
>>>      }
>>>
>>> -    err = MCA_PML_CALL(isend(rbuf, rsize * rcount, rdtype, 0,
>>> +    err = MCA_PML_CALL(isend(rbuf, rsize * scount, sdtype, root,
>>>                               MCA_COLL_BASE_TAG_ALLGATHER,
>>>                               MCA_PML_BASE_SEND_STANDARD, comm, &req));
>>>      if (OMPI_SUCCESS != err) {
>>>          goto exit;
>>>      }
>>>
>>> -    err = MCA_PML_CALL(recv(tmpbuf, size * scount, sdtype, 0,
>>> +    err = MCA_PML_CALL(recv(tmpbuf, size * rcount, rdtype, root,
>>>                              MCA_COLL_BASE_TAG_ALLGATHER, comm,
>>>                              MPI_STATUS_IGNORE));
>>>      if (OMPI_SUCCESS != err) {
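The hunks above change the peer rank in the root-to-root exchange from a hard-coded 0 to root, and adjust which (count, datatype) pair describes each message. The symptom they target is MPI's truncation check: a receive whose type signature is smaller than that of the matching send fails with MPI_ERR_TRUNCATE, the very error shown next. A minimal stand-alone illustration of that failure mode (hypothetical code, not from this thread):

    /* truncate_demo.c -- two-process illustration of MPI_ERR_TRUNCATE:
     * the receiver posts room for 2 ints while the sender ships 3, so
     * the default MPI_ERRORS_ARE_FATAL handler aborts the job. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, buf[3] = { 10, 20, 30 };

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            MPI_Send(buf, 3, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* Count mismatch: 2 < 3 -> "MPI_ERR_TRUNCATE: message truncated" */
            MPI_Recv(buf, 2, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }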
>>> However, with this patch I still have the problem. Suppose I start
>>> the server with three processes and the client with two; the client
>>> prints:
>>>
>>> [audet_at_linux15 dyn_connect]$ mpiexec --universe univ1 -n 2 ./aclient '0.2.0:2000'
>>> intercomm_flag = 1
>>> intercomm_remote_size = 3
>>> rem_rank_tbl[3] = { 0 1 2}
>>> [linux15:26114] *** An error occurred in MPI_Allgather
>>> [linux15:26114] *** on communicator
>>> [linux15:26114] *** MPI_ERR_TRUNCATE: message truncated
>>> [linux15:26114] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>> mpiexec noticed that job rank 0 with PID 26113 on node linux15
>>> exited on signal 15 (Terminated).
>>> [audet_at_linux15 dyn_connect]$
>>>
>>> and aborts. The server, on the other side, simply hangs (as before).
>>>
>>> Regards,
>>>
>>> Martin
>>>
>>> -----Original Message-----
>>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres
>>> Sent: March 14, 2008 19:45
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] RE : MPI_Comm_connect() fails
>>>
>>> Yes, please let us know if this fixes it. We're working on a 1.2.6
>>> release; we can definitely put this fix in there if it's correct.
>>>
>>> Thanks!
>>>
>>>
>>> On Mar 13, 2008, at 4:07 PM, George Bosilca wrote:
>>>
>>>> I dug into the sources and I think you correctly pinpointed the bug.
>>>> It seems we have a mismatch between the local and remote sizes in
>>>> the inter-communicator allgather in the 1.2 series (which explains
>>>> the message truncation error when the local and remote groups have
>>>> a different number of processes). Attached to this email you can
>>>> find a patch that [hopefully] solves this problem. If you can,
>>>> please test it and let me know if it solves your problem.
>>>>
>>>> Thanks,
>>>> george.
>>>>
>>>> <inter_allgather.patch>
>>>>
>>>>
>>>> On Mar 13, 2008, at 1:11 PM, Audet, Martin wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> After re-checking the MPI standard (www.mpi-forum.org and "MPI -
>>>>> The Complete Reference"), I'm more and more convinced that my
>>>>> small example programs, which establish an intercommunicator with
>>>>> MPI_Comm_connect()/MPI_Comm_accept() over an MPI port and exchange
>>>>> data over it with MPI_Allgather(), are correct. In particular,
>>>>> calling MPI_Allgather() with recvcount=1 (its fifth argument)
>>>>> instead of the total number of MPI_INT that will be received (e.g.
>>>>> intercomm_remote_size in the examples) is both correct and
>>>>> consistent with MPI_Allgather() behavior on an intracommunicator
>>>>> (i.e. a "normal" communicator).
>>>>>
>>>>> MPI_Allgather(&comm_rank, 1, MPI_INT,
>>>>>               rem_rank_tbl, 1, MPI_INT,
>>>>>               intercomm);
>>>>>
>>>>> Also, the recvbuf argument (its fourth argument) of MPI_Allgather()
>>>>> in the examples should have a size of intercomm_remote_size
>>>>> elements (i.e. the size of the remote group), not the sum of the
>>>>> local and remote group sizes, in both the client and server
>>>>> processes. The standard says that for all-to-all operations over
>>>>> an intercommunicator, each process sends to and receives data from
>>>>> the remote group only (it is not possible to exchange data with
>>>>> processes of the local group over an intercommunicator).
>>>>>
>>>>> So, for me there is no reason to stop the processes with an error
>>>>> message complaining about message truncation. There should be no
>>>>> truncation: the sendcount, sendtype, recvcount and recvtype
>>>>> arguments of MPI_Allgather() are correct and consistent.
>>>>>
>>>>> So again, to me the Open MPI behavior with my example looks more
>>>>> and more like a bug...
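To make those counts concrete for the 3-server/2-client run shown above: each process contributes sendcount = 1 MPI_INT, and each client receives one contribution per remote (server) process, i.e. intercomm_remote_size * recvcount = 3 * 1 = 3 ints, so a 3-int recvbuf with recvcount = 1 is exactly right. A condensed sketch of the pattern (a hypothetical helper, mirroring the full programs quoted later in the thread):

    #include <stdlib.h>
    #include <mpi.h>

    /* Gather the MPI_COMM_WORLD rank of every process in the remote
     * group of 'intercomm'. Caller frees the returned table of
     * *remote_size ints. */
    static int *gather_remote_ranks(MPI_Comm intercomm, int *remote_size)
    {
        int my_rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
        MPI_Comm_remote_size(intercomm, remote_size);

        /* recvbuf is sized by the *remote* group only; recvcount is the
         * per-process count (1), not the total (*remote_size). */
        int *tbl = malloc(*remote_size * sizeof(*tbl));
        MPI_Allgather(&my_rank, 1, MPI_INT, tbl, 1, MPI_INT, intercomm);
        return tbl;
    }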
>>>>> Concerning George's comment about valgrind and TCP/IP, I totally
>>>>> agree: messages reported by valgrind are only a clue of a bug,
>>>>> especially in this context, not a proof of a bug. Another clue is
>>>>> that my small examples work perfectly with mpich2 ch3:sock.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Martin Audet
>>>>>
>>>>>
>>>>> ------------------------------
>>>>>
>>>>> Message: 4
>>>>> Date: Thu, 13 Mar 2008 08:21:51 +0100
>>>>> From: jody <jody....@gmail.com>
>>>>> Subject: Re: [OMPI users] RE : MPI_Comm_connect() fails
>>>>> To: "Open MPI Users" <us...@open-mpi.org>
>>>>>
>>>>> Hi,
>>>>> I think the recvcount argument you pass to MPI_Allgather should
>>>>> not be 1 but instead the number of MPI_INTs your buffer
>>>>> rem_rank_tbl can contain. As it stands now, you tell MPI_Allgather
>>>>> that it may only receive 1 MPI_INT.
>>>>>
>>>>> Furthermore, I'm not sure, but I think your receive buffer should
>>>>> be large enough to contain messages from *all* processes, and not
>>>>> just from the "far side".
>>>>>
>>>>> Jody
>>>>>
>>>>>
>>>>> ------------------------------
>>>>>
>>>>> Message: 6
>>>>> Date: Thu, 13 Mar 2008 09:06:47 -0500
>>>>> From: George Bosilca <bosi...@eecs.utk.edu>
>>>>> Subject: Re: [OMPI users] RE : MPI_Comm_connect() fails
>>>>> To: Open MPI Users <us...@open-mpi.org>
>>>>>
>>>>> I am not aware of any problems with the allreduce/allgather. But
>>>>> we are aware of the problem of valgrind reporting non-initialized
>>>>> values when used with TCP. It's a long story, but I can guarantee
>>>>> that this should not affect a correct MPI application.
>>>>>
>>>>> george.
>>>>>
>>>>> PS: For those who want to know the details: we have to send a
>>>>> header over TCP which contains some very basic information,
>>>>> including the size of the fragment. Unfortunately, we have a
>>>>> 2-byte gap in the header. As we never initialize these 2 unused
>>>>> bytes, but we send them over the wire, valgrind correctly detects
>>>>> the non-initialized data transfer.
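George's PS describes a classic source of such warnings: padding bytes inside a wire-format struct are never stored to, yet writev() ships the whole struct, so valgrind flags the syscall parameter as pointing to uninitialised bytes. A self-contained illustration, with a made-up header layout rather than Open MPI's real one:

    /* pad_gap_demo.c -- hypothetical illustration of valgrind's
     * "Syscall param writev(vector[...]) points to uninitialised byte(s)"
     * warning; this is NOT Open MPI's actual TCP header layout. */
    #include <stdint.h>
    #include <sys/uio.h>
    #include <unistd.h>

    struct frag_hdr {
        uint8_t  type;      /* 1 byte */
        uint8_t  count;     /* 1 byte */
        /* on typical ABIs the compiler inserts a 2-byte gap here so
         * that 'size' is 4-byte aligned */
        uint32_t size;      /* fragment size */
    };

    int main(void)
    {
        struct frag_hdr hdr;        /* padding bytes never initialised */
        hdr.type  = 1;
        hdr.count = 0;
        hdr.size  = 42;

        struct iovec iov = { &hdr, sizeof(hdr) };
        /* valgrind reports the 2 padding bytes as uninitialised data
         * being written; zeroing the whole struct first would silence it. */
        (void)writev(STDOUT_FILENO, &iov, 1);
        return 0;
    }

Running this under valgrind produces a report much like the one quoted below, which is why such reports are a clue rather than proof of an application bug.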
>>>>> On Mar 12, 2008, at 3:58 PM, Audet, Martin wrote:
>>>>>
>>>>>> Hi again,
>>>>>>
>>>>>> Thanks, Pak, for the link and for suggesting to start an "orted"
>>>>>> daemon; by doing so, my client and server jobs were able to
>>>>>> establish an intercommunicator between them.
>>>>>>
>>>>>> However, I modified my programs to perform an MPI_Allgather() of
>>>>>> a single "int" over the new intercommunicator to test
>>>>>> communication a little bit, and I encountered problems. I am now
>>>>>> wondering if there is a problem in MPI_Allgather() itself for
>>>>>> intercommunicators. Note that the same program runs without
>>>>>> problems with mpich2 (ch3:sock).
>>>>>>
>>>>>> For example, if I start orted as follows:
>>>>>>
>>>>>> orted --persistent --seed --scope public --universe univ1
>>>>>>
>>>>>> and then start the server with three processes:
>>>>>>
>>>>>> mpiexec --universe univ1 -n 3 ./aserver
>>>>>>
>>>>>> it prints:
>>>>>>
>>>>>> Server port = '0.2.0:2000'
>>>>>>
>>>>>> Now if I start the client with two processes as follows (using
>>>>>> the server port):
>>>>>>
>>>>>> mpiexec --universe univ1 -n 2 ./aclient '0.2.0:2000'
>>>>>>
>>>>>> the server prints:
>>>>>>
>>>>>> intercomm_flag = 1
>>>>>> intercomm_remote_size = 2
>>>>>> rem_rank_tbl[2] = { 0 1}
>>>>>>
>>>>>> which is the correct output. The client then prints:
>>>>>>
>>>>>> intercomm_flag = 1
>>>>>> intercomm_remote_size = 3
>>>>>> rem_rank_tbl[3] = { 0 1 2}
>>>>>> [linux15:30895] *** An error occurred in MPI_Allgather
>>>>>> [linux15:30895] *** on communicator
>>>>>> [linux15:30895] *** MPI_ERR_TRUNCATE: message truncated
>>>>>> [linux15:30895] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>>>>> mpiexec noticed that job rank 0 with PID 30894 on node linux15
>>>>>> exited on signal 15 (Terminated).
>>>>>>
>>>>>> As you can see, the first messages are correct, but the client
>>>>>> job terminates with an error (and the server hangs).
>>>>>>
>>>>>> After re-reading the documentation about MPI_Allgather() over an
>>>>>> intercommunicator, I don't see anything wrong in my simple code.
>>>>>> Also, if I run the client and server processes with valgrind, I
>>>>>> get a few messages like:
>>>>>>
>>>>>> ==29821== Syscall param writev(vector[...]) points to uninitialised byte(s)
>>>>>> ==29821==    at 0x36235C2130: writev (in /lib64/libc-2.3.5.so)
>>>>>> ==29821==    by 0x7885583: mca_btl_tcp_frag_send (in /home/publique/openmpi-1.2.5/lib/openmpi/mca_btl_tcp.so)
>>>>>> ==29821==    by 0x788501B: mca_btl_tcp_endpoint_send (in /home/publique/openmpi-1.2.5/lib/openmpi/mca_btl_tcp.so)
>>>>>> ==29821==    by 0x7467947: mca_pml_ob1_send_request_start_prepare (in /home/publique/openmpi-1.2.5/lib/openmpi/mca_pml_ob1.so)
>>>>>> ==29821==    by 0x7461494: mca_pml_ob1_isend (in /home/publique/openmpi-1.2.5/lib/openmpi/mca_pml_ob1.so)
>>>>>> ==29821==    by 0x798BF9D: mca_coll_basic_allgather_inter (in /home/publique/openmpi-1.2.5/lib/openmpi/mca_coll_basic.so)
>>>>>> ==29821==    by 0x4A5069C: PMPI_Allgather (in /home/publique/openmpi-1.2.5/lib/libmpi.so.0.0.0)
>>>>>> ==29821==    by 0x400EED: main (aserver.c:53)
>>>>>> ==29821==  Address 0x40d6cac is not stack'd, malloc'd or (recently) free'd
>>>>>>
>>>>>> in both the MPI_Allgather() and MPI_Comm_disconnect() calls, for
>>>>>> client and server, with valgrind always reporting that the
>>>>>> addresses in question are "not stack'd, malloc'd or (recently)
>>>>>> free'd".
>>>>>>
>>>>>> So is there a problem with MPI_Allgather() on intercommunicators,
>>>>>> or am I doing something wrong?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>> /* aserver.c */
>>>>>> #include <stdio.h>
>>>>>> #include <mpi.h>
>>>>>>
>>>>>> #include <assert.h>
>>>>>> #include <stdlib.h>
>>>>>>
>>>>>> int main(int argc, char **argv)
>>>>>> {
>>>>>>     int comm_rank, comm_size;
>>>>>>     char port_name[MPI_MAX_PORT_NAME];
>>>>>>     MPI_Comm intercomm;
>>>>>>     int ok_flag;
>>>>>>
>>>>>>     int intercomm_flag;
>>>>>>     int intercomm_remote_size;
>>>>>>     int *rem_rank_tbl;
>>>>>>     int ii;
>>>>>>
>>>>>>     MPI_Init(&argc, &argv);
>>>>>>
>>>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);
>>>>>>     MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
>>>>>>
>>>>>>     ok_flag = (comm_rank != 0) || (argc == 1);
>>>>>>     MPI_Bcast(&ok_flag, 1, MPI_INT, 0, MPI_COMM_WORLD);
>>>>>>
>>>>>>     if (!ok_flag) {
>>>>>>         if (comm_rank == 0) {
>>>>>>             fprintf(stderr, "Usage: %s\n", argv[0]);
>>>>>>         }
>>>>>>         MPI_Abort(MPI_COMM_WORLD, 1);
>>>>>>     }
>>>>>>
>>>>>>     MPI_Open_port(MPI_INFO_NULL, port_name);
>>>>>>
>>>>>>     if (comm_rank == 0) {
>>>>>>         printf("Server port = '%s'\n", port_name);
>>>>>>     }
>>>>>>     MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD,
>>>>>>                     &intercomm);
>>>>>>
>>>>>>     MPI_Close_port(port_name);
>>>>>>
>>>>>>     MPI_Comm_test_inter(intercomm, &intercomm_flag);
>>>>>>     if (comm_rank == 0) {
>>>>>>         printf("intercomm_flag = %d\n", intercomm_flag);
>>>>>>     }
>>>>>>     assert(intercomm_flag != 0);
>>>>>>     MPI_Comm_remote_size(intercomm, &intercomm_remote_size);
>>>>>>     if (comm_rank == 0) {
>>>>>>         printf("intercomm_remote_size = %d\n", intercomm_remote_size);
>>>>>>     }
>>>>>>     rem_rank_tbl = malloc(intercomm_remote_size * sizeof(*rem_rank_tbl));
>>>>>>     MPI_Allgather(&comm_rank, 1, MPI_INT,
>>>>>>                   rem_rank_tbl, 1, MPI_INT,
>>>>>>                   intercomm);
>>>>>>     if (comm_rank == 0) {
>>>>>>         printf("rem_rank_tbl[%d] = {", intercomm_remote_size);
>>>>>>         for (ii = 0; ii < intercomm_remote_size; ii++) {
>>>>>>             printf(" %d", rem_rank_tbl[ii]);
>>>>>>         }
>>>>>>         printf("}\n");
>>>>>>     }
>>>>>>     free(rem_rank_tbl);
>>>>>>
>>>>>>     MPI_Comm_disconnect(&intercomm);
>>>>>>
>>>>>>     MPI_Finalize();
>>>>>>
>>>>>>     return 0;
>>>>>> }
>>>>>>
>>>>>> /* aclient.c */
>>>>>> #include <stdio.h>
>>>>>> #include <unistd.h>
>>>>>>
>>>>>> #include <mpi.h>
>>>>>>
>>>>>> #include <assert.h>
>>>>>> #include <stdlib.h>
>>>>>>
>>>>>> int main(int argc, char **argv)
>>>>>> {
>>>>>>     int comm_rank, comm_size;
>>>>>>     int ok_flag;
>>>>>>     MPI_Comm intercomm;
>>>>>>
>>>>>>     int intercomm_flag;
>>>>>>     int intercomm_remote_size;
>>>>>>     int *rem_rank_tbl;
>>>>>>     int ii;
>>>>>>
>>>>>>     MPI_Init(&argc, &argv);
>>>>>>
>>>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);
>>>>>>     MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
>>>>>>
>>>>>>     ok_flag = (comm_rank != 0) ||
>>>>>>               ((argc == 2) && argv[1] && (*argv[1] != '\0'));
>>>>>>     MPI_Bcast(&ok_flag, 1, MPI_INT, 0, MPI_COMM_WORLD);
>>>>>>
>>>>>>     if (!ok_flag) {
>>>>>>         if (comm_rank == 0) {
>>>>>>             fprintf(stderr, "Usage: %s mpi_port\n", argv[0]);
>>>>>>         }
>>>>>>         MPI_Abort(MPI_COMM_WORLD, 1);
>>>>>>     }
>>>>>>
>>>>>>     while (MPI_Comm_connect((comm_rank == 0) ? argv[1] : 0,
>>>>>>                             MPI_INFO_NULL, 0, MPI_COMM_WORLD,
>>>>>>                             &intercomm) != MPI_SUCCESS) {
>>>>>>         if (comm_rank == 0) {
>>>>>>             printf("MPI_Comm_connect() failed, sleeping and retrying...\n");
>>>>>>         }
>>>>>>         sleep(1);
>>>>>>     }
>>>>>>
>>>>>>     MPI_Comm_test_inter(intercomm, &intercomm_flag);
>>>>>>     if (comm_rank == 0) {
>>>>>>         printf("intercomm_flag = %d\n", intercomm_flag);
>>>>>>     }
>>>>>>     assert(intercomm_flag != 0);
>>>>>>     MPI_Comm_remote_size(intercomm, &intercomm_remote_size);
>>>>>>     if (comm_rank == 0) {
>>>>>>         printf("intercomm_remote_size = %d\n", intercomm_remote_size);
>>>>>>     }
>>>>>>     rem_rank_tbl = malloc(intercomm_remote_size * sizeof(*rem_rank_tbl));
>>>>>>     MPI_Allgather(&comm_rank, 1, MPI_INT,
>>>>>>                   rem_rank_tbl, 1, MPI_INT,
>>>>>>                   intercomm);
>>>>>>     if (comm_rank == 0) {
>>>>>>         printf("rem_rank_tbl[%d] = {", intercomm_remote_size);
>>>>>>         for (ii = 0; ii < intercomm_remote_size; ii++) {
>>>>>>             printf(" %d", rem_rank_tbl[ii]);
>>>>>>         }
>>>>>>         printf("}\n");
>>>>>>     }
>>>>>>     free(rem_rank_tbl);
>>>>>>
>>>>>>     MPI_Comm_disconnect(&intercomm);
>>>>>>
>>>>>>     MPI_Finalize();
>>>>>>
>>>>>>     return 0;
>>>>>> }
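For completeness, the build-and-run sequence implied by the commands quoted above looks like the following; the mpicc wrapper name is an assumption, the orted and mpiexec invocations are taken verbatim from the thread, and the port string printed by the server will differ from run to run:

    mpicc -o aserver aserver.c
    mpicc -o aclient aclient.c
    orted --persistent --seed --scope public --universe univ1
    mpiexec --universe univ1 -n 3 ./aserver
    mpiexec --universe univ1 -n 2 ./aclient '0.2.0:2000'

The server and client are launched from separate shells against the same universe, and the client takes the port string printed by the server as its argument.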
>>>
>>> --
>>> Jeff Squyres
>>> Cisco Systems

--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335