Re: [OMPI devel] Fwd: MPI_INPLACE problem
Ah, I see the problem now. I am working on a fix; many thanks for the report! (Further updates will be on http://svn.open-mpi.org/trac/ompi/ticket/430)

On 9/27/06 10:24 AM, "Lisandro Dalcin" wrote:

> Here is an example of the problems I have with MPI_IN_PLACE in OMPI.
> Hoping this can be useful. Perhaps the problem is not in the OMPI sources,
> but in my particular build. I've configured with:
>
> $ head -n 7 config.log | tail -n 1
> $ ./configure --disable-dlopen --prefix /usr/local/openmpi/1.1.1
>
> First I present a very simple program that gives right results with
> OMPI; next, a small modification changing the sendcount argument, which
> now gives wrong results.
>
> Using MPICH2, both versions give the same, right result.
>
> My environment:
> ---------------
>
> $ echo $PATH
> /usr/local/openmpi/1.1.1/bin:/usr/kerberos/bin:/usr/lib/ccache/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:.
>
> $ echo $LD_LIBRARY_PATH
> /usr/local/openmpi/1.1.1/lib:/usr/local/openmpi/1.1.1/lib/openmpi
>
> First test program
> ------------------
>
> This stupid program gathers the values of comm.rank at a root process
> with rank = comm.size/2 and prints the gathered values.
>
> $ cat gather.c
> #include <stdio.h>
> #include <stdlib.h>
> #include <mpi.h>
>
> int main() {
>   int size, rank, root;
>   MPI_Init(NULL, NULL);
>   MPI_Comm_size(MPI_COMM_WORLD, &size);
>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>   root = size/2;
>   if (rank == root) {
>     int i;
>     int *buf = (int *) malloc(size * sizeof(int));
>     for (i=0; i<size; i++) buf[i] = -1;
>     buf[rank] = rank;
>     MPI_Gather(MPI_IN_PLACE, 1, MPI_DATATYPE_NULL,
>                buf, 1, MPI_INT,
>                root, MPI_COMM_WORLD);
>     for (i=0; i<size; i++) printf("%d,", buf[i]);
>     printf("\n");
>     free(buf);
>   } else {
>     MPI_Gather(&rank, 1, MPI_INT,
>                NULL, 0, MPI_DATATYPE_NULL,
>                root, MPI_COMM_WORLD);
>   }
>   MPI_Finalize();
> }
>
> Run results:
> ------------
> $ mpicc gather.c
> $ mpirun -n 5 a.out
> 0,1,2,3,4,
>
>
> Second test program
> -------------------
>
> I only modify the sendcount argument at the root process, which is the
> one passing sendbuf=MPI_IN_PLACE.
>
> $ cat gather.c
> #include <stdio.h>
> #include <stdlib.h>
> #include <mpi.h>
>
> int main() {
>   int size, rank, root;
>   MPI_Init(NULL, NULL);
>   MPI_Comm_size(MPI_COMM_WORLD, &size);
>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>   root = size/2;
>   if (rank == root) {
>     int i;
>     int *buf = (int *) malloc(size * sizeof(int));
>     for (i=0; i<size; i++) buf[i] = -1;
>     buf[rank] = rank;
>     MPI_Gather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
>                buf, 1, MPI_INT,
>                root, MPI_COMM_WORLD);
>     for (i=0; i<size; i++) printf("%d,", buf[i]);
>     printf("\n");
>     free(buf);
>   } else {
>     MPI_Gather(&rank, 1, MPI_INT,
>                NULL, 0, MPI_DATATYPE_NULL,
>                root, MPI_COMM_WORLD);
>   }
>   MPI_Finalize();
> }
>
> Run results:
> ------------
> $ mpicc gather.c
> $ mpirun -n 5 a.out
> -1,-1,2,-1,-1,

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
[OMPI devel] ORTE Tutorial Materials
Hello all

The materials for Thursday's session of the ORTE tutorial are now complete and stable. I have posted them on the OpenRTE web site at:

http://www.open-rte.org/papers/tutorial-sept-2006/index.php

Both PowerPoint and PDF (printed two slides/page) formats are available. I should have the materials for Friday on the web site later today, though I may not totally complete them until Thursday night (sigh).

The broadcast will be done using a shared-desktop approach that involves a virtual whiteboard. I haven't used this before, but I'm hoping it will all work satisfactorily.

For those non-LANL folks attending in person, I remind you that cell phones and computers (unless specifically pre-approved by the Lab) are not allowed in the building. I will attempt to have hardcopies of the materials for you to use (can't promise just yet).

Cheers
Ralph
Re: [OMPI devel] some possible bugs
Lisandro, do you have an example for the extended collective operations tests which fail? It would help to track down the problem. I had a quick look at our implementation, but I cannot find an obvious problem, so an example would be extremely helpful.

Thanks
Edgar

- Some extended collective communications failed (not by raising errors, but by aborting and tracing to stdout) when using intercommunicators. Sometimes the problems appeared when size(local_group) != size(remote_group). However, MPI_Barrier and MPI_Bcast worked well. I still could not determine the reasons for those failures. I've found a similar problem in MPICH2 when configured with error-checking enabled (they had a bug in some error-checking macros; I reported the issue and they later confirmed I was right).
Re: [OMPI devel] btl_openib_max_btls
I was using the v1.2 branch. Gleb's fix has resolved the problem.

Thanks
--Nysal

On 9/25/06, Jeff Squyres wrote:

What version of Open MPI are you using? We had a bug with this on the trunk and the [unreleased] v1.2 branch; it was just fixed within the last few hours in both places. It should not be a problem in the released v1.1 series. Can you confirm that you were using the OMPI trunk or the v1.2 branch? If you're seeing this in the v1.1 series, then we need to look at this a bit closer...

On 9/22/06 1:25 PM, "Nysal Jan" wrote:

> The ompi_info command shows the following description for the
> "btl_openib_max_btls" parameter:
>
> MCA btl: parameter "btl_openib_max_btls" (current value: "-1") Maximum
> number of HCA ports to use (-1 = use all available, otherwise must be >= 1)
>
> Even though I specify "mpirun --mca btl_openib_max_btls 1 .", 2 openib
> btls are created (the HCA has 2 ports).
>
> When I try to run Open MPI across 2 nodes (one node has an HCA with 2
> ports and the other has only one port), both endpoints send the QP
> information over to the peer. Only one endpoint exists at the peer, so it
> prints the following error messages:
>
> [0,1,1][btl_openib_endpoint.c:706:mca_btl_openib_endpoint_recv] can't find
> suitable endpoint for this peer
>
> [0,1,0][btl_openib_endpoint.c:913:mca_btl_openib_endpoint_connect] error
> posting receive errno says Operation now in progress
>
> [0,1,0][btl_openib_endpoint.c:737:mca_btl_openib_endpoint_recv] endpoint
> connect error: -1
>
> Is "btl_openib_max_btls" the maximum number of BTLs, or the maximum number
> of BTLs per port (which is what the current implementation,
> "init_one_hca()", looks like)?
>
> -Nysal

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems