On Aug 3, 2008, at 1:35 PM, Mark Borgerding wrote:
First of all, my simple question:
In what files can I find the source code for "mca_oob.oob_send" and
"mca_oob.oob_recv"? I'm having a hard time following the
initialization code that populates the struct of callbacks.
We actually only have one "oob" component that uses TCP
communications. We have long thought of writing others (e.g., a
native OOB for OpenFabrics kinds of networks), but never really gotten
around to it. So those function pointers point to the various
functions in orte/mca/oob/tcp/oob_tcp.c. On the OMPI SVN trunk, the
module struct starts at line 136; it's those functions in particular.
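For illustration only, here's a rough, hypothetical sketch of the
"struct of function pointers" pattern the framework uses -- the names
and signatures below are simplified and do NOT match the real
declarations under orte/mca/oob/:

    #include <stdio.h>

    /* Hypothetical, simplified sketch of an MCA-style module struct;
     * not the real Open MPI declarations. */
    typedef struct {
        int (*oob_send)(int peer, const void *buf, int len);
        int (*oob_recv)(int peer, void *buf, int len);
    } sketch_oob_module_t;

    /* Stand-ins for the real functions in orte/mca/oob/tcp/oob_tcp.c */
    static int tcp_send(int peer, const void *buf, int len)
    {
        (void)buf;
        printf("tcp send to peer %d (%d bytes)\n", peer, len);
        return len;
    }

    static int tcp_recv(int peer, void *buf, int len)
    {
        (void)buf; (void)len;
        printf("tcp recv from peer %d\n", peer);
        return 0;
    }

    /* At init time the framework fills in the struct with the selected
     * component's functions, so a call through mca_oob.oob_send() ends
     * up in the TCP component. */
    static sketch_oob_module_t mca_oob = { tcp_send, tcp_recv };

    int main(void)
    {
        char msg[] = "hello";
        mca_oob.oob_send(1, msg, (int)sizeof(msg));
        return 0;
    }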
Next, the context of the question:
I've been trying to find a way to make a plain old process start and
then participate in an MPI Group spread across a cluster. Let me
try to use the local dialect and express my goal in terms I am
likely to misuse: I want to make a singleton MPI process spawn and
establish an intercommunicator with another MPI world.
Here's the list of things that have not worked:
Using MPI_Comm_spawn -- I've been told this is working in the 1.3
SVN snapshots, but not in any stable release.
The symptom is that the call to MPI_Comm_spawn complains about not
having a hostfile. For the full history, see the ompi-users thread
"How to specify hosts for MPI_Comm_spawn".
If you could verify that they do work for you on OMPI SVN trunk
nightly tarballs, that would be most helpful.
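For reference, here's a minimal sketch of a singleton doing the
spawn.  The info keys are Open MPI-specific hints (I'm writing
"add-hostfile" from memory -- please check the trunk's MPI_Comm_spawn
man page for the exact keys), and "./child" / "my_hostfile" are
placeholder names:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm children;
        MPI_Info info;

        MPI_Init(&argc, &argv);          /* started alone => singleton */

        MPI_Info_create(&info);
        /* Open MPI-specific hint telling the runtime where the spawned
         * job may run; "my_hostfile" is a placeholder path. */
        MPI_Info_set(info, "add-hostfile", "my_hostfile");

        /* "./child" is a placeholder executable; on success, "children"
         * is an intercommunicator to the spawned MPI job. */
        MPI_Comm_spawn("./child", MPI_ARGV_NULL, 2, info, 0,
                       MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);

        MPI_Info_free(&info);
        MPI_Comm_disconnect(&children);
        MPI_Finalize();
        return 0;
    }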
Forking the parent process *before* it enters any MPI calls (to
hopefully avoid the environmental pitfalls that Jeff Squyres warned of).
Parent process calls MPI_Init to become the MPI singleton, then
tries to establish an intercommunicator with the MPI group that is
getting spawned at the same time.
Just FYI, a minor terminology correction: the MPI processes that are
spawned have a common MPI communicator. A communicator is an MPI
group + a unique communication context. For example, two different
communicators can share the same group, but will always have different
communication contexts. So what you send on communicator A will never
be received on communicator B, even if the source and destination
processes are the same. My point: although the phrase has no
definition specified by the MPI spec, we usually say "MPI job" to mean
a bunch of MPI processes that share a common MPI_COMM_WORLD. So it's
[usually] more natural to say "...the spawned MPI job..."
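A quick, self-contained illustration of "same group, different
context":

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Comm dup;
        int result;

        MPI_Init(&argc, &argv);

        /* The dup has exactly the same group as MPI_COMM_WORLD... */
        MPI_Comm_dup(MPI_COMM_WORLD, &dup);
        MPI_Comm_compare(MPI_COMM_WORLD, dup, &result);

        /* ...so the comparison yields MPI_CONGRUENT (same group,
         * different context) rather than MPI_IDENT.  A send on "dup"
         * will never match a receive posted on MPI_COMM_WORLD. */
        printf("congruent? %s\n",
               result == MPI_CONGRUENT ? "yes" : "no");

        MPI_Comm_free(&dup);
        MPI_Finalize();
        return 0;
    }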
The forked child process overlays itself with mpirun via execlp
to start a "normal" MPI group. I've tried two different methods for
establishing the intercomm. Both methods hang indefinitely and use
lots of CPU doing nothing.
Fork Method 1: MPI_Open_port + MPI_Comm_accept on one side,
MPI_Comm_connect on the other.
The two sides hang in the MPI_Comm_accept and MPI_Comm_connect. I
did not pursue it deeper than that.
Weird.
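For completeness, the pattern itself should look roughly like the
sketch below (standard MPI-2 dynamics; the port name would normally
be handed to the client out of band, e.g. printed by the server and
passed on the client's command line -- it may need shell quoting):

    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    /* Run one copy with no arguments (the "server"); it prints a port
     * name.  Run a second copy with that port name as argv[1] (the
     * "client"); the two jobs end up with an intercommunicator. */
    int main(int argc, char **argv)
    {
        char port[MPI_MAX_PORT_NAME];
        MPI_Comm inter;

        MPI_Init(&argc, &argv);

        if (argc < 2) {                     /* server side */
            MPI_Open_port(MPI_INFO_NULL, port);
            printf("port: %s\n", port);
            fflush(stdout);
            MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
            MPI_Close_port(port);
        } else {                            /* client side */
            strncpy(port, argv[1], sizeof(port) - 1);
            port[sizeof(port) - 1] = '\0';
            MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
        }

        MPI_Comm_disconnect(&inter);
        MPI_Finalize();
        return 0;
    }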
Fork Method 2: tcp socket establishment, followed by MPI_Comm_join
on both sides.
Both sides hang in the call to MPI_Comm_join. Upon further
inspection and code-hacking, I've determined they can successfully
trade names "0.0.0" and "0.1.0" and both sides then call
ompi_comm_connect_accept. Inside ompi_comm_connect_accept, both
sides call orte_rml.send_buffer; one side finishes the call, while
the other gets blocked inside oob_send.
The side that did not get blocked moves on to call
orte_rml.recv_buffer. It gets blocked inside oob_recv.
I think that Ralph can shed light on this one -- we may not have good
support for COMM_JOIN in the v1.2 series without a persistent
orted...? It's a global process naming issue, IIRC.
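Just so we're all looking at the same thing, the MPI side of that
method boils down to the sketch below; it assumes the already
connected TCP socket's descriptor number arrives in argv[1] (the
socket setup itself is omitted):

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        /* Assumption: some out-of-band mechanism has already created a
         * connected TCP socket between the two processes, and its file
         * descriptor number is passed as argv[1]. */
        int fd = atoi(argv[1]);
        MPI_Comm inter;

        MPI_Init(&argc, &argv);

        /* Turn the socket into an intercommunicator; this is the call
         * that hangs in the scenario described above. */
        MPI_Comm_join(fd, &inter);

        MPI_Comm_disconnect(&inter);
        MPI_Finalize();
        return 0;
    }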
OOB == Out of band sockets? If so, why?
OOB is OMPI's out-of-band mechanism. We use it for bootstrapping and
other information exchange between MPI processes (e.g., the
information exchange during MPI_INIT and MPI_FINALIZE). It's not a
public API, and we change it between releases. I wouldn't recommend
using it in general MPI applications; it does not exist in other MPI
implementations.
--
Jeff Squyres
Cisco Systems