On Aug 3, 2008, at 1:35 PM, Mark Borgerding wrote:

First of all, my simple question:
In what files can I find the source code for "mca_oob.oob_send" and "mca_oob.oob_recv"? I'm having a hard time following the initialization code that populates the struct of callbacks.

We actually only have one "oob" component that uses TCP communications. We have long thought of writing others (e.g., a native OOB for OpenFabrics kinds of networks), but never really gotten around to it. So those function pointers point to the various functions in orte/mca/oob/tcp/oob_tcp.c. On the OMPI SVN trunk, the module struct starts at line 136; it's those functions in particular.
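
For illustration only, here's a stripped-down sketch of the general "module = struct of function pointers" pattern that the MCA frameworks use; the type and member names below are made up for the example and are not the real orte/mca/oob interface:

    /* Hypothetical sketch of the MCA "module = struct of function pointers"
     * pattern; the names below are illustrative, not the real orte/mca/oob API. */
    #include <stdio.h>

    typedef int (*oob_send_fn_t)(const void *buf, int len);
    typedef int (*oob_recv_fn_t)(void *buf, int len);

    typedef struct {
        oob_send_fn_t oob_send;
        oob_recv_fn_t oob_recv;
    } oob_module_t;

    /* A "tcp" component fills in the table with its own functions... */
    static int tcp_send(const void *buf, int len) { printf("tcp send %d bytes\n", len); return 0; }
    static int tcp_recv(void *buf, int len)       { printf("tcp recv %d bytes\n", len); return 0; }

    static const oob_module_t tcp_module = { tcp_send, tcp_recv };

    /* ...and the framework exposes one global module that callers go through. */
    static oob_module_t mca_oob;    /* stands in for the real global of the same idea */

    int main(void)
    {
        mca_oob = tcp_module;               /* "component selection" at init time */
        mca_oob.oob_send("hello", 5);       /* dispatches to tcp_send() */
        char buf[16];
        mca_oob.oob_recv(buf, sizeof buf);  /* dispatches to tcp_recv() */
        return 0;
    }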

Next, the context of the question:
I've been trying to find a way to make a plain old process start and then participate in an MPI Group spread across a cluster. Let me try to use the local dialect and express my goal in terms I am likely to misuse: I want to make a singleton MPI process spawn and establish an intercommunicator with another MPI world.

Here's the list of things that have not worked:

Using MPI_Comm_spawn -- I've been told this is working in the 1.3 nightly snapshots, but not in any stable release. The symptom is that the call to MPI_Comm_spawn complains about not having a hostfile. For the full history, see the ompi-users thread "How to specify hosts for MPI_Comm_spawn".

If you could verify that MPI_Comm_spawn does work for you with the OMPI SVN trunk nightly tarballs, that would be most helpful.
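
For reference, a minimal MPI_Comm_spawn test along these lines looks roughly like the following; the "./worker" command and the process count are just placeholders, not anything from the original report:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        MPI_Comm intercomm;
        int errcodes[2];

        /* Spawn 2 copies of ./worker and get back an intercommunicator
         * whose remote group is the spawned job. */
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &intercomm, errcodes);

        int remote_size;
        MPI_Comm_remote_size(intercomm, &remote_size);
        printf("spawned job has %d processes\n", remote_size);

        MPI_Comm_disconnect(&intercomm);
        MPI_Finalize();
        return 0;
    }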

Forking the parent process *before* it enters any MPI calls (to hopefully avoid the environmental pitfalls Jeff Squyres warned of). The parent process calls MPI_Init to become an MPI singleton, then tries to establish an intercommunicator with the MPI group that is being spawned at the same time.

Just FYI, a minor terminology correction: the MPI processes that are spawned have a common MPI communicator. A communicator is an MPI group + a unique communication context. For example, two different communicators can share the same group, but will always have different communication contexts. So what you send on communicator A will never be received on communicator B, even if the source and destination processes are the same. My point: although the phrase has no definition specified by the MPI spec, we usually say "MPI job" to mean a bunch of MPI processes that share a common MPI_COMM_WORLD. So it's [usually] more natural to say "...the spawned MPI job..."
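
A small example of the group-vs-context distinction: MPI_Comm_dup gives you a second communicator with exactly the same group but a new context, and a message sent on the duplicate will only match a receive posted on that same duplicate (run with 2 processes; a minimal sketch, not taken from the code under discussion):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Comm dup;
        MPI_Comm_dup(MPI_COMM_WORLD, &dup);   /* same group, new context */

        int msg = 42;
        if (rank == 0) {
            MPI_Send(&msg, 1, MPI_INT, 1, 0, dup);
        } else if (rank == 1) {
            /* This matches because it is posted on the same communicator;
             * a receive posted on MPI_COMM_WORLD would never see this message. */
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, dup, MPI_STATUS_IGNORE);
            printf("rank 1 got %d on the duplicated communicator\n", msg);
        }

        MPI_Comm_free(&dup);
        MPI_Finalize();
        return 0;
    }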

The forked child process overlays itself with mpirun via execlp to start a "normal" MPI group. I've tried two different methods for establishing the intercomm. Both methods hang indefinitely and use lots of CPU doing nothing.

Fork Method 1: MPI_Open_port + MPI_Comm_accept on one side, MPI_Comm_connect on the other. The two sides hang in MPI_Comm_accept and MPI_Comm_connect, respectively. I did not pursue it any deeper than that.

Weird.
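
For context, the shape of Fork Method 1 is roughly the sketch below. The "./worker" program and the port.txt file used to hand over the port name are placeholders (the real code presumably synchronizes that handoff; it's omitted here, along with error checking):

    #include <mpi.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        pid_t pid = fork();                      /* fork *before* any MPI call */
        if (pid == 0) {
            /* Child: replace ourselves with mpirun launching the other MPI job.
             * The worker is assumed to read the port name from port.txt. */
            execlp("mpirun", "mpirun", "-np", "2", "./worker", "port.txt",
                   (char *)NULL);
            perror("execlp");
            _exit(1);
        }

        /* Parent: become an MPI singleton and wait for the spawned job. */
        MPI_Init(&argc, &argv);

        char port[MPI_MAX_PORT_NAME];
        MPI_Open_port(MPI_INFO_NULL, port);

        FILE *f = fopen("port.txt", "w");        /* hand the port name to ./worker */
        fprintf(f, "%s\n", port);
        fclose(f);

        MPI_Comm intercomm;
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
        /* ...while rank 0 of the spawned job calls:
         *    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm); */

        MPI_Close_port(port);
        MPI_Comm_disconnect(&intercomm);
        MPI_Finalize();
        return 0;
    }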

Fork Method 2: TCP socket establishment, followed by MPI_Comm_join on both sides. Both sides hang in the call to MPI_Comm_join. Upon further inspection and code hacking, I've determined that they successfully trade the names "0.0.0" and "0.1.0", and both sides then call ompi_comm_connect_accept. Inside ompi_comm_connect_accept, both sides call orte_rml.send_buffer; one side finishes the call, while the other gets blocked inside oob_send. The side that did not get blocked moves on to call orte_rml.recv_buffer, and gets blocked inside oob_recv.

I think that Ralph can shed light on this one -- we may not have good support for COMM_JOIN in the v1.2 series without a persistent orted...? It's a global process naming issue, IIRC.
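
For reference, a stripped-down version of Fork Method 2 looks roughly like the sketch below; for brevity both processes come from a plain fork() over a loopback TCP connection rather than the fork+execlp(mpirun) setup described above, and error checking is omitted:

    #include <mpi.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        /* Parent listens on a loopback port; child connects to it. */
        int lsock = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = 0;                       /* let the kernel pick a port */
        bind(lsock, (struct sockaddr *)&addr, sizeof addr);
        socklen_t len = sizeof addr;
        getsockname(lsock, (struct sockaddr *)&addr, &len);
        listen(lsock, 1);

        int fd;
        pid_t pid = fork();                      /* fork before any MPI call */
        if (pid == 0) {
            close(lsock);
            fd = socket(AF_INET, SOCK_STREAM, 0);
            connect(fd, (struct sockaddr *)&addr, sizeof addr);
        } else {
            fd = accept(lsock, NULL, NULL);
            close(lsock);
        }

        /* Both sides are now MPI singletons holding one end of the same socket. */
        MPI_Init(&argc, &argv);

        MPI_Comm intercomm;
        MPI_Comm_join(fd, &intercomm);           /* the call that hangs in the report */

        printf("%s: MPI_Comm_join returned\n", pid == 0 ? "child" : "parent");

        MPI_Comm_disconnect(&intercomm);
        MPI_Finalize();
        close(fd);
        return 0;
    }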

OOB == out-of-band sockets? If so, why?


OOB is OMPI's out-of-band mechanism. We use it for bootstrapping and other information exchange between MPI processes (e.g., the information exchange during MPI_INIT and MPI_FINALIZE). It's not a public API, and we change it between releases. I wouldn't recommend using it in general MPI applications; it does not exist in other MPI implementations.

--
Jeff Squyres
Cisco Systems
