On Aug 3, 2008, at 1:35 PM, Mark Borgerding wrote:
First of all, my simple question:
In what files can I find the source code for "mca_oob.oob_send" and
"mca_oob.oob_recv"? I'm having a hard time following the
initialization code that populates the struct of callbacks.
We actually only have one "oob" component that uses TCP
communications. We have long thought of writing others (e.g., a
native OOB for OpenFabrics kinds of networks), but never really gotten
around to it. So those function pointers point to the various
functions in orte/mca/oob/tcp/oob_tcp.c. On the OMPI SVN trunk, the
module struct starts at line 136; it's those functions in particular.
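For illustration only, here's a rough, hypothetical sketch of the
"struct of function pointers" pattern the framework uses -- the names
and signatures below are simplified and do NOT match the real
declarations under orte/mca/oob/:

    #include <stdio.h>

    /* Hypothetical, simplified sketch of an MCA-style module struct;
     * not the real Open MPI declarations. */
    typedef struct {
        int (*oob_send)(int peer, const void *buf, int len);
        int (*oob_recv)(int peer, void *buf, int len);
    } sketch_oob_module_t;

    /* Stand-ins for the real functions in orte/mca/oob/tcp/oob_tcp.c */
    static int tcp_send(int peer, const void *buf, int len)
    {
        (void)buf;
        printf("tcp send to peer %d (%d bytes)\n", peer, len);
        return len;
    }

    static int tcp_recv(int peer, void *buf, int len)
    {
        (void)buf; (void)len;
        printf("tcp recv from peer %d\n", peer);
        return 0;
    }

    /* At init time the framework fills in the struct with the selected
     * component's functions, so a call through mca_oob.oob_send() ends
     * up in the TCP component. */
    static sketch_oob_module_t mca_oob = { tcp_send, tcp_recv };

    int main(void)
    {
        char msg[] = "hello";
        mca_oob.oob_send(1, msg, (int)sizeof(msg));
        return 0;
    }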
Next, the context of the question:
I've been trying to find a way to make a plain old process start and
then participate in an MPI Group spread across a cluster. Let me
try to use the local dialect and express my goal in terms I am
likely to misuse: I want to make a singleton MPI process spawn and
establish an intercommunicator with another MPI world.
Here's the list of things that have not worked:
Using MPI_Comm_spawn -- I've been told this is working in the 1.3
SVN snapshots, but not in any stable release.
The symptom is that the call to MPI_Comm_spawn complains about not
having a hostfile. For the full history, see the ompi-users thread
"How to specify hosts for MPI_Comm_spawn".
If you could verify that they do work for you on OMPI SVN trunk
nightly tarballs, that would be most helpful.
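For reference, here's a minimal sketch of a singleton doing the
spawn.  The info keys are Open MPI-specific hints (I'm writing
"add-hostfile" from memory -- please check the trunk's MPI_Comm_spawn
man page for the exact keys), and "./child" / "my_hostfile" are
placeholder names:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm children;
        MPI_Info info;

        MPI_Init(&argc, &argv);          /* started alone => singleton */

        MPI_Info_create(&info);
        /* Open MPI-specific hint telling the runtime where the spawned
         * job may run; "my_hostfile" is a placeholder path. */
        MPI_Info_set(info, "add-hostfile", "my_hostfile");

        /* "./child" is a placeholder executable; on success, "children"
         * is an intercommunicator to the spawned MPI job. */
        MPI_Comm_spawn("./child", MPI_ARGV_NULL, 2, info, 0,
                       MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);

        MPI_Info_free(&info);
        MPI_Comm_disconnect(&children);
        MPI_Finalize();
        return 0;
    }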
Forking the parent process *before* it enters any MPI calls (to
hopefully avoid the environmental pitfalls that Jeff Squyres warned of).
Parent process calls MPI_Init to become the MPI singleton, then
tries to establish an intercommunicator with the MPI group that is
getting spawned at the same time.
Just FYI, a minor terminology correction: the MPI processes that are
spawned have a common MPI communicator. A communicator is an MPI
group + a unique communication context. For example, two different
communicators can share the same group, but will always have different
communication contexts. So what you send on communicator A will never
be received on communicator B, even if the source and destination
processes are the same. My point: although the phrase has no
definition specified by the MPI spec, we usually say "MPI job" to mean
a bunch of MPI processes that share a common MPI_COMM_WORLD. So it's
[usually] more natural to say "...the spawned MPI job..."
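A quick, self-contained illustration of "same group, different
context":

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Comm dup;
        int result;

        MPI_Init(&argc, &argv);

        /* The dup has exactly the same group as MPI_COMM_WORLD... */
        MPI_Comm_dup(MPI_COMM_WORLD, &dup);
        MPI_Comm_compare(MPI_COMM_WORLD, dup, &result);

        /* ...so the comparison yields MPI_CONGRUENT (same group,
         * different context) rather than MPI_IDENT.  A send on "dup"
         * will never match a receive posted on MPI_COMM_WORLD. */
        printf("congruent? %s\n",
               result == MPI_CONGRUENT ? "yes" : "no");

        MPI_Comm_free(&dup);
        MPI_Finalize();
        return 0;
    }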
The forked child process overlays itself with mpirun via execlp
to start a "normal" MPI group. I've tried two different methods for
establishing the intercomm. Both methods hang indefinitely and use
lots of CPU doing nothing.
Fork Method 1: MPI_Open_port + MPI_Comm_accept on one side,
MPI_Comm_connect on the other.
The two sides hang in the MPI_Comm_accept and MPI_Comm_connect. I
did not pursue it deeper than that.
Weird.
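For completeness, the pattern itself should look roughly like the
sketch below (standard MPI-2 dynamics; the port name would normally
be handed to the client out of band, e.g. printed by the server and
passed on the client's command line -- it may need shell quoting):

    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    /* Run one copy with no arguments (the "server"); it prints a port
     * name.  Run a second copy with that port name as argv[1] (the
     * "client"); the two jobs end up with an intercommunicator. */
    int main(int argc, char **argv)
    {
        char port[MPI_MAX_PORT_NAME];
        MPI_Comm inter;

        MPI_Init(&argc, &argv);

        if (argc < 2) {                     /* server side */
            MPI_Open_port(MPI_INFO_NULL, port);
            printf("port: %s\n", port);
            fflush(stdout);
            MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
            MPI_Close_port(port);
        } else {                            /* client side */
            strncpy(port, argv[1], sizeof(port) - 1);
            port[sizeof(port) - 1] = '\0';
            MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
        }

        MPI_Comm_disconnect(&inter);
        MPI_Finalize();
        return 0;
    }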
Fork Method 2: tcp socket establishment, followed by MPI_Comm_join
on both sides.
Both sides hang in the call to MPI_Comm_join. Upon further
inspection and code-hacking, I've determined they can successfully
trade names "0.0.0" and "0.1.0" and both sides then call
ompi_comm_connect_accept. Inside ompi_comm_connect_accept, both
sides call orte_rml.send_buffer; one side finishes the call, while
the other gets blocked inside oob_send.
The side that did not get blocked moves on to call
orte_rml.recv_buffer. It gets blocked inside oob_recv.
I think that Ralph can shed light on this one -- we may not have good
support for COMM_JOIN in the v1.2 series without a persistent
orted...? It's a global process naming issue, IIRC.
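Just so we're all looking at the same thing, the MPI side of that
method boils down to the sketch below; it assumes the already
connected TCP socket's descriptor number arrives in argv[1] (the
socket setup itself is omitted):

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        /* Assumption: some out-of-band mechanism has already created a
         * connected TCP socket between the two processes, and its file
         * descriptor number is passed as argv[1]. */
        int fd = atoi(argv[1]);
        MPI_Comm inter;

        MPI_Init(&argc, &argv);

        /* Turn the socket into an intercommunicator; this is the call
         * that hangs in the scenario described above. */
        MPI_Comm_join(fd, &inter);

        MPI_Comm_disconnect(&inter);
        MPI_Finalize();
        return 0;
    }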
OOB == Out of band sockets? If so, why?
OOB is OMPI's out-of-band mechanism. We use it for bootstrapping and
other information exchange between MPI processes (e.g., the
information exchange during MPI_INIT and MPI_FINALIZE). It's not a
public API, and we change it between releases. I wouldn't recommend
using it in general MPI applications; it does not exist in other MPI
implementations.
--
Jeff Squyres
Cisco Systems