On Jun 20, 2008, at 3:50 PM, Joshua Bernstein wrote:
No, we don't have an easy way to show which plugins were loaded and
may/will be used during the run. The modules you found below in
--display-map are only a few of the plugins (all dealing with the
run-time environment, and only used on the back-end nodes), so it may
not be what you're looking for -- e.g., it doesn't show the plugins
used by mpirun.
What do you need to know?
Well, basically I want to know what MTAs are being used to start up a
job.
MTA?
I'm confused as to what the difference is between "used by mpirun"
versus used on the back-end nodes. Doesn't --display-map show which
MTA modules will be used to start the backend processes?
Yes. But OMPI's run-time design usually has mpirun load one plugin of
a given type, and then has the MPI processes load another plugin of
the same type. For example, for I/O forwarding, mpirun will load the
"svc" plugin, while MPI processes will load the "proxy" plugin. In
this case, mpirun is actually providing all the smarts for I/O
forwarding, and all the MPI processes simply proxy requests up to
mpirun. This is a common model throughout our run-time support.
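To make the svc/proxy split concrete, here is a minimal in-process sketch of the pattern. It is only an illustration under stated assumptions: the class names (SvcForwarder, ProxyForwarder) and the demo function are hypothetical and are not Open MPI's actual plugin API, and a plain method call stands in for the network channel a real run-time would use.

```python
import io


class SvcForwarder:
    """Hypothetical stand-in for the "svc" side: lives in the launcher
    (the mpirun role) and does the real I/O work."""

    def __init__(self, sink):
        self.sink = sink  # where forwarded output finally lands

    def handle(self, rank, data):
        # Tag each chunk with the rank it came from, then write it out.
        self.sink.write(f"[rank {rank}] {data}")


class ProxyForwarder:
    """Hypothetical stand-in for the "proxy" side: lives in each
    application process and only relays requests."""

    def __init__(self, rank, svc):
        self.rank = rank
        self.svc = svc  # assumption: a direct reference replaces the network link

    def write(self, data):
        # No local logic: pass the request straight up to the svc side.
        self.svc.handle(self.rank, data)


def demo():
    sink = io.StringIO()
    svc = SvcForwarder(sink)  # one "svc" instance in the launcher
    proxies = [ProxyForwarder(rank, svc) for rank in range(2)]
    for proxy in proxies:  # every process loads the "proxy" variant
        proxy.write("hello\n")
    return sink.getvalue()


if __name__ == "__main__":
    print(demo(), end="")
```

The point of the sketch is that both classes are "the same type" of component (I/O forwarding), yet only the launcher-side one contains any logic, which is why a listing of back-end plugins does not tell you what mpirun itself loaded.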
The overarching issue is that I'm just beginning to test my build,
and when I attempt to start up a job, it just hangs:
[ats@nt147 ~]$ mpirun --mca pls rsh -np 1 ./cpi
[nt147.penguincomputing.com:04640] [0,0,0] ORTE_ERROR_LOG: Not available in file ras_bjs.c at line 247
The same thing happens if I just disable the bjs RAS MTA, since bjs
really isn't used with Scyld anymore:
[ats@nt147 ~]$ mpirun --mca ras ^bjs --mca pls rsh -np 1 ./cpi
<hang>
I know very, very little about the bproc support in OMPI -- I know
that it evolved over time and is disappearing in v1.3 due to lack of
interest. If you want it to stay, I think you've missed the v1.3 boat
(we're in feature freeze for v1.3), but possibilities exist for future
versions if you're willing to get involved in Open MPI.
The interesting thing here is that orted starts up, but I'm not sure
what is supposed to happen next:
[root@nt147 ~]# ps -auxwww | grep orte
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.3/FAQ
ats 4647 0.0 0.0 48204 2136 ? Ss 12:45 0:00 orted
--bootproxy 1 --name 0.0.1 --num_procs 2 --vpid_start 0 --nodename
nt147.penguincomputing.com --universe
a...@nt147.penguincomputing.com:default-universe-4645 --nsreplica
"0.0.0;tcp://192.168.5.211:59110;tcp://10.10.10.1:59110;tcp://10.11.10.1:59110"
--gprreplica
"0.0.0;tcp://192.168.5.211:59110;tcp://10.10.10.1:59110;tcp://10.11.10.1:59110"
--set-sid
I'm not sure that just asking for the rsh pls is the Right thing to do
-- I'll have to defer to Ralph on this one...
Can you successfully run non-MPI apps, like hostname?
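One way to run that check without tying up a terminal on a hang is to wrap the launch in a timeout. The following is a rough sketch, not a tested recipe for this cluster: the helper name launch_completes and the 30-second default are assumptions, and HOSTNAME_CMD simply reuses the mpirun flags from the hanging run above with hostname substituted for ./cpi.

```python
import subprocess
import sys

# Same flags as the hanging run quoted above, but launching a non-MPI program.
HOSTNAME_CMD = ["mpirun", "--mca", "ras", "^bjs", "--mca", "pls", "rsh",
                "-np", "1", "hostname"]


def launch_completes(cmd, timeout_s=30.0):
    """Run cmd and return (finished, output).

    finished is False when cmd did not exit within timeout_s seconds;
    in that case the direct child is killed and output is empty.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False, ""
    return True, result.stdout + result.stderr
```

Calling launch_completes(HOSTNAME_CMD) on the head node would return (False, "") if the launch hangs the same way. Note that only the direct child is killed on timeout, so a daemon such as orted that has detached may still need to be cleaned up by hand.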
Finally, I should note that the upcoming release of Scyld will
include Open MPI, which is how all of this got started.
Great! It sounds like you need to get involved, though, to preserve
bproc support going forward. LANL was the only proponent of bproc-
like support; they have been moving away from bproc-like clusters,
however, and so support faded. We made the decision to axe bproc
support in v1.3 because there was no one to maintain it. :-(
--
Jeff Squyres
Cisco Systems