On Jun 20, 2008, at 3:50 PM, Joshua Bernstein wrote:

No, we don't have an easy way to show which plugins were loaded and may/will be used during the run. The modules you found below in --display-map are only a few of the plugins: they all deal with the run-time environment and are only used on the back-end nodes, so they may not be what you're looking for (e.g., --display-map doesn't show the plugins used by mpirun).
What do you need to know?

Well, basically I want to know which MTAs are being used to start up a job.

MTA?

I'm confused as to what the difference is between "used by mpirun" versus used on the back-end nodes. Doesn't --display-map show which MTA modules will be used to start the back-end processes?

Yes. But OMPI's run-time design usually has mpirun load one plugin of a given type and the MPI processes load another plugin of the same type. For example, for I/O forwarding, mpirun loads the "svc" plugin, while the MPI processes load the "proxy" plugin. In this case, mpirun actually provides all the smarts for I/O forwarding, and the MPI processes simply proxy requests up to mpirun. This is a common model throughout our run-time support.

The overarching issue is that I'm just beginning to test my build, and when I attempt to start up a job, it hangs:

[ats@nt147 ~]$ mpirun --mca pls rsh -np 1 ./cpi
[nt147.penguincomputing.com:04640] [0,0,0] ORTE_ERROR_LOG: Not available in file ras_bjs.c at line 247

The same thing happens if I just disable the bjs RAS MTA, since bjs really isn't used with Scyld anymore:

[ats@nt147 ~]$ mpirun --mca ras ^bjs --mca pls rsh -np 1 ./cpi
<hang>

I know very, very little about the bproc support in OMPI. I know that it evolved over time and is disappearing in v1.3 due to lack of interest. If you want it to stay, I think you've missed the v1.3 boat (we're in feature freeze for v1.3), but possibilities exist for future versions if you're willing to get involved in Open MPI.

The interesting thing here is that orted starts up, but I'm not sure what is supposed to happen next:

[root@nt147 ~]# ps -auxwww | grep orte
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.3/FAQ
ats 4647 0.0 0.0 48204 2136 ? Ss 12:45 0:00 orted --bootproxy 1 --name 0.0.1 --num_procs 2 --vpid_start 0 --nodename nt147.penguincomputing.com --universe a...@nt147.penguincomputing.com:default-universe-4645 --nsreplica "0.0.0;tcp://192.168.5.211:59110;tcp://10.10.10.1:59110;tcp://10.11.10.1:59110" --gprreplica "0.0.0;tcp://192.168.5.211:59110;tcp://10.10.10.1:59110;tcp://10.11.10.1:59110" --set-sid

I'm not sure that just asking for the rsh pls is the Right thing to do; I'll have to defer to Ralph on this one...

Can you successfully run non-MPI apps, like hostname?

Finally, it should be noted that the upcoming release of Scyld will now include Open MPI. This is how all of this got started.


Great! It sounds like you need to get involved, though, to preserve bproc support going forward. LANL was the only proponent of bproc-like support; however, they have been moving away from bproc-like clusters, and so support faded. We made the decision to axe bproc support in v1.3 because there was no one to maintain it. :-(

--
Jeff Squyres
Cisco Systems
