Wow,

Seems like I've fallen behind in replying. I'll try to make sure I answer everybody's questions about what I am trying to accomplish.

Jeff Squyres wrote:
On Jun 20, 2008, at 3:50 PM, Joshua Bernstein wrote:

No, we don't have an easy way to show which plugins were loaded and may/will be used during the run. The modules you found below in --display-map are only a few of the plugins (all dealing with the run-time environment, and only used on the back-end nodes, so it may not be what you're looking for -- e.g., it doesn't show the plugins used by mpirun).
What do you need to know?

Well basically I want to know what MTA's are being used to startup a job.

MTA?

Sorry, I should have said MCA....

I'm confused as to what the difference is between "used by mpirun" versus used on the back-end nodes. Doesn't --display-map show which MTA modules will be used to start the backend processes?

Yes. But OMPI's run-time design usually has mpirun load one plugin of a given type, and then have the MPI processes load another plugin of the same type. For example, for I/O forwarding - mpirun will load the "svc" plugin, while MPI processes will load the "proxy" plugin. In this case, mpirun is actually providing all the smarts for I/O forwarding, and all the MPI processes simply proxy requests up to mpirun. This is a common model throughout our run-time support.
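
As a rough illustration of the svc/proxy split described above, here is a minimal sketch in Python. It is not OMPI's actual code: the class names, method names, and the in-process call standing in for the network channel to mpirun are all invented for illustration.

```python
class SvcForwarder:
    """Hypothetical stand-in for the "svc" side living in mpirun.

    It holds all the logic: here it simply records what each rank wrote.
    """

    def __init__(self):
        self.output = []  # collected (rank, text) pairs

    def handle(self, rank, text):
        self.output.append((rank, text))
        return len(text)


class ProxyForwarder:
    """Hypothetical stand-in for the "proxy" side in each MPI process.

    It has no logic of its own and passes every request up to the svc side.
    """

    def __init__(self, rank, svc):
        self.rank = rank
        self.svc = svc  # assumption: a real system would use a network channel

    def write(self, text):
        return self.svc.handle(self.rank, text)


def demo():
    svc = SvcForwarder()
    proxies = [ProxyForwarder(rank, svc) for rank in range(3)]
    for proxy in proxies:
        proxy.write(f"hello from rank {proxy.rank}\n")
    return svc.output
```

Calling demo() returns everything the three proxies wrote, gathered in one place on the svc side, which is the point of the model: the two sides are the same type of plugin but different components.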

Ah, okay. So then --display-map will show what modules the backend processes are using, not mpirun itself.

The overarching issue is that I'm just beginning to test my build, and when I attempt to start up a job, it just hangs:

[ats@nt147 ~]$ mpirun --mca pls rsh -np 1 ./cpi
[nt147.penguincomputing.com:04640] [0,0,0] ORTE_ERROR_LOG: Not available in file ras_bjs.c at line 247

The same thing happens if I just disable the bjs RAS MCA module, since bjs really isn't used with Scyld anymore:

[ats@nt147 ~]$ mpirun --mca ras ^bjs --mca pls rsh -np 1 ./cpi
<hang>
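
For readers unfamiliar with the selection syntax in the two commands above: a value such as "rsh" asks for only the named component(s) of that type, while a leading "^" (as in "^bjs") asks for everything except the named component(s). The sketch below mimics that filtering in Python. It is an illustration only, not how Open MPI implements it, and the component names other than bjs and rsh are made up.

```python
def select_components(spec, available):
    """Filter component names by an MCA-style selection string.

    "a,b"  -> keep only a and b
    "^a,b" -> keep everything except a and b
    """
    exclude = spec.startswith("^")
    names = {name for name in spec.lstrip("^").split(",") if name}
    if exclude:
        return [c for c in available if c not in names]
    return [c for c in available if c in names]


# Hypothetical component lists, for illustration only.
ras_available = ["bjs", "example_a", "example_b"]
pls_available = ["example_c", "rsh"]

ras_selected = select_components("^bjs", ras_available)
pls_selected = select_components("rsh", pls_available)
```

With these made-up lists, "^bjs" leaves every RAS component except bjs, and "rsh" leaves rsh as the only launcher.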

I know very, very little about the bproc support in OMPI -- I know that it evolved over time and is disappearing in v1.3 due to lack of interest. If you want it to stay, I think you've missed the v1.3 boat (we're in feature freeze for v1.3), but possibilities exist for future versions if you're willing to get involved in Open MPI.

Bummer! I (along with Penguin) would absolutely support further contributions to and development of BProc support.

Note, though, that Scyld BProc and LANL BProc forked long ago. We believe our BProc functionality has been developed beyond what was running at LANL (for example, we have support for threads...). I understand it is probably too late to add BProc in for 1.3, but perhaps for subsequent releases, combined with contributions from Penguin, BProc support could be resurrected in some capacity.

The interesting thing here is that orted starts up, but I'm not sure what is supposed to happen next:

[root@nt147 ~]# ps -auxwww | grep orte
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.3/FAQ
ats 4647 0.0 0.0 48204 2136 ? Ss 12:45 0:00 orted --bootproxy 1 --name 0.0.1 --num_procs 2 --vpid_start 0 --nodename nt147.penguincomputing.com --universe a...@nt147.penguincomputing.com:default-universe-4645 --nsreplica "0.0.0;tcp://192.168.5.211:59110;tcp://10.10.10.1:59110;tcp://10.11.10.1:59110" --gprreplica "0.0.0;tcp://192.168.5.211:59110;tcp://10.10.10.1:59110;tcp://10.11.10.1:59110" --set-sid

I'm not sure that just asking for the rsh pls is the Right thing to do -- I'll have to defer to Ralph on this one...
Can you successfully run non-MPI apps, like hostname?

Yes. Absolutely.

Finally, it should be noted that the upcoming release of Scyld will include Open MPI, which is how all of this got started.


Great! It sounds like you need to get involved, though, to preserve bproc support going forward. LANL was the only proponent of bproc-like support; they have been moving away from bproc-like clusters, however, and so support faded. We made the decision to axe bproc support in v1.3 because there was no one to maintain it. :-(

This is what I'm in the process of doing right now. I'd like to be able to take the existing BProc functionality and modify it if needed to support our BProc. I have buy-in from the higher-ups around here, and I will proceed with the membership forms, likely at the "Contributor" level, considering we hope to be contributing code. Signing the 3rd party contribution agreement shouldn't be an issue.

-Joshua Bernstein
Software Engineer
Penguin Computing
