Hi Joshua

Again, forwarded by the friendly elf - so include me directly in any reply.
I gather from Jeff that you are attempting to do something with bproc - true? If so, I will echo what Jeff said: bproc support in OMPI is being dropped with the 1.3 release due to lack of interest/support. Just a "heads up".

If you are operating in a bproc environment, then I'm not sure why you are specifying that the system use the rsh launcher. Bproc requires some very special handling which is only present in the bproc launcher. You can run both MPI and non-MPI apps with it, but bproc is weird, and so OMPI has some -very- different logic in it to make it all work.

I suspect the problem you are having is that all of the frameworks are detecting bproc and trying to run accordingly. This means that the orted is executing process startup procedures for bproc - which are totally different than those for any other environment (e.g., rsh). If mpirun is attempting to execute an rsh launch, and the orted is expecting a bproc launch, then I can guarantee that no processes will be launched and you will hang.

I'm not sure there is a way in 1.2 to tell the orteds to ignore the fact that they see bproc and do something else. I can look, but I would rather wait to hear whether that is truly what you are trying to do, and why.

Ralph

On 6/21/08 5:10 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

> On Jun 20, 2008, at 3:50 PM, Joshua Bernstein wrote:
>
>>> No, we don't have an easy way to show which plugins were loaded and
>>> may/will be used during the run. The modules you found below in
>>> --display-map are only a few of the plugins (all dealing with the
>>> run-time environment, and only used on the back-end nodes, so it may
>>> not be what you're looking for -- e.g., it doesn't show the plugins
>>> used by mpirun).
>>>
>>> What do you need to know?
>>
>> Well, basically I want to know what MTA's are being used to start up
>> a job.
>
> MTA?
>
>> I'm confused as to what the difference is between "used by mpirun"
>> versus used on the back-end nodes. Doesn't --display-map show which
>> MTA modules will be used to start the backend processes?
>
> Yes. But OMPI's run-time design usually has mpirun load one plugin of
> a given type, and then has the MPI processes load another plugin of
> the same type. For example, for I/O forwarding, mpirun will load the
> "svc" plugin, while MPI processes will load the "proxy" plugin. In
> this case, mpirun is actually providing all the smarts for I/O
> forwarding, and all the MPI processes simply proxy requests up to
> mpirun. This is a common model throughout our run-time support.
>
>> The overarching issue is that I'm attempting to just begin testing
>> my build, and when I attempt to start up a job, it just hangs:
>>
>> [ats@nt147 ~]$ mpirun --mca pls rsh -np 1 ./cpi
>> [nt147.penguincomputing.com:04640] [0,0,0] ORTE_ERROR_LOG: Not
>> available in file ras_bjs.c at line 247
>>
>> The same thing happens if I just disable the bjs RAS MTA, since bjs
>> really isn't used with Scyld anymore:
>>
>> [ats@nt147 ~]$ mpirun --mca ras ^bjs --mca pls rsh -np 1 ./cpi
>> <hang>
>
> I know very, very little about the bproc support in OMPI -- I know
> that it evolved over time and is disappearing in v1.3 due to lack of
> interest. If you want it to stay, I think you've missed the v1.3 boat
> (we're in feature freeze for v1.3), but possibilities exist for
> future versions if you're willing to get involved in Open MPI.
>
>> The interesting thing here is that orted starts up, but I'm not sure
>> what is supposed to happen next:
>>
>> [root@nt147 ~]# ps -auxwww | grep orte
>> Warning: bad syntax, perhaps a bogus '-'? See
>> /usr/share/doc/procps-3.2.3/FAQ
>> ats 4647 0.0 0.0 48204 2136 ? Ss 12:45 0:00 orted
>> --bootproxy 1 --name 0.0.1 --num_procs 2 --vpid_start 0 --nodename
>> nt147.penguincomputing.com --universe
>> a...@nt147.penguincomputing.com:default-universe-4645 --nsreplica
>> "0.0.0;tcp://192.168.5.211:59110;tcp://10.10.10.1:59110;tcp://10.11.10.1:59110"
>> --gprreplica
>> "0.0.0;tcp://192.168.5.211:59110;tcp://10.10.10.1:59110;tcp://10.11.10.1:59110"
>> --set-sid
>
> I'm not sure that just asking for the rsh pls is the Right thing to
> do -- I'll have to defer to Ralph on this one...
>
> Can you successfully run non-MPI apps, like hostname?
>
>> Finally, it should be noted that the upcoming release of Scyld will
>> now include Open MPI. That is how all of this got started.
>
> Great! It sounds like you need to get involved, though, to preserve
> bproc support going forward. LANL was the only proponent of
> bproc-like support; they have been moving away from bproc-like
> clusters, however, and so support faded. We made the decision to axe
> bproc support in v1.3 because there was no one to maintain it. :-(
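
A footnote on the component mismatch Ralph describes: mpirun and the orteds have to settle on the same set of run-time plugins, and ompi_info will list which components a build actually contains. A minimal sketch, assuming the usual "MCA <framework>: <component>" line format of 1.2-era ompi_info output (the iof list should show the svc/proxy pair Jeff mentions):

    # Which launcher (pls), allocator (ras), and I/O forwarding (iof)
    # components does this build contain?
    ompi_info | grep " pls:"
    ompi_info | grep " ras:"
    ompi_info | grep " iof:"

    # Parameters accepted by one component (here the rsh launcher)
    ompi_info --param pls rsh

If a bproc launcher shows up in that list, the same ^ exclusion syntax Joshua used for bjs should, in principle, keep every process away from it as well - though whether the 1.2 orteds on the back end respect the exclusion is exactly the question Ralph leaves open:

    # Untested sketch: exclude the bproc launcher and the bjs allocator outright
    mpirun --mca pls ^bproc --mca ras ^bjs -np 1 ./cpi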
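
And to act on Jeff's hostname suggestion: a quick way to separate launch problems from MPI problems is to push a non-MPI binary through the same launcher. The --debug-daemons option (present in 1.2-era mpirun, though treat that as an assumption here) keeps the orteds attached and reporting what they do:

    # If even hostname hangs, the failure is in the launch path,
    # not in the MPI layer of ./cpi
    mpirun -np 1 --mca pls rsh --debug-daemons hostname

If hostname comes back cleanly but ./cpi still hangs, the launch path is fine and attention shifts to MPI wire-up; if it hangs too, that points straight at the mpirun/orted launcher mismatch described above.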