On Wed, 2007-07-18 at 13:44 -0400, Tim Prins wrote:
> Adam C Powell IV wrote:
> > As mentioned, I'm running in a chroot environment, so rsh and ssh won't
> > work: "rsh localhost" will rsh into the primary local host environment,
> > not the chroot, which will fail.
> > 
> > [The purpose is to be able to build and test MPI programs in the Debian
> > unstable distribution, without upgrading the whole machine to unstable.
> > Though most machines I use for this purpose run Debian stable or
> > testing, the machine I'm currently using runs a very old Fedora, for
> > which I don't think OpenMPI is available.]
> 
> Alright, I understand what you are trying to do now. To be honest, I 
> don't think we have ever really thought about this use case. We always 
> figured that to test Open MPI people would simply install it in a 
> different directory and use it from there.
> 
> > With MPICH, mpirun -np 1 just runs the new process in the current
> > context, without rsh/ssh, so it works in a chroot.  Does OpenMPI not
> > support this functionality?
> 
> Open MPI does support this functionality. First, a bit of explanation:
> 
> We use 'pls' (process launching system) components to handle the 
> launching of processes. There are components for slurm, gridengine, rsh, 
> and others. At runtime we open each of these components and query them 
> as to whether they can be used. The original error you posted says that 
> none of the 'pls' components can be used, because they all detected that 
> they could not run in your setup. The slurm one excluded itself because 
> there were no environment variables set indicating it is running under 
> SLURM. Similarly, the gridengine pls said it could not run. The 'rsh' 
> pls said it could not run because neither 'ssh' nor 'rsh' is available 
> (I assume this is the case, though you did not explicitly say they were 
> not available).
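> 
> (For reference, 'ompi_info' shows which components were built into your 
> install; something like 'ompi_info | grep pls' should list the available 
> pls components, and 'ompi_info --param pls rsh' should show the rsh 
> component's parameters, including pls_rsh_agent. The exact option names 
> may differ slightly between versions.)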
> 
> But in this case, you do want the 'rsh' pls to be used. It will 
> automatically fork any local processes, and will use rsh/ssh to launch 
> any remote processes. Again, I don't think we ever imagined the use case 
> of a UNIX-like system where there are no launchers like SLURM available 
> and rsh/ssh is not available either (Open MPI is, after all, primarily 
> concerned with multi-node operation).
> 
> So, there are several ways around this:
> 
> 1. Make rsh or ssh available, even though they will not be used.
> 
> 2. Tell the 'rsh' pls component to use a dummy program such as 
> /bin/false by adding the following to the command line:
> -mca pls_rsh_agent /bin/false
> 
> 3. Create a dummy 'rsh' executable that is available in your path.
> 
> For instance:
> 
> [tprins@odin ~]$ which ssh
> /usr/bin/which: no ssh in 
> (/u/tprins/usr/ompia/bin:/u/tprins/usr/bin:/usr/local/bin:/bin:/usr/X11R6/bin)
> [tprins@odin ~]$ which rsh
> /usr/bin/which: no rsh in 
> (/u/tprins/usr/ompia/bin:/u/tprins/usr/bin:/usr/local/bin:/bin:/usr/X11R6/bin)
> [tprins@odin ~]$ mpirun -np 1  hostname
> [odin.cs.indiana.edu:18913] [0,0,0] ORTE_ERROR_LOG: Error in file 
> runtime/orte_init_stage1.c at line 317
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
> 
>    orte_pls_base_select failed
>    --> Returned value Error (-1) instead of ORTE_SUCCESS
> 
> --------------------------------------------------------------------------
> [odin.cs.indiana.edu:18913] [0,0,0] ORTE_ERROR_LOG: Error in file 
> runtime/orte_system_init.c at line 46
> [odin.cs.indiana.edu:18913] [0,0,0] ORTE_ERROR_LOG: Error in file 
> runtime/orte_init.c at line 52
> [odin.cs.indiana.edu:18913] [0,0,0] ORTE_ERROR_LOG: Error in file 
> orterun.c at line 399
> 
> [tprins@odin ~]$ mpirun -np 1 -mca pls_rsh_agent /bin/false  hostname
> odin.cs.indiana.edu
> 
> [tprins@odin ~]$ touch usr/bin/rsh
> [tprins@odin ~]$ chmod +x usr/bin/rsh
> [tprins@odin ~]$ mpirun -np 1  hostname
> odin.cs.indiana.edu
> [tprins@odin ~]$
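> 
> (If you would rather not pass the -mca option on every command line, the 
> same setting can also go in a per-user parameter file, e.g. 
> $HOME/.openmpi/mca-params.conf, as a line like:
> 
>    pls_rsh_agent = /bin/false
> 
> though for your case any one of the three options above should be enough.)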
> 
> 
> I hope this helps,
> 
> Tim

Yes, this helps tremendously.  I installed rsh, and now it pretty much
works.

The one missing detail is that I can't seem to get the stdout/stderr
output.  For example:

$ orterun -np 1 uptime
$ uptime
18:24:27 up 13 days,  3:03,  0 users,  load average: 0.00, 0.03, 0.00

The man page indicates that stdout/stderr is supposed to come back to
the stdout/stderr of the orterun process.  Any ideas on why this isn't
working?
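
Would a debugging flag help track this down?  I could try something
like (guessing at the option name here):

$ orterun -d -np 1 uptime

if that would show where the output is being sent.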

Thank you again!

-Adam
-- 
GPG fingerprint: D54D 1AEE B11C CE9B A02B  C5DD 526F 01E8 564E E4B6

Welcome to the best software in the world today cafe!
http://www.take6.com/albums/greatesthits.html
