On Jan 29, 2006, at 6:09 PM, Brian Granger wrote:

I have compiled and installed OpenMPI on Mac OS X. As I understand it, I can have mpirun start jobs using either ssh/xgrid or any other system (PBS, etc.) that I have installed. How can I configure which method is used? What process does ompi/orte go through to select which method to use?

I am mainly interested in ssh/xgrid at this point, but will need PBS soon. How do these work? From poking around, it looks like there are lots of MCA parameters for the ras/pls modules that are relevant, but there is very little documentation about what they all do.

Can anyone give me pointers about where to look for more documentation?

Unfortunately (shame on me) there isn't any documentation on the XGrid support at this time. It's on my to-do list, but so are a lot of other things. I've included some notes below that should help -- if not, feel free to ask all the questions you want. It will help to know what information people expect.

Open MPI does a run-time priority ranking to determine which process starter is used. ssh/rsh has the lowest ranking, and XGrid, PBS, and SLURM all have a higher ranking than ssh/rsh. However, the XGrid, PBS, and SLURM components only allow themselves to be selected if some other condition is met indicating that they should be used. For PBS and SLURM, these are the environment variables set by the batch scheduler indicating that a PBS (or SLURM) job is being executed.
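
If you want to see which starter components are installed and what MCA parameters they expose, or force a particular starter regardless of the ranking, something along these lines should work -- I'm writing the framework and component names (pls, rsh) from memory, so double-check them against the output of ompi_info on your installation:

    % ompi_info --param pls all
    % mpirun --mca pls rsh -np X ./myapp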

The XGrid starter currently looks for a couple of environment variables to decide whether it can be used. At this point, the XGrid process starter only supports basic password authentication to the controller. As such, the two environment variables the XGrid starter looks for are XGRID_CONTROLLER_HOSTNAME and XGRID_CONTROLLER_PASSWORD. These are the same environment variables that the 'xgrid' command-line submission tool uses.

The XGrid support in Open MPI is currently in a beta stage and has a couple of limitations that might make it unappealing to you. It requires that Open MPI be installed on all the nodes and be in the default path for user 'nobody', which pretty much means installing it in /usr. This is because it only supports password authentication (and not Kerberos authentication), so all jobs will run as nobody. If there is interest, it would not be hard to add Kerberos authentication support. The XGridFoundation framework is only available for 32 bit PPC / x86, so the starter will only build if Open MPI is built in 32 bit mode. We currently require all Open MPI processes (run-time and application) to be the same endianness and pointer size, so all user processes must be 32 bit applications. We intend to remove this restriction some time in the future, allowing a 32 bit runtime with a 64 bit user application.

The restriction that Open MPI be installed on all nodes is a slightly more difficult problem. Open MPI usually builds as a shared library with a bunch of dynamically loaded shared objects, complicating the list of what must be migrated. Even if statically linked, there is still a helper process we have to migrate out with your application (to deal with standard I/O in the expected way, along with some other features that are much easier to implement with a helper daemon).

To use the XGrid system, make sure that the XGrid controller is properly configured to use password-based authentication. Then issue the following commands (assuming tcsh):

    % setenv XGRID_CONTROLLER_HOSTNAME mycomputer.apple.com
    % setenv XGRID_CONTROLLER_PASSWORD pword
    % mpirun -np X ./myapp
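
If you use a Bourne-style shell (sh, bash, zsh) instead of tcsh, the equivalent would be:

    % export XGRID_CONTROLLER_HOSTNAME=mycomputer.apple.com
    % export XGRID_CONTROLLER_PASSWORD=pword
    % mpirun -np X ./myapp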

XGrid does not give users a way to know how many nodes are available. Open MPI assumes that if a user requested X nodes, there will eventually be X nodes available to run on. So if X is greater than the number of available nodes, mpirun will happily submit the request to XGrid, and XGrid will happily queue the job until X nodes are available. I wish there were a better way to handle that situation, but there doesn't seem to be. I've talked a little bit with the XGrid developers about improving this. Since XGrid is intended to be used in environments where machines come and go at will, it can be difficult to determine how many agents are up and running -- that isn't a static answer. I think at one point there was talk of adding a flag to the job submission that would bounce the job out of the queue if some period of time (possibly zero, meaning immediately) passed without the job being started. I don't know if anything ever came of that discussion.

There is really only one MCA parameter that users should ever have to adjust for the XGrid starter. The MCA parameter "pls_xgrid_job_delete" defaults to 1; if it is non-zero, completed jobs are removed from the list of executed jobs that the XGrid controller maintains. If jobs aren't deleted by Open MPI at completion, their results will remain in the XGrid controller's data store until the user manually deletes them.
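
For example, if you would rather keep completed jobs in the controller's data store (say, to look at their results later through the XGrid tools), you should be able to turn the deletion off on the mpirun command line:

    % mpirun --mca pls_xgrid_job_delete 0 -np X ./myapp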

As for the rsh/ssh component, there are a couple of MCA parameters that might be of use to most users (an example of setting them follows the list):

pls_rsh_num_concurrent: Open MPI tries to fork off this number of rsh/ssh
     instances before waiting for some of them to complete and moving on.
     This number defaults to 128. On platforms with low per-user process
     or file descriptor limits, it may have to be set slightly lower. On
     really large machines, start-up performance may improve by increasing
     this number.

pls_rsh_assume_same_shell: If this is non-zero, Open MPI assumes the same
     shell is used on the remote nodes as on the current node (i.e., they
     are all tcsh, bash, ksh, etc.). Otherwise, we must log in to each
     node twice, the first time to determine which shell is used on the
     remote nodes.

pls_rsh_agent: A colon (:) separated list of startup agents to attempt to
     use. Open MPI will use the first one available on the starting node.
     If a starter is available but doesn't work, an error will result.
     The default value is "ssh : rsh", meaning that ssh will be used
     unless it isn't installed, in which case rsh will be used.
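
For example, to launch through ssh with a quieter command line and a lower number of concurrent sessions, you can either pass the parameters to mpirun directly or set them as environment variables using the OMPI_MCA_ prefix -- the "ssh -q" agent and the value 64 below are just illustrations, not recommendations:

    % mpirun --mca pls_rsh_agent "ssh -q" --mca pls_rsh_num_concurrent 64 -np X ./myapp

or, in tcsh:

    % setenv OMPI_MCA_pls_rsh_num_concurrent 64
    % mpirun -np X ./myapp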

Please let me know if you have more questions.

Brian

--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/

