Hello Brian,
Thanks for the information. I have been playing with OpenMPI and
Xgrid a little this week, and haven't had much luck. This email helps
a lot.
The XGrid starter currently looks for a couple of environment
variables to decide if it can be used. Currently, the XGrid process
starter only supports the basic password authentication to the
controller. As such, the two environment variables the XGrid starter
looks for are XGRID_CONTROLLER_HOSTNAME and
XGRID_CONTROLLER_PASSWORD. These are the same environment variables
that the 'xgrid' command-line submission process uses.
Do you mean on the client/submission machine, or the agent machines
where the applications are run?
I guess you mean the client, right?
So, I guess I have to make sure I set these environment variables,
rather than just using the -p and -h xgrid command options.
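For anyone using sh or bash rather than tcsh, here is a minimal sketch of a pre-flight check for those two variables. The function name xgrid_env_ready is made up for illustration and is not part of Open MPI or Xgrid; only the two variable names come from the description above.

```shell
# Hypothetical helper (not part of Open MPI or Xgrid): succeeds only if
# both variables the XGrid starter is said to read are set and non-empty.
xgrid_env_ready() {
    missing=0
    for name in XGRID_CONTROLLER_HOSTNAME XGRID_CONTROLLER_PASSWORD; do
        # Indirect lookup of the variable called $name (POSIX sh compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Assuming the variables are exported first (export XGRID_CONTROLLER_HOSTNAME=mycomputer.apple.com, and likewise for the password), it could guard the launch as: xgrid_env_ready && mpirun -np 4 ./myapp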
The reason I am a little confused is that I am pretty sure that, with
our other MPI implementations, mpirun gets called on the
computational node after the queueing system has started the job
running. What you seem to be indicating is that mpirun replaces the
queueing system call in this case, and is issued from the submission
node.
The restriction that Open MPI be installed on all nodes is a slightly
more difficult problem. Open MPI usually builds as a shared library
with a bunch of dynamically loaded shared objects, complicating the
list of what must be migrated. Even if statically linked, there is
still a helper process we have to migrate out with your application
(to deal with standard I/O in the expected way, along with some other
features that are much easier to implement with a helper daemon).
I am happy to install OpenMPI everywhere at this point, but in the
long run, it would be great to be able to run OpenMPI/Xgrid apps
without requiring preinstalled components, even if the daemon needs
to be sent via the network.
To use the XGrid system, make sure that the XGrid controller is
properly configured to use password-based authentication. Then
issue the following commands (assuming tcsh):
% setenv XGRID_CONTROLLER_HOSTNAME mycomputer.apple.com
% setenv XGRID_CONTROLLER_PASSWORD pword
% mpirun -np X ./myapp
I am assuming this is from the client/submission machine. So mpirun
replaces the xgrid command. I guess I never need to use the xgrid
command for OpenMPI/Xgrid jobs (?)
If this is the case, my next question is, how do I supply the usual
xgrid options, such as working directory, standard input file, etc?
Or is that simply not possible?
Do I simply have to have some other way (e.g. ssh) to get files to/from
agent machines, like I would for a batch system like PBS?
If I can get it all working, I will write up a few instructions on my
web site, which may take the pressure off you to generate some docs.
Thanks for the info, and the Xgrid port!
Regards,
Drew
---------------------------------------------------------
Drew McCormack
www.maniacalextent.com