On Aug 28, 2007, at 10:59 AM, Lev Givon wrote:
Received from Brian Barrett on Tue, Aug 28, 2007 at 12:22:29PM EDT:
On Aug 27, 2007, at 3:14 PM, Lev Givon wrote:
I have OpenMPI 1.2.3 installed on an XGrid cluster and a separate Mac
client that I am using to submit jobs to the head (controller) node of
the cluster. The cluster's compute nodes are all connected to the head
node via a private network and are not running any firewalls. When I
try running jobs with mpirun directly on the cluster's head node, they
execute successfully; if I attempt to submit the jobs from the client
(which can run jobs on the cluster using the xgrid command line tool)
with mpirun, however, they appear to hang indefinitely (i.e., a job ID
is created, but the mpirun itself never returns or terminates). Is it
necessary to configure the firewall on the submission client to grant
access to the cluster head node in order to remotely submit jobs to
the cluster's head node?
Currently, every node on which an MPI process is launched must be
able to open a connection to a random port on the machine running
mpirun. So in your case, you'd have to configure the network on the
cluster to be able to connect back to your workstation (and the
workstation would have to allow connections from all your cluster
nodes). Far from ideal, but it's what it is.
Brian
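The connect-back requirement described above can be checked with a
small script. What follows is only a sketch, not anything taken from
Open MPI itself: the function names can_connect and
open_ephemeral_listener are hypothetical, and the listener merely
stands in for the mpirun side, which listens on a port chosen by the
operating system. In a real test, the listener would run on the
submission workstation and can_connect would be called from each
cluster node with the workstation's address and the printed port.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, filtered by a firewall, or unroutable.
        return False


def open_ephemeral_listener(bind_addr: str = "127.0.0.1") -> socket.socket:
    """Listen on an OS-chosen port (port 0); a stand-in for the mpirun side."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((bind_addr, 0))  # port 0 asks the OS for any free port
    listener.listen(1)
    return listener


if __name__ == "__main__":
    # Local demonstration only: both ends run on the loopback interface.
    listener = open_ephemeral_listener()
    host, port = listener.getsockname()
    print(f"listening on {host}:{port}")
    print("reachable while listening:", can_connect(host, port))
    listener.close()
    print("reachable after close:", can_connect(host, port))
```

If can_connect returns False when run from a compute node against the
workstation, a firewall or missing route between the two is a likely
reason for the hang, although this sketch cannot distinguish between
those causes.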
Can this be avoided by submitting the "mpirun -n 10 myProg" command
directly to the controller node with the xgrid command line tool? For
some reason, sending the above command to the cluster results in a
"task: failed with status 255" error even though I can successfully
run other programs or commands on the cluster with the xgrid tool. I
know that OpenMPI on the cluster is running properly because I can run
programs with mpirun successfully when logged into the controller node
itself.
Open MPI was designed to be the one calling XGrid's scheduling
algorithm, so I'm pretty sure that you can't submit a job that just
runs Open MPI's mpirun. That wasn't really in our original design
space as an option.
Brian