On Aug 28, 2007, at 10:59 AM, Lev Givon wrote:

Received from Brian Barrett on Tue, Aug 28, 2007 at 12:22:29PM EDT:
On Aug 27, 2007, at 3:14 PM, Lev Givon wrote:

I have OpenMPI 1.2.3 installed on an XGrid cluster and a separate Mac
client that I am using to submit jobs to the head (controller) node of
the cluster. The cluster's compute nodes are all connected to the head
node via a private network and are not running any firewalls. When I
try running jobs with mpirun directly on the cluster's head node, they
execute successfully; if I attempt to submit the jobs from the client
(which can run jobs on the cluster using the xgrid command line tool)
with mpirun, however, they appear to hang indefinitely (i.e., a job ID
is created, but the mpirun itself never returns or terminates). Is it
necessary to configure the firewall on the submission client to
grant access to the cluster head node in order to remotely submit jobs
to the cluster's head node?

Currently, every node on which an MPI process is launched must be
able to open a connection to a random port on the machine running
mpirun.  So in your case, you'd have to configure the network on the
cluster to be able to connect back to your workstation (and the
workstation would have to allow connections from all your cluster
nodes). Far from ideal, but it's what it is.

Brian
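
The connect-back requirement described above can be checked by hand
before involving mpirun at all. The following is a minimal sketch in
Python, not anything Open MPI itself provides: the helper names
open_ephemeral_listener and can_connect are hypothetical, and the
listener only imitates "a random port on the machine running mpirun"
by letting the OS choose one. The idea is to start the listener on the
workstation and call can_connect from a compute node with the
workstation's address and the reported port.

```python
import socket


def open_ephemeral_listener(bind_addr="127.0.0.1"):
    """Listen on an OS-chosen TCP port (hypothetical stand-in for the
    random port that the launching machine would open)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((bind_addr, 0))  # port 0 asks the OS to pick a free port
    srv.listen(1)
    return srv, srv.getsockname()[1]


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable (e.g. blocked by a firewall).
        return False


if __name__ == "__main__":
    # Local self-check; on a real setup the listener would be bound to
    # the workstation's routable address and probed from a cluster node.
    srv, port = open_ephemeral_listener()
    print("listening on port", port)
    print("reachable:", can_connect("127.0.0.1", port))
    srv.close()
```

If can_connect returns False when run from a compute node, the hang
described in the original question is the likely outcome, since the
launched processes would have no route back to the submitting machine.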

Can this be avoided by submitting the "mpirun -n 10 myProg" command
directly to the controller node with the xgrid command line tool? For
some reason, sending the above command to the cluster results in a
"task: failed with status 255" error even though I can successfully
submit other programs or commands to the cluster with the xgrid tool.  I
know that OpenMPI on the cluster is running properly because I can run
programs with mpirun successfully when logged into the controller node
itself.

Open MPI was designed to be the one calling XGrid's scheduling
algorithm, so I'm pretty sure that you can't submit a job that just
runs Open MPI's mpirun. That wasn't really in our original design
space as an option.

Brian
