Hi,

I need to launch my Open MPI application on the grid.

My application is designed to run N processes, each with M threads.

Without the grid, I run it as follows (say N = 7, M = 2):
% mpirun -np 7 <application name with arguments>

This works well and runs N processes. I can also submit it to the grid with 
the command below, and it works:

% qsub -pe orte 7 -l os-redhat6.7* -V -j y -b y -shell no mpirun -np 7 
<application name with arguments>

However, this job allocates only N slots on the grid, while it actually 
consumes N*M slots. How do I write the qsub command so that it reserves 
N*M slots while starting only N processes? I tried the command below, but I 
get a strange error from ORTE, pasted at the end of this message.

% qsub -pe orte 14 -l os-redhat6.7* -V -j y -b y -shell no mpirun -np 7 
<application name with arguments>
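
To make the slot arithmetic concrete, here is a minimal sketch of what I am trying to express. The variable names N, M and SLOTS are only for illustration, and the script prints the submission line instead of running it, so it does not need qsub or mpirun to be installed. It is the same command as above, not a fix for the error.

```shell
#!/bin/sh
# Illustrative values taken from the example in this message.
N=7   # number of MPI processes to start
M=2   # threads per process

# Slots the job really occupies: one per thread of every process.
SLOTS=$((N * M))

# Print the submission line rather than executing it.
# The application placeholder is left exactly as in the commands above.
echo "qsub -pe orte ${SLOTS} -l os-redhat6.7* -V -j y -b y -shell no mpirun -np ${N} <application name with arguments>"
```

With N = 7 and M = 2 this requests 14 slots from the parallel environment while still asking mpirun for only 7 processes.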

Any ideas?

Thanks,
Vipul


--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
--------------------------------------------------------------------------
--------------------------------------------------------------------------
ORTE does not know how to route a message to the specified daemon
located on the indicated node:

  my node:   mach12
  target node:  mach24

This is usually an internal programming error that should be
reported to the developers. In the meantime, a workaround may
be to set the MCA param routed=direct on the command line or
in your environment. We apologize for the problem.
