(cross-posted on SO: http://stackoverflow.com/questions/14775451)
Hi,

I'm very new to Open MPI and I'm trying to submit an Open MPI job to SGE.

I've installed Open MPI, not in /usr/... but in /commun/data/packages/openmpi/. It was compiled with --with-sge.

I've added a new PE in SGE with qconf, as described in http://docs.oracle.com/cd/E19080-01/n1.grid.eng6/817-5677/6ml49n2c0/index.html

    # /commun/data/packages/openmpi/bin/ompi_info | grep gridengine
    MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.3)

    # qconf -sq all.q | grep pe_
    pe_list make orte

Without SGE, the program runs without any problem, using several processors:

    /commun/data/packages/openmpi/bin/orterun -np 20 ./a.out args

Now I want to submit my program to SGE. In the Open MPI FAQ, I read:

    # Allocate a SGE interactive job with 4 slots
    # from a parallel environment (PE) named 'orte'
    shell$ qsh -pe orte 4

but my output is:

    qsh -pe orte 4
    Your job 84550 ("INTERACTIVE") has been submitted
    waiting for interactive job to be scheduled ...
    Could not start interactive job.

I've also tried the mpirun command embedded in a script:

    $ cat ompi.sh
    #!/bin/sh
    /commun/data/packages/openmpi/bin/mpirun \
        /path/to/a.out args

but it fails:

    $ cat ompi.sh.e84552
    error: executing task of job 84552 failed: execution daemon on host "node02" didn't accept task
    --------------------------------------------------------------------------
    A daemon (pid 18327) died unexpectedly with status 1 while attempting
    to launch so we are aborting.

    There may be more information reported by the environment (see above).

    This may be because the daemon was unable to find all the needed shared
    libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
    location of the shared libraries on the remote nodes and this will
    automatically be forwarded to the remote nodes.
    --------------------------------------------------------------------------
    error: executing task of job 84552 failed: execution daemon on host "node01" didn't accept task
    --------------------------------------------------------------------------
    mpirun noticed that the job aborted, but has no info as to the process
    that caused that situation.

How can I fix this?

Many thanks