Hi,

On 03.09.2014 at 12:17, Donato Pera wrote:

> I'm using Rocks 5.4.3 with SGE 6.1. I installed
> a new version of Open MPI (1.6.5). When I run
> a script using SGE+Open MPI (1.6.5) on a single node,
> I don't have any problems, but when I try to use more nodes
> I get this error:
> 
> 
> A hostfile was provided that contains at least one node not
> present in the allocation:
> 
>  hostfile:  /tmp/21202.1.parallel.q/machines
>  node:      compute-2-4
> 
> If you are operating in a resource-managed environment, then only
> nodes that are in the allocation can be used in the hostfile. You
> may find relative node syntax to be a useful alternative to
> specifying absolute node names see the orte_hosts man page for
> further information.

Was Open MPI compiled with SGE support?

$ ompi_info | grep grid
                 MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.5)

If so, you don't need to provide any -machinefile option at all, as Open
MPI will use the SGE-generated allocation automatically.
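If you want the job script itself to check this before starting, the `ompi_info` test can be wrapped in a small Bash function. This is only a sketch: the function name `has_gridengine_support` is hypothetical, and it assumes the `MCA ras: gridengine` line appears in the `ompi_info` output as shown above. `$NSLOTS` is the slot count SGE sets for a parallel job.

```shell
#!/bin/bash
# Hypothetical helper: reads `ompi_info` output on stdin and succeeds
# only if the gridengine RAS component is listed.
has_gridengine_support() {
  grep -q 'MCA ras: gridengine'
}

# Sketch of use in the job script (paths taken from the original post).
# With SGE support, no -machinefile is passed; Open MPI reads the
# SGE allocation itself.
#
#   OMPI=/home/SWcbbc/openmpi-1.6.5/bin
#   if "$OMPI/ompi_info" | has_gridengine_support; then
#     "$OMPI/mpirun" -np "$NSLOTS" ${CPMD_PATH}cpmd.x input ${PP_PATH}/PP/ > out
#   else
#     echo "Open MPI was built without SGE support" >&2
#     exit 1
#   fi
```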

(Nevertheless, $TMPDIR/machines should be correct. The problem could be a
mismatch between the short hostname and the FQDN. Out of curiosity, can you
`cat` $TMPDIR/machines in a job script, and also show the output of `hostname`
on a node there?)
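To make the short-name vs. FQDN comparison explicit, a small diagnostic like the following could be added to a job script next to the `cat` and `hostname` calls. It is a sketch under assumptions: the function name `compare_host_forms` is hypothetical, and it assumes the hostname is the first whitespace-separated field on each line, which holds for a one-host-per-line machines file and for SGE's `$PE_HOSTFILE`.

```shell
#!/bin/bash
# Hypothetical diagnostic: compare each distinct host in a machines file
# with a given node name (e.g. the output of `hostname`) and report
# whether they match exactly, only by short name, or not at all.
compare_host_forms() {
  local machines_file=$1 node_name=$2 host
  # Keep the first field of each non-empty line, drop duplicates.
  awk 'NF { print $1 }' "$machines_file" | sort -u | while read -r host; do
    if [ "$host" = "$node_name" ]; then
      echo "$host: exact match"
    elif [ "${host%%.*}" = "${node_name%%.*}" ]; then
      echo "$host: short names match, but one side is an FQDN"
    else
      echo "$host: different host"
    fi
  done
}

# Sketch of use inside the job script:
#   cat "$TMPDIR/machines"
#   hostname
#   compare_host_forms "$TMPDIR/machines" "$(hostname)"
#   compare_host_forms "$PE_HOSTFILE" "$(hostname)"
```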


> --------------------------------------------------------------------------
> rm: cannot remove `/tmp/21202.1.parallel.q/rsh': No such file or directory
> --------------------------------------------------------------------------

The above line comes from the "stop_proc_args" defined in the "mpi" PE and can
be ignored. In fact, you don't need any "stop_proc_args" at all. You could
define a new PE solely for Open MPI, often called "orte":

https://www.open-mpi.org/faq/?category=sge
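As a starting point, the PE definition could be written to a file and loaded with `qconf -Ap`. This is a sketch, not a verified configuration for this cluster: the helper name `write_orte_pe` is hypothetical, the values follow the usual tight-integration settings (`control_slaves TRUE`, `job_is_first_task FALSE`, no start/stop procedure), and the field list should be compared with `qconf -sp mpi` on your installation, since some SGE versions have additional fields (e.g. `accounting_summary`).

```shell
#!/bin/bash
# Hypothetical helper: write an SGE parallel-environment definition for
# Open MPI's tight integration into the file given as $1.
write_orte_pe() {
  local out=$1
  # Quoted delimiter so that $fill_up is written literally.
  cat > "$out" <<'EOF'
pe_name            orte
slots              99999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
EOF
}

# Sketch of use as SGE manager (queue name taken from the job's $TMPDIR):
#   write_orte_pe /tmp/orte.pe
#   qconf -Ap /tmp/orte.pe
#   qconf -aattr queue pe_list orte parallel.q
# and in the job script request it with:  #$ -pe orte 64
```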

-- Reuti


> I'm also sending my SGE script:
> 
> #!/bin/bash
> #$ -S /bin/bash
> #$ -pe mpi 64
> #$ -cwd
> #$ -o ./file.out
> #$ -e ./file.err
> 
> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
> export OMP_NUM_THREADS=1
> 
> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
> PP_PATH=/home/tanzi
> 
> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -np 64 -machinefile \
>   $TMPDIR/machines \
>   ${CPMD_PATH}cpmd.x input ${PP_PATH}/PP/ > out
> 
> 
> I don't understand what my mistake is.
> 
> Regards D.
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/09/25238.php
