Hi,

On 03.09.2014 at 12:17, Donato Pera wrote:
> I'm using Rocks 5.4.3 with SGE 6.1 I installed
> a new version of openMPI 1.6.5 when I run
> a script using SGE+openMPI (1.6.5) in a single node
> I don't have any problems but when I try to use more nodes
> I get this error:
>
>
> A hostfile was provided that contains at least one node not
> present in the allocation:
>
> hostfile: /tmp/21202.1.parallel.q/machines
> node: compute-2-4
>
> If you are operating in a resource-managed environment, then only
> nodes that are in the allocation can be used in the hostfile. You
> may find relative node syntax to be a useful alternative to
> specifying absolute node names see the orte_hosts man page for
> further information.

Was Open MPI compiled with SGE support?

$ ompi_info | grep grid
MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.5)

If so, you don't need to provide any -machinefile option at all, as Open MPI will use the SGE-generated one automatically. (Nevertheless, $TMPDIR/machines should be correct. It could be a mismatch between the short hostname and the FQDN. Out of curiosity, can you `cat` $TMPDIR/machines in a job script, and also print the output of `hostname` on a node in the same job?)

> --------------------------------------------------------------------------
> rm: cannot remove `/tmp/21202.1.parallel.q/rsh': No such file or directory
> --------------------------------------------------------------------------

The above line comes from the "stop_proc_args" defined in the "mpi" PE and can be ignored. In fact, you don't need any "stop_proc_args" at all.
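The hostname check asked for above could be scripted roughly as follows. This is only a sketch: the helper names `short_name` and `match_kind` are made up for illustration, and it assumes the SGE-written machines file lists one hostname per line. The last part only runs inside a job where $TMPDIR/machines exists.

```shell
#!/bin/bash
# Diagnostic sketch: compare the names SGE wrote to $TMPDIR/machines with
# what `hostname` reports on the node. Helper names are hypothetical.

# Strip the domain part, so "compute-2-4.local" becomes "compute-2-4".
short_name() {
    printf '%s\n' "${1%%.*}"
}

# Print "exact" if the host is listed verbatim in the file, "short-only" if
# it matches only after stripping the domain on both sides, else "missing".
match_kind() {
    local host="$1" file="$2" line
    if grep -qxF -- "$host" "$file"; then
        echo "exact"
        return
    fi
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "$(short_name "$line")" = "$(short_name "$host")" ]; then
            echo "short-only"
            return
        fi
    done < "$file"
    echo "missing"
}

# Only meaningful inside a running SGE job; skipped everywhere else.
if [ -n "${TMPDIR:-}" ] && [ -r "$TMPDIR/machines" ]; then
    # One line per granted slot is assumed, so count slots per host.
    sort "$TMPDIR/machines" | uniq -c
    echo "hostname reports: $(hostname)"
    echo "match: $(match_kind "$(hostname)" "$TMPDIR/machines")"
fi
```

A result of "short-only" would point to the short-name versus FQDN mismatch suspected above, while "missing" would mean the node really is not in the generated file.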
Maybe you can define a new PE solely for Open MPI, often called "orte":

https://www.open-mpi.org/faq/?category=sge

-- Reuti

> I send also my SGE script:
>
> #!/bin/bash
> #$ -S /bin/bash
> #$ -pe mpi 64
> #$ -cwd
> #$ -o ./file.out
> #$ -e ./file.err
>
> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
> export OMP_NUM_THREADS=1
>
> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
> PP_PATH=/home/tanzi
>
> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -np 64 -machinefile
> $TMPDIR/machines
> ${CPMD_PATH}cpmd.x input ${PP_PATH}/PP/ > out
>
>
> I don't understand my mistake
>
> Regards D.
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/09/25238.php
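For the dedicated PE suggested in the reply, a definition along the following lines might serve as a starting point. It is a sketch, not a tested configuration for this cluster: the file name `orte.pe`, the slot count, and the allocation rule are assumptions, and `/bin/true` is used only as a no-op for the start and stop procedures.

```shell
#!/bin/bash
# Sketch of a PE definition for Open MPI, written to a file that an SGE
# administrator could load with `qconf -Ap`. All values are assumptions
# to adapt to the local cluster.
pe_file="orte.pe"

cat > "$pe_file" <<'EOF'
pe_name            orte
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
EOF

# Loading the file needs SGE admin rights, so it is only printed here.
echo "wrote $pe_file; load it with: qconf -Ap $pe_file"
```

The new PE would also have to be added to the `pe_list` of the queue, for example via `qconf -mq`. The job script would then request `#$ -pe orte 64` and call mpirun without the -machinefile option, as described in the reply.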