Hi,
The one problem I see in your description is minor and
probably not significant: IIRC, the MPI ports parameter was
only needed for very old versions of Open MPI.
To help debug your problem, please respond to this list with
the following:
* What command did you use to invoke your program?
* What versions of Slurm and OpenMPI are you using?
* Did you build them yourself, or use prebuilt versions?
* If you built them yourself, what configuration options did
  you use?
* If pre-built versions, where did you get them?
* A copy of your slurm.conf file (you may want to change node
  names and other potentially sensitive information)
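
If it helps with the last item, here is a minimal sketch of one way to
mask host names in slurm.conf before posting it. The function name
redact_slurm_conf and the list of keys are assumptions made for
illustration, and the list is not exhaustive, so please read through
the result by hand before sending it.

```python
import re

# Keys whose values usually hold host names or addresses.
# ASSUMPTION: this list is illustrative only and not exhaustive.
SENSITIVE_KEYS = (
    "ControlMachine", "ControlAddr", "BackupController", "BackupAddr",
    "NodeName", "NodeAddr", "NodeHostname", "Nodes",
)

# Match KEY=value where value runs up to the next whitespace.
# The leading \b keeps "Nodes" from matching inside e.g. "MaxNodes".
_PATTERN = re.compile(
    r"\b(" + "|".join(SENSITIVE_KEYS) + r")=(\S+)", re.IGNORECASE
)


def redact_slurm_conf(text):
    """Return text with the value of each sensitive key replaced."""
    return _PATTERN.sub(lambda m: m.group(1) + "=REDACTED", text)


if __name__ == "__main__":
    sample = (
        "ControlMachine=master\n"
        "NodeName=n[1-8] CPUs=4 State=UNKNOWN\n"
        "PartitionName=debug Nodes=n[1-8] Default=YES\n"
    )
    print(redact_slurm_conf(sample))
```

Settings such as MpiDefault and MpiParams are left untouched, which is
what we need to see.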
Andy
On 04/30/2016 10:02 AM, Mehdi Acheli wrote:
MPI/OpenMPI send receive not working
Hello everyone,
I've set up a basic
configuration using Slurm with a master node, a backup node, a
login node and eight compute nodes.
Everything in Slurm is working fine. I can submit jobs
and see the state of the eight nodes as Idle. The problem is
with OpenMPI. The hello parallel program, where each process
prints its rank among the global set, is working, but when I
try to establish communication between nodes through
MPI_Send and MPI_Recv, it just hangs there indefinitely.
I'm using CentOS 7;
firewalld and SELinux are disabled. If I launch my parallel
program, ptest, on 2 nodes, [n1, n2], a quick check with
lsof -i shows that ptest is listening on port 1024 on both
nodes, which I find weird since only one should be
listening. Moreover, I've set the Slurm MPI parameters to pmi2
and the allowed ports to [12000-12999], so why is it still using
port 1024?
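
To make the mismatch concrete, here is a minimal sketch that reads the
port range from a slurm.conf line of the form
MpiParams=ports=12000-12999 and checks whether an observed port falls
inside it. The helper names parse_mpi_ports and port_in_range are
hypothetical, and the sample text only mirrors the settings described
above, not the actual file.

```python
import re


def parse_mpi_ports(conf_text):
    """Return (low, high) from an MpiParams=ports=LOW-HIGH line, or None."""
    m = re.search(
        r"^[ \t]*MpiParams[ \t]*=.*?\bports=(\d+)-(\d+)",
        conf_text,
        re.IGNORECASE | re.MULTILINE,
    )
    return (int(m.group(1)), int(m.group(2))) if m else None


def port_in_range(port, port_range):
    """True if port lies within the inclusive (low, high) range."""
    low, high = port_range
    return low <= port <= high


if __name__ == "__main__":
    # ASSUMPTION: sample text mirroring the settings described in the post.
    conf = "MpiDefault=pmi2\nMpiParams=ports=12000-12999\n"
    ports = parse_mpi_ports(conf)
    print(ports, port_in_range(1024, ports))
```

With the values above, port 1024 is outside the configured range,
which matches what lsof -i reports.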
I hope you can help me with
this problem. I can't see what's wrong.
Thank you in advance.
M. Acheli.