Hello everyone,

I've set up a basic Slurm configuration with a master node, a backup node,
a login node and eight compute nodes.

Everything in Slurm is working fine. I can submit jobs and see the state of
the eight nodes as idle. The problem is with OpenMPI. A parallel "hello"
program, where each process prints its rank among the global set of
processes, works, but when I try to communicate between nodes with MPI_Send
and MPI_Recv, it just hangs indefinitely.

I'm using CentOS 7, with firewalld and SELinux disabled. If I launch my
parallel program, ptest, on two nodes (n1 and n2), a quick check with
lsof -i shows that ptest is listening on port 1024 on both nodes, which I
find odd since I would expect only one of them to be listening. Moreover,
I've set Slurm's MPI parameters to pmi2 and the allowed ports to
12000-12999, so why is it still using port 1024?
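
As a rough sketch of the settings described above, assuming they were made
through the standard MpiDefault and MpiParams options in slurm.conf (this
is an illustration, not a verbatim copy of the actual file), the relevant
lines would look something like:

```
# slurm.conf (excerpt, illustrative only)
# Default MPI plugin used by srun when --mpi is not given
MpiDefault=pmi2
# Port range reserved for MPI communication (assumed syntax: ports=min-max)
MpiParams=ports=12000-12999
```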

I hope you can help me with this problem; I can't see what's wrong.
Thank you in advance.

M. Acheli.
