Hello everyone, I've set up a basic Slurm configuration with a master node, a backup node, a login node, and eight compute nodes.
Everything in Slurm itself is working fine: I can submit jobs, and all eight nodes show up as idle. The problem is with OpenMPI. A parallel "hello" program, in which each process prints its rank among the global set of processes, works, but as soon as I try to communicate between nodes with MPI_Send and MPI_Recv, the program hangs indefinitely.

I'm using CentOS 7, and both firewalld and SELinux are disabled. If I launch my parallel program, ptest, on two nodes (n1 and n2), a quick check with lsof -i shows that ptest is listening on port 1024 on both nodes, which I find odd, since I would expect only one of them to be listening. Moreover, I've set the Slurm MPI type to pmi2 and restricted the allowed ports to 12000-12999, so why is it still using port 1024?

I hope you can help me with this problem; I can't see what's wrong.

Thank you in advance.
M. Acheli
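
P.S. For reference, the Slurm MPI settings mentioned above correspond to slurm.conf entries along these lines. This is a sketch written from the description, assuming the standard MpiDefault and MpiParams options; it is not a verbatim copy of the actual file:

```
# slurm.conf fragment (illustrative; reconstructed from the description above)
# Default MPI plugin used by srun when --mpi is not given
MpiDefault=pmi2
# Port range intended for MPI communication
MpiParams=ports=12000-12999
```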
