Hi,
 
 The one problem that I see in your description is minor and
 probably not significant: the MPI ports parameter was only needed
 for very old versions of Open MPI, IIRC.
 
 To help debug your problems, please respond to this list with
 
   * What command did you use to invoke your program?
   * What versions of Slurm and OpenMPI are you using?
   * Did you build them yourself, or use prebuilt versions?
       * If you built them yourself, what configuration options did
         you use?
       * If pre-built versions, where did you get them?
   * A copy of your slurm.conf file (you may want to change node
     names and other potentially sensitive information)

 Andy

 On 04/30/2016 10:02 AM, Mehdi Acheli wrote:
   MPI/OpenMPI send receive not working
   
   Hello everyone,
    I've set up a basic configuration using Slurm with a master node,
    a backup node, a login node and eight compute nodes.

    Everything in Slurm is working fine. I can submit jobs and see the
    state of the eight nodes as idle. The problem is with OpenMPI. The
    parallel hello program, where each process prints its rank among
    the global set, works, but when I try to establish communication
    between nodes through MPI_Send and MPI_Recv, it just hangs there
    indefinitely.

    I'm using CentOS 7; firewalld and SELinux are disabled. If I launch
    my parallel program, ptest, on 2 nodes [n1, n2], a quick check with
    lsof -i shows that ptest is listening on port 1024 on both nodes,
    which I find weird since only one should be listening. Moreover,
    I've set the Slurm MPI parameters to pmi2 and the allowed ports to
    [12000-12999], so why is it still using port 1024?

    I hope you can help me with this problem. I can't see what's wrong.

     Thank you in advance.
     
         M. Acheli.
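
A side note on the port question above: the settings described (MPI type pmi2, ports 12000-12999) would normally be expressed in slurm.conf as MpiDefault and MpiParams. The Python sketch below is only an illustration under that assumption. The sample text is a hypothetical excerpt, not the poster's actual file, and parse_mpi_settings and port_in_range are made-up helper names. It extracts the two settings and checks whether an observed port, such as the 1024 reported by lsof -i, falls inside the configured range.

```python
import re


def parse_mpi_settings(conf_text):
    """Return MpiDefault and the MpiParams port range found in slurm.conf text.

    Hypothetical helper for illustration; it only looks at two keys.
    """
    settings = {"MpiDefault": None, "ports": None}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip().lower()  # match key names case-insensitively
        value = value.strip()
        if key == "mpidefault":
            settings["MpiDefault"] = value
        elif key == "mpiparams":
            match = re.search(r"ports=(\d+)-(\d+)", value)
            if match:
                settings["ports"] = (int(match.group(1)), int(match.group(2)))
    return settings


def port_in_range(port, port_range):
    """True if port lies inside the inclusive (low, high) range."""
    low, high = port_range
    return low <= port <= high


# Hypothetical excerpt mirroring the values mentioned in the question.
SAMPLE = """\
# assumed excerpt, not the poster's real slurm.conf
MpiDefault=pmi2
MpiParams=ports=12000-12999
"""

settings = parse_mpi_settings(SAMPLE)
print(settings)                                # {'MpiDefault': 'pmi2', 'ports': (12000, 12999)}
print(port_in_range(1024, settings["ports"]))  # False
```

To run it against a real configuration, read the slurm.conf file into a string and pass it to parse_mpi_settings. A False result for the observed port only shows that the listener is outside the MpiParams range; it does not by itself explain the hang.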
