Hmm, I ran "make all install", so I think PMI support is installed. Also, the hello world program works. Would it work if PMI support weren't installed?
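One way to check this rather than guess would be to ask srun which MPI plugin types it knows about, using its "--mpi=list" option. Below is a minimal sketch of that check. The helper names (has_mpi_plugin, list_mpi_plugins) are made up for illustration, and the sketch assumes the plugin names appear as separate tokens somewhere in what srun prints; the exact layout of that listing is not something I am relying on.

```python
import re
import shutil
import subprocess


def has_mpi_plugin(listing: str, name: str) -> bool:
    """Return True if `name` appears as a whole token in the listing text."""
    # Split on whitespace, commas and colons so that prefixes such as
    # "srun:" do not stick to the plugin name (assumed layout, see above).
    tokens = re.split(r"[\s,:]+", listing)
    return name in tokens


def list_mpi_plugins() -> str:
    """Run `srun --mpi=list` and return everything it printed."""
    result = subprocess.run(
        ["srun", "--mpi=list"], capture_output=True, text=True, check=False
    )
    # Keep both streams, since the listing may be written to stderr.
    return result.stdout + result.stderr


if __name__ == "__main__":
    if shutil.which("srun") is None:
        print("srun not found on PATH; run this where Slurm is installed")
    else:
        found = has_mpi_plugin(list_mpi_plugins(), "pmi2")
        print("pmi2 plugin listed:", found)
```

If pmi2 is not listed, that would point to the contrib build step not having been installed; if it is listed, the hang is more likely somewhere else.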
2016-04-30 18:04 GMT+01:00 Andy Riebs <[email protected]>:

> For Slurm, after the "make install", did you do a "make install-contrib"
> (which builds the pmi2 support)? I think you would have seen a runtime
> error if you hadn't, but possibly not.
>
> On 04/30/2016 12:14 PM, Mehdi Acheli wrote:
>
> First of all, thank you for the reaction.
>
> Here are the answers:
>
>    1. I tried multiple commands:
>       1. I started with "srun -N2 --mpi=pmi2 ptest", then I changed the
>       slurm.conf's mpi parameter to pmi2 so I no longer need the option.
>       2. I also tried a script submitted via sbatch. It doesn't work
>       either, and squeue shows that it's running. My program is just
>       passing a number from node 1 to node 2, so it doesn't normally
>       take that long.
>    2. OpenMPI version is 1.10.2 / SLURM's is 15.08.8
>    3. I built Slurm myself with no specific options. For OpenMPI I
>    actually downloaded it from the CentOS 7 default repo. But I tried
>    building the same version before with --with-slurm and --with-pmi
>    options, yet it wasn't working either.
>
> I am attaching a copy of my slurm.conf file and the script I used to
> submit the job.
>
> The script:
>
>> #!/bin/bash
>> #
>> #SBATCH --job-name=test
>> #SBATCH --output=res_mpi.txt
>> #
>> #SBATCH -N 2
>> module load openmpi
>> mpirun test
>
> Slurm.conf file:
>
>> # slurm.conf file generated by configurator easy.html.
>> # Put this file on all nodes of your cluster.
>> # See the slurm.conf man page for more information.
>> #
>> ControlMachine=m
>> ControlAddr=m
>> BackupController=mb
>> BackupAddr=mb
>> #
>> #MailProg=/bin/mail
>> MpiDefault=pmi2
>> MpiParams=ports=12000-12999
>> ProctrackType=proctrack/linuxproc
>> ReturnToService=2
>> #SlurmctldPidFile=/var/run/slurmctld.pid
>> #SlurmctldPort=6817
>> #SlurmdPidFile=/var/run/slurmd.pid
>> #SlurmdPort=6818
>> SlurmdSpoolDir=/var/spool/slurm/slurmd
>> SlurmUser=slurm
>> #SlurmdUser=root
>> #StateSaveLocation=/var/spool/slurm
>> StateSaveLocation=/mnt/data/spool/slurm
>> SwitchType=switch/none
>> TaskPlugin=task/none
>> #
>> #
>> # TIMERS
>> #KillWait=30
>> #MinJobAge=300
>> #SlurmctldTimeout=120
>> #SlurmdTimeout=300
>> #
>> #
>> # SCHEDULING
>> FastSchedule=1
>> SchedulerType=sched/backfill
>> #SchedulerPort=7321
>> SelectType=select/linear
>> PreemptType=preempt/partition_prio
>> PreemptMode=requeue
>> #
>> #
>> # LOGGING AND ACCOUNTING
>> AccountingStorageType=accounting_storage/slurmdbd
>> #JobAcctGatherFrequency=30
>> JobAcctGatherType=jobacct_gather/linux
>> JobCompType=jobcomp/none
>> #SlurmctldDebug=3
>> #SlurmctldLogFile=/var/log/slurmctld.log
>> SlurmctldLogFile=/mnt/data/log/slurmctld.log
>> #SlurmdDebug=3
>> SlurmdLogFile=/var/log/slurmd.log
>> AccountingStorageBackupHost=mb
>> #
>> #
>> # COMPUTE NODES
>> NodeName=n[1-8] NodeAddr=n[1-8] CPUs=1 State=UNKNOWN
>> NodeName=logn NodeAddr=logn CPUs=1 State=UNKNOWN
>
> 2016-04-30 16:40 GMT+01:00 Andy Riebs <[email protected]>:
>
>> Hi,
>>
>> The one problem that I see in your description is minor, and probably
>> not significant: the MPI ports parameter was needed for very old
>> versions of Open MPI, IIRC.
>>
>> To help debug your problems, please respond to this list with
>>
>>    1. What command did you use to invoke your program?
>>    2. What versions of Slurm and OpenMPI are you using?
>>    3. Did you build them yourself, or use prebuilt versions?
>>       - If you built them yourself, what configuration options did you use?
>>       - If pre-built versions, where did you get them?
>>    4. A copy of your slurm.conf file (you may want to change node names
>>    and other potentially sensitive information)
>>
>> Andy
>>
>> On 04/30/2016 10:02 AM, Mehdi Acheli wrote:
>>
>> Hello everyone,
>>
>> I've set up a basic configuration using slurm with a master node, a
>> backup node, a login node and eight compute nodes.
>>
>> Everything in slurm is working fine. I can issue jobs and see the state
>> of the eight nodes as Idle. The problem is with OpenMPI. The hello
>> parallel program, where each process prints its rank among the global
>> set, is working, but when I try to establish communications between
>> nodes through MPI_Send and MPI_Recv, it just hangs there indefinitely.
>>
>> I'm using CentOS 7; firewalld and SELinux are disabled. If I launch my
>> parallel program, ptest, on 2 nodes: [n1, n2], a little check with
>> lsof -i shows that ptest is listening on port 1024 on both nodes, which
>> I find weird since only one should be listening. Moreover, I've set the
>> slurm Mpi parameters to pmi2 and the allowed ports to [12000-12999], so
>> why is it still using port 1024?
>>
>> I hope you can help me with this problem. I can't see what's wrong.
>> Thank you in advance.
>>
>> M. Acheli.
