Hmmm... I haven’t seen anyone using OMPI 1.6.0 in a very long time. Please note 
that the latest OMPI release is now 1.10.0, so your installation is rather far 
behind.

At the very least, I would start by updating OMPI to the 1.8.8 or 1.10.0 level. 
You will then find that the SLURM integration has improved quite a bit, and you 
no longer need to use the --resv-ports option. OMPI will run with the standard 
PMI library.

You will also find that mpirun will respect the SLURM-assigned task affinity.

You may also want to update SLURM, but I leave that to others to advise - the 
OMPI change by itself should resolve the problem.
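
For what it's worth, here is a rough sketch of what a hybrid MPI/OpenMP batch 
script might look like once OMPI has been updated. It is untested on your 
system and rests on a few assumptions: that the new OMPI was built with PMI 
support (e.g. configured with --with-pmi), that 4 tasks with 4 CPUs each is 
what you want (those numbers are made up for illustration), and that 
threads_per_task is just a hypothetical helper name. ./the_job is the 
executable from your mail.

```shell
#!/bin/bash
#SBATCH --ntasks=4            # number of MPI ranks (illustrative value)
#SBATCH --cpus-per-task=4     # cores reserved per rank for OpenMP threads (illustrative value)

# SLURM exports SLURM_CPUS_PER_TASK when --cpus-per-task is given.
# Fall back to 1 thread per rank if it is not set.
threads_per_task() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Tell the OpenMP runtime how many threads each rank may use.
export OMP_NUM_THREADS="$(threads_per_task)"

# Launch only when running inside a SLURM allocation.
# Note that --resv-ports is not passed here; this assumes the updated
# OMPI picks up its wire-up information through PMI.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    srun --cpus-per-task="$OMP_NUM_THREADS" ./the_job
fi
```

Submitted with sbatch, this would start the ranks directly via srun, so the 
task/affinity plugin in your slurm.conf stays in charge of the binding. If it 
still fails, the exact error output from the new OMPI version would be the 
next thing to look at.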


> On Aug 27, 2015, at 2:26 AM, Turner, Andrew <[email protected]> wrote:
> 
> Dear all
>  
> We are running SLURM version 2.3 and openmpi 1.6.0.
> In order to run openmpi jobs and inherit the correct task affinity from 
> SLURM, jobs are executed with 'srun --resv-ports ./the_job' under sbatch (or 
> salloc).
>  
> Pure mpi tasks with --cpus-per-task=1 run fine.
> The issue is when attempting a hybrid mpi-omp task, with --cpus-per-task > 1, 
>  the job fails when using 'srun --resv-ports'.
> Many error messages are printed, along the lines of
> ' ORTE_ERROR_LOG: Not found in file ess_slurmd_module.c at line 504'
>  
> I am not the administrator of the cluster, only a user, but I was hoping we 
> might be able to point the administrators in a useful direction to solve the 
> issue.
> Is this a known issue?  E.g. due to some incompatibility between this SLURM 
> version and the OpenMPI we have installed?  Would updating SLURM and/or 
> OpenMPI solve this issue?  Or could it be a configuration issue that is 
> easily fixed?  (see config file below)
>  
> As a side issue, maybe related, we find that
> - We can run multiple threads per task if we execute using mpirun (e.g. 
> mpirun -bind-to-socket -bysocket), but mpirun does not know anything about 
> what cores it has been allocated, so it only works with exclusive node 
> option.  On shared nodes it will often crash.
> - We don’t use mpirun for pure MPI jobs since we find tasks do not have the 
> correct task affinity/binding (in this case, no binding).  Hence we use 
> ‘srun’ since nodes are shared.
> - With srun we must use ‘--resv-ports’.  Without resv-ports results in the 
> error message:
>   orte_grpcomm_modex failed
>   --> Returned "A message is attempting to be sent to a process whose contact 
> information is unknown" (-117) instead of "Success" (0)
>  
> Hopefully someone can advise how we can make it work for multiple threaded 
> jobs?  Thanks in advance.
>  
> Andy
>  
>  
> Andrew Turner
> Culham Centre for Fusion Energy
> Culham Science Centre
> Abingdon
> Oxfordshire
> OX14 3DB
>  
> www.ccfe.ac.uk <http://www.ccfe.ac.uk/>
>  
> Our slurm.conf file
>  
> ClusterName=erik
> ControlMachine=erik000
> BackupController=erik001
> SlurmUser=slurm
> SlurmctldPort=6817
> SlurmdPort=6818
> AuthType=auth/munge
> StateSaveLocation=/home/sysadmin/SlurmState
> SlurmdSpoolDir=/tmp/slurmd
> SwitchType=switch/none
> MpiDefault=none
> SlurmctldPidFile=/var/run/slurmctld.pid
> SlurmdPidFile=/var/run/slurmd.pid
> Proctracktype=proctrack/linuxproc
> CacheGroups=0
> ReturnToService=1
> TaskPlugin=task/affinity
> # TIMERS
> SlurmctldTimeout=300
> SlurmdTimeout=300
> InactiveLimit=0
> MinJobAge=300
> KillWait=30
> Waittime=0
> #
> # SCHEDULING
> SchedulerType=sched/wiki
> SchedulerPort=7321
> SelectType=select/cons_res
> FastSchedule=1
> # LOGGING
> SlurmctldDebug=3
> SlurmctldLogFile=/var/log/slurmctld.log
> SlurmdDebug=3
> SlurmdLogFile=/var/log/slurmd.log
> JobCompType=jobcomp/filetxt
> JobCompLoc=/var/slurm/accounting
> #
> # ACCOUNTING
> JobAcctGatherType=jobacct_gather/linux
> JobAcctGatherFrequency=30
> #
> AccountingStorageType=accounting_storage/filetxt
> #
> # MPI
> MpiParams=ports=12000-12999
> #
> # COMPUTE NODES
> NodeName=erik000 Procs=16 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 
> State=UNKNOWN
> NodeName=DEFAULT Procs=16 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 
> RealMemory=129009 State=UNKNOWN
> NodeName=erik[001-044]
> PartitionName=erik Nodes=erik[001-044] Default=YES MaxTime=INFINITE State=UP
