[slurm-users] Understanding gres binding

Wiegand, Paul Wed, 09 May 2018 13:33:23 -0700

Greetings,

I am setting up our new GPU cluster and am trying to ensure that a user can issue 
a request such that all the cores assigned to them are on the same socket as the 
GPU they are given. However, I seem to be getting cores from both sockets when I 
expect not to, so I clearly do not fully understand the settings. I am sure that 
I'm doing something wrong.


I have specified which cores are assigned to which GPU in the gres.conf file, 
and I'm passing the "--gres-flags=enforce-binding" flag; however, when I look at 
the cpuset of the job's cgroup, the CPUs it contains span both sockets.

What am I misunderstanding?  More detail below.

Thanks,
Paul.

---

(evuser1:/home/pwiegand) scontrol version
slurm 17.11.0

(evuser1:/home/pwiegand) cat /etc/slurm/gres.conf 
## Configure support for two GPUs
NodeName=evc[1-10] Name=gpu File=/dev/nvidia0 COREs=0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
NodeName=evc[1-10] Name=gpu File=/dev/nvidia1 COREs=1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31

(evuser1:/home/pwiegand) sinfo
PARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST
normal*        up   infinite     10   idle evc[1-10]

(evuser1:/home/pwiegand) srun -N1 -n16 --gres=gpu:1 --time=1:00:00 --gres-flags=enforce-binding --pty bash

(evc1:/home/pwiegand) squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                50    normal     bash pwiegand  R       0:48      1 evc1

(evc1:/home/pwiegand) cat /sys/fs/cgroup/cpuset/slurm/uid_REDACTED/job_50/cpuset.cpus
0-1,4-5,8-9,12-13,16-17,20-21,24-25,28-29
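To make the overlap concrete, here is a minimal Python sketch that expands the kernel's cpuset list format and groups the CPUs by socket. The helper names `parse_cpulist` and `cpus_by_socket` are hypothetical, and the socket rule (even OS processor ids on physical id 0, odd on physical id 1) is an assumption taken from the /proc/cpuinfo output for evc1 shown further down.

```python
from __future__ import annotations

from typing import Callable


def parse_cpulist(text: str) -> set[int]:
    """Expand a kernel cpulist such as '0-1,4-5' into {0, 1, 4, 5}."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpus_by_socket(cpus: set[int], socket_of: Callable[[int], int]) -> dict[int, list[int]]:
    """Group CPU ids by socket using a caller-supplied cpu -> socket mapping."""
    groups: dict[int, list[int]] = {}
    for cpu in sorted(cpus):
        groups.setdefault(socket_of(cpu), []).append(cpu)
    return groups


# Assumption: on evc1, even OS CPU ids sit on socket 0 and odd ids on socket 1.
job_cpus = parse_cpulist("0-1,4-5,8-9,12-13,16-17,20-21,24-25,28-29")
for socket, members in cpus_by_socket(job_cpus, lambda cpu: cpu % 2).items():
    print(f"socket {socket}: {len(members)} CPUs {members}")
```

Under that assumption the 16 CPUs split evenly, 8 on each socket, which is exactly the overlap described above.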

(evc1:/home/pwiegand) scontrol show job 50
JobId=50 JobName=bash
   UserId=pwiegand(REDACTED) GroupId=pwiegand(REDACTED) MCS_label=N/A
   Priority=287 Nice=0 Account=pwiegand QOS=pwiegand
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:52 TimeLimit=01:00:00 TimeMin=N/A
   SubmitTime=2018-04-26T07:52:52 EligibleTime=2018-04-26T07:52:52
   StartTime=2018-04-26T07:52:52 EndTime=2018-04-26T08:52:52 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2018-04-26T07:52:52
   Partition=normal AllocNode:Sid=evmgnt1:32595
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=evc1
   BatchHost=evc1
   NumNodes=1 NumCPUs=16 NumTasks=16 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=16,mem=94400M,node=1,billing=18,gres/gpu=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=5900M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   Gres=gpu:1 Reservation=(null)
   OverSubscribe=YES Contiguous=0 Licenses=(null) Network=(null)
   Command=bash
   WorkDir=/lustre/fs0/home/pwiegand
   Power=
   GresEnforceBind=Yes

(evc1:/home/pwiegand) scontrol show node evc1
NodeName=evc1 Arch=x86_64 CoresPerSocket=16
   CPUAlloc=16 CPUErr=0 CPUTot=32 CPULoad=0.01
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=gpu:2
   NodeAddr=ivc1 NodeHostName=evc1 
   OS=Linux 3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 11:36:42 UTC 2015 
   RealMemory=191917 AllocMem=94400 FreeMem=189117 Sockets=2 Boards=1
   State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=normal,preemptable 
   BootTime=2018-04-21T13:47:46 SlurmdStartTime=2018-04-21T14:02:14
   CfgTRES=cpu=32,mem=191917M,billing=36,gres/gpu=2
   AllocTRES=cpu=16,mem=94400M,gres/gpu=1
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   
(evc1:/home/pwiegand) cat /proc/cpuinfo | grep 'p[rh][oy]' | grep -v virtual | tr '\n' ',' | sed 's/processor/\nprocessor/g'

processor       : 0,physical id : 0,
processor       : 1,physical id : 1,
processor       : 2,physical id : 0,
processor       : 3,physical id : 1,
processor       : 4,physical id : 0,
processor       : 5,physical id : 1,
processor       : 6,physical id : 0,
processor       : 7,physical id : 1,
processor       : 8,physical id : 0,
processor       : 9,physical id : 1,
processor       : 10,physical id        : 0,
processor       : 11,physical id        : 1,
processor       : 12,physical id        : 0,
processor       : 13,physical id        : 1,
processor       : 14,physical id        : 0,
processor       : 15,physical id        : 1,
processor       : 16,physical id        : 0,
processor       : 17,physical id        : 1,
processor       : 18,physical id        : 0,
processor       : 19,physical id        : 1,
processor       : 20,physical id        : 0,
processor       : 21,physical id        : 1,
processor       : 22,physical id        : 0,
processor       : 23,physical id        : 1,
processor       : 24,physical id        : 0,
processor       : 25,physical id        : 1,
processor       : 26,physical id        : 0,
processor       : 27,physical id        : 1,
processor       : 28,physical id        : 0,
processor       : 29,physical id        : 1,
processor       : 30,physical id        : 0,
processor       : 31,physical id        : 1,
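The grep/tr/sed pipeline above can also be written as a short Python sketch that reads /proc/cpuinfo-style text and returns a mapping from processor number to physical id (socket). The helper name `socket_map` is hypothetical. The sketch is fed a small excerpt in the /proc/cpuinfo format rather than the live file, and it assumes each stanza's "processor" line comes before its "physical id" line, as it does on x86 Linux.

```python
from __future__ import annotations


def socket_map(cpuinfo_text: str) -> dict[int, int]:
    """Map each 'processor' number to its 'physical id' from /proc/cpuinfo text."""
    mapping: dict[int, int] = {}
    processor = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines separate the per-processor stanzas
        key = key.strip()
        if key == "processor":
            processor = int(value)
        elif key == "physical id" and processor is not None:
            mapping[processor] = int(value)
    return mapping


sample = """processor\t: 0
physical id\t: 0

processor\t: 1
physical id\t: 1

processor\t: 2
physical id\t: 0
"""
print(socket_map(sample))

# On a real node one would read the live file instead:
#     with open("/proc/cpuinfo") as f:
#         print(socket_map(f.read()))
```

For the excerpt this prints `{0: 0, 1: 1, 2: 0}`, the same interleaved even/odd layout as the full listing above.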

(evc1:/home/pwiegand) grep cgroup /etc/slurm/slurm.conf
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup

(evc1:/home/pwiegand) cat /etc/slurm/cgroup.conf 
ConstrainCores=yes
ConstrainRAMSpace=yes
## RPW:  When I turn this on, srun locks up every time
##ConstrainDevices=yes
ConstrainDevices=no
CgroupAutomount=yes



--- From "man srun":

       --gres-flags=enforce-binding
              If set, the only CPUs available to the job will be those bound to
              the selected GRES (i.e. the CPUs identified in the gres.conf file
              will be strictly enforced rather than advisory). This option may
              result in delayed initiation of a job. For example a job requiring
              two GPUs and one CPU will be delayed until both GPUs on a single
              socket are available rather than using GPUs bound to separate
              sockets, however the application performance may be improved due
              to improved communication speed. Requires the node to be
              configured with more than one socket and resource filtering will
              be performed on a per-socket basis. This option applies to job
              allocations.
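One hypothesis worth checking is how Slurm interprets the COREs lists in gres.conf. The gres.conf documentation describes Cores as a logical core index numbered socket by socket (core 0 is the first core on the first socket), which need not match the OS processor numbers. Whether 17.11 behaves this way is an assumption here. The minimal Python sketch below assumes that numbering plus the layout reported for evc1 (2 sockets, 16 cores per socket, 1 thread per core, even OS processors on socket 0 and odd on socket 1), and translates the nvidia0 COREs list into OS CPU ids. The helper names `logical_to_os` and `to_cpulist` are hypothetical.

```python
from __future__ import annotations

CORES_PER_SOCKET = 16  # from "scontrol show node evc1": CoresPerSocket=16, Sockets=2


def logical_to_os(core: int) -> int:
    """Assumed mapping from a socket-major logical core index to an OS processor id.

    Logical cores 0-15 are taken to be socket 0 and 16-31 socket 1, while the OS
    numbers socket 0 as 0, 2, 4, ... and socket 1 as 1, 3, 5, ... (per /proc/cpuinfo).
    """
    socket, index = divmod(core, CORES_PER_SOCKET)
    return 2 * index + socket


def to_cpulist(cpus: list[int]) -> str:
    """Compress CPU ids into kernel cpulist form, e.g. [0, 1, 4] -> '0-1,4'."""
    ranges: list[str] = []
    start = prev = None
    for cpu in sorted(cpus):
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            prev = cpu
        else:
            ranges.append(f"{start}-{prev}" if start != prev else str(start))
            start = prev = cpu
    if start is not None:
        ranges.append(f"{start}-{prev}" if start != prev else str(start))
    return ",".join(ranges)


gpu0_cores = list(range(0, 32, 2))  # COREs=0,2,4,...,30 for /dev/nvidia0
print(to_cpulist([logical_to_os(c) for c in gpu0_cores]))
```

Under these assumptions the printed list is `0-1,4-5,8-9,12-13,16-17,20-21,24-25,28-29`, identical to the job's cpuset. That would mean the COREs list itself names cores on both sockets, and a socket-aligned gres.conf would use COREs=0-15 for the GPU attached to socket 0 and COREs=16-31 for the other. Which GPU is attached to which socket would still need to be confirmed on the hardware, for example with `nvidia-smi topo -m`.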
