Hello,
We are running Slurm 15.08.8 and seem to have run into an issue with
GPUs and the CPUs they are supposed to have access to.
Our nodes have two sockets, with four GPU devices attached to each
socket. We would like to limit which CPUs each GPU device has access
to. According to the gres.conf man page, this can be accomplished with
the CPUs directive.
Relevant config options:
slurm.conf
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
GresTypes=gpu
NodeName=xxx[1-8] Gres=gpu:k80:8 RealMemory=96640 Sockets=2 CoresPerSocket=8 State=UNKNOWN
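One assumption I'm making is that the CPUs= indices in gres.conf below
use Slurm's own logical CPU numbering, which should line up with the
hardware that slurmd detects on the node; slurmd -C is a quick way to
see that view:

# print the node's hardware configuration as slurmd detects it
slurmd -C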
gres.conf
# had to list one device file per line or it wouldn't work at all
NodeName=xxx[1-8] Name=gpu Type=k80 File=/dev/nvidia0 CPUs=0,2,4,6,8,10,12,14
NodeName=xxx[1-8] Name=gpu Type=k80 File=/dev/nvidia1 CPUs=0,2,4,6,8,10,12,14
NodeName=xxx[1-8] Name=gpu Type=k80 File=/dev/nvidia2 CPUs=0,2,4,6,8,10,12,14
NodeName=xxx[1-8] Name=gpu Type=k80 File=/dev/nvidia3 CPUs=0,2,4,6,8,10,12,14
NodeName=xxx[1-8] Name=gpu Type=k80 File=/dev/nvidia4 CPUs=1,3,5,7,9,11,13,15
NodeName=xxx[1-8] Name=gpu Type=k80 File=/dev/nvidia5 CPUs=1,3,5,7,9,11,13,15
NodeName=xxx[1-8] Name=gpu Type=k80 File=/dev/nvidia6 CPUs=1,3,5,7,9,11,13,15
NodeName=xxx[1-8] Name=gpu Type=k80 File=/dev/nvidia7 CPUs=1,3,5,7,9,11,13,15
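In case it helps, the sort of check I have in mind for whether the
binding takes effect is below (node name and job shape are just
examples): with task/cgroup active, a one-GPU job's allowed CPU list
should match the CPUs= line for whichever device it was allocated.

# request one GPU and print the device plus the CPUs the task may use
srun -w xxx1 -n1 --gres=gpu:k80:1 bash -c \
  'echo "GPU: $CUDA_VISIBLE_DEVICES"; grep Cpus_allowed_list /proc/self/status'

# independent cross-check of physical GPU-to-CPU affinity ("CPU Affinity"
# column; assumes a driver recent enough to support the topo subcommand)
nvidia-smi topo -m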
This post also seems to report the same issue:
https://groups.google.com/forum/#!topic/slurm-devel/2u4NXpQa_qE
Has anyone gotten this to work successfully, or is something wrong with
my configs?
thanks
-k