Hello,

We have Slurm 15.08.8 running and seem to have run into an issue with GPUs and the CPUs that they are supposed to have access to.

Our nodes have 2 sockets, with 4 GPU devices attached to each socket. We would like to limit which CPUs each GPU device has access to. According to the gres.conf man page, this can be accomplished with the CPUs directive.

Relevant config options:

slurm.conf
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
GresTypes=gpu
NodeName=xxx[1-8] Gres=gpu:k80:8 RealMemory=96640 Sockets=2 CoresPerSocket=8 State=UNKNOWN
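
I am also assuming the task/cgroup plugin only enforces a CPU binding when cores are constrained in cgroup.conf, so something like the following would presumably be needed as well (the values below are a guess on my part, not taken from a verified working setup):

cgroup.conf
CgroupAutomount=yes
ConstrainCores=yes
ConstrainDevices=yes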

gres.conf
# had to list one device file per line or it wouldn't work at all
NodeName=xxx[1-8] Name=gpu Type=k80  File=/dev/nvidia0 CPUs=0,2,4,6,8,10,12,14
NodeName=xxx[1-8] Name=gpu Type=k80  File=/dev/nvidia1 CPUs=0,2,4,6,8,10,12,14
NodeName=xxx[1-8] Name=gpu Type=k80  File=/dev/nvidia2 CPUs=0,2,4,6,8,10,12,14
NodeName=xxx[1-8] Name=gpu Type=k80  File=/dev/nvidia3 CPUs=0,2,4,6,8,10,12,14
NodeName=xxx[1-8] Name=gpu Type=k80  File=/dev/nvidia4 CPUs=1,3,5,7,9,11,13,15
NodeName=xxx[1-8] Name=gpu Type=k80  File=/dev/nvidia5 CPUs=1,3,5,7,9,11,13,15
NodeName=xxx[1-8] Name=gpu Type=k80  File=/dev/nvidia6 CPUs=1,3,5,7,9,11,13,15
NodeName=xxx[1-8] Name=gpu Type=k80  File=/dev/nvidia7 CPUs=1,3,5,7,9,11,13,15
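
For what it is worth, the kind of check I would expect to show the binding is something along these lines (sketch only; xxx1 is just a placeholder node name):

# request one GPU and print the CPU list the task is allowed to run on
srun -w xxx1 -n1 --gres=gpu:k80:1 grep Cpus_allowed_list /proc/self/status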


This post also seems to report the same issue:
https://groups.google.com/forum/#!topic/slurm-devel/2u4NXpQa_qE

Has anyone managed to get this working? Or is there something wrong with my configs?

thanks
-k
