On 11/02/2016 12:25, Michael Senizaiz wrote:

> This doesn't enforce keeping the jobs on a K80.  There are only 4 K80s
> in the system.  If I submit a 1-GPU job and then a 2-GPU job, the first
> will get GPU0 (0 and 1 are a K80, 2 and 3 are a K80, etc.).  The 2-GPU
> job will then get GPU1 and GPU2.  Then the user will complain that
> their peer-to-peer code isn't working and the job performance is bad,
> because they are running across two discrete K80s and not the 2 GPUs
> on a single K80.
Like allocating a multithreaded job's tasks across different hosts.
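(If it helps to verify which device pairs actually sit on the same
board, I think nvidia-smi can print the topology matrix:

$ nvidia-smi topo -m

IIRC the two GPUs of a single K80 show up as connected through the
board's internal PCIe switch.)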

>     gres.conf
>     NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[0-7] CPUs=0-19
Shouldn't you have
--8<--
NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[0-1]
NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[2-3]
NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[4-5]
NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[6-7]
--8<--
?
(I omitted CPUs since I don't know whether they're significant in your
case.)
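(If they do matter, my guess would be to bind each board to the CPUs of
its nearest socket, something like the following, where the 0-9/10-19
split is pure guesswork on my part:
--8<--
NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[0-1] CPUs=0-9
NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[2-3] CPUs=0-9
NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[4-5] CPUs=10-19
NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[6-7] CPUs=10-19
--8<--
)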
IIUC, you should define each K80 as a different resource. But I started
with SLURM about a week ago, so I could be way off target!
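FWIW, a minimal sketch of what I mean, assuming each board gets its own
Type label (the k80a..k80d names are invented by me, and I haven't
tested any of this):
--8<--
# gres.conf
NodeName=node[001-008] Name=gpu Type=k80a File=/dev/nvidia[0-1]
NodeName=node[001-008] Name=gpu Type=k80b File=/dev/nvidia[2-3]
NodeName=node[001-008] Name=gpu Type=k80c File=/dev/nvidia[4-5]
NodeName=node[001-008] Name=gpu Type=k80d File=/dev/nvidia[6-7]
# slurm.conf should then declare the same types on the nodes:
# NodeName=node[001-008] ... Gres=gpu:k80a:2,gpu:k80b:2,gpu:k80c:2,gpu:k80d:2
--8<--
A 2-GPU peer-to-peer job could then pin itself to a single board with
something like

srun --gres=gpu:k80a:2 ...

The obvious downside is that the user (or a job_submit plugin) has to
pick a board explicitly.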
HiH

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it
