On 11/02/2016 12:25, Michael Senizaiz wrote:
> This doesn't enforce keeping the jobs on a K80. There are only 4 K80s
> in the system. If I submit a 1-GPU job and then a 2-GPU job, the first
> will get GPU0 (0 and 1 are a K80, 2 and 3 are a K80, etc.). The 2-GPU
> job will then get GPU1 and GPU2. The user will then complain that their
> peer-to-peer code isn't working and the job performance is bad, because
> they are running across two discrete K80s rather than the 2 GPUs on a
> single K80. It's like allocating a multithreaded job across different
> hosts.
> gres.conf
> NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[0-7] CPUs=0-19

Shouldn't you have

--8<--
NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[0-1]
NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[2-3]
NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[4-5]
NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[6-7]
--8<--

? (I omitted CPUs since I don't know whether they're significant in your
case.)

IIUC, you should define each K80 as a separate resource. But I started
with SLURM only about a week ago, so I could be way off target!

HTH

--
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it
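For what it's worth, with a gres.conf grouping like the one suggested
above, a peer-to-peer job would then request its GPUs by type and count,
e.g. (the script and application names are placeholders; whether SLURM
actually keeps the two allocated GPUs on the same physical board is
exactly the open question here):

--8<--
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gres=gpu:k80:2   # request 2 GPUs of GRES type "k80" on one node
srun ./p2p_app
--8<--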