FYI, the Nvidia K80 contains two K40-class GPUs, and they appear to the
OS as two separate GPUs even though they are peers on the same board
that can communicate very quickly with each other.
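
As a quick sanity check, "nvidia-smi topo -m" shows which devices share
a board, and a minimal CUDA sketch like the following (my own
illustration; it assumes the two halves of one K80 enumerate as devices
0 and 1) reports whether peer-to-peer access is available:

--8<--
#include <stdio.h>
#include <cuda_runtime.h>

/* Sketch: assumes devices 0 and 1 are the two halves of one K80.
 * Prints 1/1 if each GPU can directly access the other's memory. */
int main(void) {
    int fwd = 0, rev = 0;
    cudaDeviceCanAccessPeer(&fwd, 0, 1);
    cudaDeviceCanAccessPeer(&rev, 1, 0);
    printf("P2P 0->1: %d  1->0: %d\n", fwd, rev);
    return 0;
}
--8<--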

What users want for jobs that request two GPUs is the ability to
schedule both GPUs within a single K80, so the job's application can
take advantage of that fast peer-to-peer communication.
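
For illustration, against the Type=k80 definition quoted below, such a
job would be submitted along these lines (a sketch; the script and
binary names are placeholders):

--8<--
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gres=gpu:k80:2   # ideally the two halves of one K80
srun ./p2p_app             # placeholder for the user's P2P-aware code
--8<--

As Michael describes below, Slurm currently satisfies this request with
any two k80-typed devices on the node, so nothing pins the pair to one
board.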

Gary D. Brown
Adaptive Computing


On Thu, Feb 11, 2016 at 7:31 AM, Diego Zuccato <[email protected]>
wrote:

>
> On 11/02/2016 12:25, Michael Senizaiz wrote:
>
> > This doesn't enforce keeping the jobs on a single K80.  There are only
> > 4 K80s in the system.  If I submit a 1-GPU job and then a 2-GPU job,
> > the first will get GPU 0 (0 and 1 are one K80, 2 and 3 are another,
> > etc.).  The 2-GPU job will then get GPU 1 and GPU 2.  Then the user
> > will complain that their peer-to-peer code isn't working and the job
> > performance is bad, because they are running across two discrete K80s
> > and not the two GPUs on a single K80.
> Like allocating a multithreaded job across different hosts.
>
> >     gres.conf
> >     NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[0-7] CPUs=0-19
> Shouldn't you have
> --8<--
> NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[0-1]
> NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[2-3]
> NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[4-5]
> NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[6-7]
> --8<--
> ?
> (I omitted CPUs since I don't know whether they're significant in
> your case)
> IIUC, you should define each K80 as a different resource. But I started
> with SLURM about a week ago, so I could be way off target!
> HTH
>
> --
> Diego Zuccato
> Servizi Informatici
> Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786
> mail: [email protected]
>
