[slurm-dev] Re: Dividing a 2cpu+2gpu machine into two independent blocks of 1cpu+1gpu each.

Daniel Letai Thu, 17 Sep 2015 08:52:53 -0700

Basically, set up socket-based scheduling (allocating by sockets instead of cores) and build a gres configuration for the GPUs with 2 lines: one with CPUs=0-9, the other with CPUs=10-19.


see http://slurm.schedmd.com/gres.html
and http://slurm.schedmd.com/slurm.conf.html (search for CR_Socket)

I'm not sure whether CR_Socket means the CPUs in gres.conf are still cores/threads (as implied by the documentation) or sockets (as implied by common sense).
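
A minimal sketch of what the suggestion above might look like, not a tested configuration. The node names (node[01-10]), the device paths (/dev/nvidia0, /dev/nvidia1), ThreadsPerCore=1 and the assumption that cores 0-9 sit on the first socket and 10-19 on the second are all illustrative and would need checking against the real hardware. The CPUs= values are written as core indices, which is the reading the documentation implies; the open question above still applies.

```
# slurm.conf (excerpt) -- allocate whole sockets, track GPUs as a generic resource
SelectType=select/cons_res
SelectTypeParameters=CR_Socket
GresTypes=gpu
# Hypothetical node names; 2 sockets x 10 cores and 2 GPUs per node
NodeName=node[01-10] Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 Gres=gpu:2 State=UNKNOWN

# gres.conf on each node -- one line per GPU, each tied to one socket's cores
# (device paths and core numbering are assumptions)
Name=gpu File=/dev/nvidia0 CPUs=0-9
Name=gpu File=/dev/nvidia1 CPUs=10-19
```

With a layout like this, a job asking for one GPU would be expected to land on the socket listed next to that GPU, but that behaviour should be verified on a test node before relying on it.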



On 09/14/2015 07:38 PM, David McGiven wrote:

Dividing a 2cpu+2gpu machine into two independent blocks of 1cpu+1gpu each.
Dear SLURM users,
We recently bought some machines with 2 Intel Xeon processors (10 cores each) and 2 GPUs each.
For 90% of our cluster use, our jobs run well on up to 10 cores + 1 GPU, and for optimal performance all the requested cores must be "pinned" to the same physical CPU.
Currently we are using a TORQUE+MAUI combination, but I’m not sure it has the features we need.
I would like to know if one could deploy a setup in SLURM like this:
Basically, I would like to divide each machine into two blocks of 1 CPU (10 cores) and 1 GPU, so that the user can ask SLURM for 1 or 2 blocks, each block consisting of 10 cores and 1 GPU. If the user requests only 1 block, under no circumstances may the job's threads be spread across the two physical CPUs or GPUs.
For simplicity, there's no need for spreading jobs across nodes with MPI or the like. All the jobs run locally on each server.
So a cluster of 10 of these machines will have 20 usable "blocks", and therefore a maximum of 20 jobs running simultaneously in the whole cluster. When issuing a job, users would request up to 2 "blocks" and up to 10 cores and 1 GPU for each block.
I don't know if I'm overcomplicating this, but this would be the ideal scenario, or at least something very similar. I would prefer not to use cgroups for this since it can complicate the setup. Ideally it would be done only with SLURM.
Three examples of what the user would ask the SLURM server:

- I need 1 block, and inside this block, 8 cores and 1 GPU.
The 2nd block of the node will remain free and totally independent from the 1st one. SLURM would report 1 block free, with 10 cores and 1 GPU (although 12 cores are actually free in the machine). In practice it isn't important if it reports 12 cores free, as long as the user can effectively run only on the 10 cores of the 2nd CPU, since there’s only 1 block free.
- I need 1 block, and inside this block, 10 cores and no GPU.
Same as before, the 2nd block will remain free and totally independent from the 1st one, and new jobs could use only the 2nd CPU (10 cores) and only the 2nd GPU.
- I need 2 blocks, and inside these blocks, 14 cores and 2 GPUs.
The job will have access to both CPUs and both GPUs. In this case the machine won't accept new jobs because both blocks are in use. Whether or not it lists the 6 remaining cores as free is irrelevant: since there are no free "blocks", the SLURM node won't accept more jobs.
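
The block accounting in the three examples above can be sketched as a small calculation. This is only an illustration of the arithmetic, not anything SLURM provides: the function name blocks_needed and the constants are hypothetical, and the 10-core, 1-GPU block size is taken from the description above.

```python
import math

# Block geometry taken from the description above (assumed fixed per node).
CORES_PER_BLOCK = 10
GPUS_PER_BLOCK = 1
BLOCKS_PER_NODE = 2


def blocks_needed(cores, gpus):
    """Return how many socket+GPU blocks a single-node request occupies.

    Hypothetical helper: a request takes as many blocks as its most
    demanding resource requires, and always at least one block.
    """
    if cores < 0 or gpus < 0:
        raise ValueError("cores and gpus must be non-negative")
    by_cores = math.ceil(cores / CORES_PER_BLOCK)
    by_gpus = math.ceil(gpus / GPUS_PER_BLOCK)
    needed = max(by_cores, by_gpus, 1)
    if needed > BLOCKS_PER_NODE:
        raise ValueError("request does not fit on a single node")
    return needed


if __name__ == "__main__":
    # The three requests from the examples above: (cores, gpus)
    for cores, gpus in [(8, 1), (10, 0), (14, 2)]:
        used = blocks_needed(cores, gpus)
        print(f"{cores} cores, {gpus} gpu -> {used} block(s), "
              f"{BLOCKS_PER_NODE - used} block(s) left on the node")
```

Under this model the first two requests leave one block free and the third leaves none, which matches the behaviour asked for in each example.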
Any suggestions or advice would be really appreciated.


Best regards,

D
