[slurm-dev] Re: One CPU always reserved for one GPU

Redouane Bouchouirbat Fri, 04 Mar 2016 03:58:49 -0800

Hi,
I have the same problem. I can't request a single GPU:

srun -p gpunpart --gres=gpu:1 --pty bash -i
srun: error: Unable to allocate resources: Requested node configuration is not available


I configured the GPU nodes in slurm.conf like this:
...
NodeName=nodgpu[01-05] Procs=24 CoresPerSocket=12 RealMemory=128000 Sockets=2 ThreadsPerCore=1 TmpDisk=703488 Gres=gpu:4 Feature=Haswell,Tesla,k40m
...

GresTypes=Haswell,Tesla,Westmere,gpu,k40m

and:

SelectType=select/cons_res
SelectTypeParameters=CR_Socket_Memory
...

The gres.conf file on the five nodes:

Name=gpu File=/dev/nvidia0 CPUs=0,2,4,6,8,10,12,14,16,18,20,22
Name=gpu File=/dev/nvidia1 CPUs=1,3,5,7,9,11,13,15,17,19,21,23
Name=gpu File=/dev/nvidia2 CPUs=0,2,4,6,8,10,12,14,16,18,20,22
Name=gpu File=/dev/nvidia3 CPUs=1,3,5,7,9,11,13,15,17,19,21,23
Name=mic Count=0

The cgroup.conf on each node:

CgroupMountpoint="/sys/fs/cgroup"
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm/cgroup"
ConstrainRAMSpace=yes
AllowedRAMSpace=100
ConstrainCores=yes
TaskAffinity=no

We use Slurm 14.11.11.

I don't know what the problem is.
Any ideas?
Thank you in advance.

Red

2016-03-03 23:14 GMT+01:00 Lachele Foley <[email protected]>:

>
> We aren't doing that.  I agree we probably should.  If you work out a
> config, and don't mind doing so, please share it.
>
>
> On Thu, Mar 3, 2016 at 3:11 AM, Daniel Letai <[email protected]> wrote:
> > Correct me if I'm wrong, but I don't see any NUMA-based reservation of the
> > CPUs. Do you ensure that each reserved CPU is from a different socket, and
> > that GPU jobs have affinity to the correct NUMA node?
> >
> >
> > On 03/02/2016 12:30 AM, Lachele Foley wrote:
> >
> > We do exactly that.  We use the CPUs as the consumable resource rather
> > than the GPUs for that reason.  We also limit memory use as needed.
> > You might want to see the configuration issues we ran into and solved
> > as recorded in the thread at the link below.
> >
> > https://groups.google.com/forum/#!topic/slurm-devel/x6VaKfrdH5Y
> >
> >
> > On Tue, Mar 1, 2016 at 1:27 PM, John Desantis <[email protected]> wrote:
> >
> > Felix,
> >
> > Although I haven't run into a use case like yours (yet), my initial
> > thought was to use the partition parameter "MaxCPUsPerNode" in your
> > configuration:
> >
> > 'Maximum number of CPUs on any node available to all jobs from this
> > partition. This can be especially useful to schedule GPUs. For
> > example a node can be associated with two Slurm partitions (e.g.
> > "cpu" and "gpu") and the partition/queue "cpu" could be limited to
> > only a subset of the node’s CPUs, insuring that one or more CPUs would
> > be available to jobs in the "gpu" partition/queue.'
> >
> > HTH,
> > John DeSantis
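A minimal slurm.conf sketch of John's MaxCPUsPerNode suggestion, applied to Felix's 12-core node01 with 2 GPUs. The partition names "cpu" and "gpu" come from the man page example quoted above; the limit of 10 (leaving two cores, one per GPU) and the remaining partition settings are assumptions chosen for illustration.

```conf
# slurm.conf (sketch): node01 has 12 cores and 2 GPUs.
# Jobs from the "cpu" partition may use at most 10 of the 12 cores on node01,
# so 2 cores (assumed: one per GPU) stay free for jobs in the "gpu" partition.
PartitionName=cpu Nodes=node01 MaxCPUsPerNode=10 Default=YES State=UP
# No MaxCPUsPerNode here: "gpu" jobs may use any free core on the node.
PartitionName=gpu Nodes=node01 State=UP
```

This only caps the "cpu" partition. On its own it does not force jobs submitted to the "gpu" partition to request a GPU, so CPU-only jobs sent there could still take the spare cores.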
> >
> >
> >
> > 2016-03-01 9:05 GMT-05:00 Felix Willenborg <[email protected]>:
> >
> > Hey folks,
> >
> > I'm kind of new to SLURM and we're setting it up in our work group on our
> > nodes. Each node in our cluster has 2 GPUs and 12 CPU cores.
> >
> > The GPUs are configured with gres like this:
> > Name=gpu_mem Count=6143
> > Name=gpu File=/dev/nvidia0
> > Name=gpu File=/dev/nvidia1
> > #Name=bandwidth count=4G
> > (Somehow the bandwidth plugin isn't available in the Slurm package from our
> > repository, and I get error messages when I enable it. That's why it's
> > commented out. Is it even necessary?)
> >
> > The nodes are defined like this in slurm.conf:
> > [...]
> > NodeName=node01 NodeAddr=<...> CPUs=12 RealMemory=128740 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=UNKNOWN Gres=gpu:3,gpu_mem:12287#,bandwidth:4G
> >
> >
> > We'd like to have a setup where one CPU is always available for each GPU
> > and can only be allocated together with that GPU, because we have often had
> > reservations in which all CPUs were allocated and we couldn't use the
> > GPUs anymore. I searched the internet and didn't find any similar cases
> > that could help me. The only thing I found was adding "CPUS=0,1" at the
> > end of every Name=gpu ... line in gres.conf. Would this already do it? And
> > if not, what can I do? I have the feeling there are many ways to solve
> > this with SLURM. We're using SLURM version 14.11.8.
> >
> > Looking forward to some answers!
> >
> > Best wishes,
> > Felix Willenborg
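Felix asks whether a CPU list on each Name=gpu line in gres.conf is enough. A sketch for his node layout (2 sockets of 6 cores, 2 GPUs) is below. It assumes that Slurm's core index runs socket by socket (0-5 on socket 0, 6-11 on socket 1) and that /dev/nvidia0 is attached to socket 0 and /dev/nvidia1 to socket 1; both should be checked on the real hardware, for example with `nvidia-smi topo -m` or `lstopo`. As the gres.conf documentation describes it, CPUs= restricts which cores a GPU job may be given, which addresses the NUMA affinity Daniel asks about; it does not by itself keep CPU-only jobs off those cores, so it would likely need to be combined with a partition limit such as MaxCPUsPerNode.

```conf
# gres.conf on node01 (sketch): bind each GPU to the cores of its own socket.
# Assumptions: Slurm core index 0-5 = socket 0 and 6-11 = socket 1,
# /dev/nvidia0 is attached to socket 0 and /dev/nvidia1 to socket 1,
# and the slurm.conf node line declares Gres=gpu:2 for these two devices.
Name=gpu File=/dev/nvidia0 CPUs=0-5
Name=gpu File=/dev/nvidia1 CPUs=6-11
```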
> >
> >
> >
>
>
>
> --
> :-) Lachele
> Lachele Foley
> CCRC/UGA
> Athens, GA USA
