Hi all, I've spent the last few days testing slurm with gres for our future nvidia machine, and I'm running into problems with gres over-allocating resources. I see the following error every time the controller starts a job:
[2011-06-28T14:50:55] error: gres/gpu: job 20206 node bscop134 overallocated resources by 2

The configuration consists of one node with 2 GPUs. You can find the relevant configuration parameters at the end of this email. Is this the expected scheduling behavior with gres? Is this a bug, or is there no way to avoid over-allocating resources?

Best regards,
Carles Fenoy

slurm.conf:
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
SchedulerType=sched/backfill
GresTypes=gpu
NodeName=DEFAULT RealMemory=12000 Procs=8 TmpDisk=20000 Gres=gpus:2
NodeName=bscop134 NodeAddr=bscop134 Gres=gpus:2
PartitionName=projects AllowGroups=ALL Hidden=NO RootOnly=NO MaxNodes=UNLIMITED MinNodes=1 MaxTime=UNLIMITED Shared=NO State=UP Default=YES Nodes=bscop134

gres.conf:
Name=gpu File=/dev/nvidia0 CPUs=0-3
Name=gpu File=/dev/nvidia1 CPUs=4-7

--
Carles Fenoy
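P.S. For reference, the error above appears for jobs submitted against this setup roughly as follows; the exact GPU count and the command run are illustrative, not the specific job that produced the log line:

    # Ask for 1 of the 2 GPUs defined in gres.conf for a single task
    # (job 20206's actual request is not shown in the log excerpt).
    srun --partition=projects --gres=gpu:1 -n1 nvidia-smi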
