Hi all, I've spent the last few days testing slurm with gres for our future nvidia machine, and I'm running into problems with gres over-allocating resources. I see the following error every time the controller starts a job:
[2011-06-28T14:50:55] error: gres/gpu: job 20206 node bscop134 overallocated resources by 2

The configuration consists of one node with 2 GPUs. You can find the relevant configuration parameters at the end of this email. Is this the expected scheduling behavior with gres? Is this a bug, or is there no way to avoid over-allocating resources?

Best regards,
Carles Fenoy

slurm.conf:
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
SchedulerType=sched/backfill
GresTypes=gpu
NodeName=DEFAULT RealMemory=12000 Procs=8 TmpDisk=20000 Gres=gpus:2
NodeName=bscop134 NodeAddr=bscop134 Gres=gpus:2
PartitionName=projects AllowGroups=ALL Hidden=NO RootOnly=NO MaxNodes=UNLIMITED MinNodes=1 MaxTime=UNLIMITED Shared=NO State=UP Default=YES Nodes=bscop134

gres.conf:
Name=gpu File=/dev/nvidia0 CPUs=0-3
Name=gpu File=/dev/nvidia1 CPUs=4-7

--
Carles Fenoy
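P.S. For reference, the error above appears for jobs submitted against this setup roughly as follows; the exact GPU count and the command run are illustrative, not the specific job that produced the log line:

    # Ask for 1 of the 2 GPUs defined in gres.conf for a single task
    # (job 20206's actual request is not shown in the log excerpt).
    srun --partition=projects --gres=gpu:1 -n1 nvidia-smi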
