Hi Moe,

Thank you for the patch. I tried it yesterday and now it works fine.
Regards,
Carles Fenoy

On Wed, Jul 6, 2011 at 11:21 PM, <[email protected]> wrote:
> Carles,
>
> The logic to support managing generic resource topology (associating
> specific generic resources with specific CPUs on a node) was incomplete.
> The attached patch should fix the problem you have reported and will be
> included in SLURM version 2.3.0-pre7.
>
> Moe Jette
> SchedMD LLC
>
>
> On 07/01/2011 06:08 PM, Carles Fenoy wrote:
>>
>>> ---------- Forwarded message ----------
>>> From: <[email protected]>
>>> Date: 01/07/2011 17:24
>>> Subject: Fwd: Re: [slurm-dev] GRES Overallocating resources
>>> To: "Carles Fenoy" <[email protected]>
>>>
>>> Hi Carles,
>>>
>>> I have been able to reproduce this problem. It occurs if I include the
>>> "CPUs" field in the gres.conf file, and does not occur without it.
>>> What are the chances of getting a SLURM support contract to fix this
>>> for you?
>>>
>>> Moe Jette
>>> SchedMD LLC
>>>
>>>
>>> ----- Forwarded message from [email protected] -----
>>>
>>> Date: Fri, 1 Jul 2011 08:37:21 +0200
>>> From: Carles Fenoy <[email protected]>
>>> Reply-To: Carles Fenoy <[email protected]>
>>> Subject: Re: [slurm-dev] GRES Overallocating resources
>>> To: [email protected]
>>> Cc: [email protected]
>>>
>>> Hi Moe,
>>>
>>> Thanks for your quick reply.
>>> I've modified the configuration parameters and it still behaves the
>>> same way.
>>> I send the output of squeue, sinfo, scontrol show nodes, and
>>> scontrol show jobs.
>>>
>>> sinfo:
>>> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
>>> projects*    up   infinite      1  alloc bscop134
>>>
>>> squeue:
>>>   JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
>>>   20214  projects   sbatch   cfenoy  PD       0:00      1 (Resources)
>>>   20210  projects   sbatch   cfenoy   R       4:22      1 bscop134
>>>   20211  projects   sbatch   cfenoy   R       4:21      1 bscop134
>>>   20212  projects   sbatch   cfenoy   R       4:21      1 bscop134
>>>   20213  projects   sbatch   cfenoy   R       4:20      1 bscop134
>>>
>>> scontrol show nodes:
>>> NodeName=bscop134 Arch=x86_64 CoresPerSocket=1
>>>    CPUAlloc=4 CPUErr=0 CPUTot=8 Features=(null)
>>>    Gres=gpu:2
>>>    NodeAddr=bscop134 NodeHostName=bscop134
>>>    OS=Linux RealMemory=12036 Sockets=8
>>>    State=MIXED ThreadsPerCore=1 TmpDisk=20157 Weight=1
>>>    BootTime=2011-06-17T11:15:47 SlurmdStartTime=2011-07-01T08:37:16
>>>    Reason=(null)
>>>
>>> scontrol show jobs (only 3 jobs):
>>> JobId=20212 Name=sbatch
>>>    UserId=cfenoy(1001) GroupId=users(100)
>>>    Priority=4294901757 Account=(null) QOS=(null) WCKey=*
>>>    JobState=RUNNING Reason=None Dependency=(null)
>>>    Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
>>>    RunTime=00:04:40 TimeLimit=UNLIMITED TimeMin=N/A
>>>    SubmitTime=2011-07-01T08:38:32 EligibleTime=2011-07-01T08:38:32
>>>    StartTime=2011-07-01T08:38:32 EndTime=Unknown
>>>    PreemptTime=NO_VAL SuspendTime=None SecsPreSuspend=0
>>>    Partition=projects AllocNode:Sid=bscop134:13583
>>>    ReqNodeList=(null) ExcNodeList=(null)
>>>    NodeList=bscop134
>>>    BatchHost=bscop134
>>>    NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqS:C:T=*:*:*
>>>    MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
>>>    Features=(null) Gres=gpu:2 Reservation=(null)
>>>    Shared=OK Contiguous=0 Licenses=(null) Network=(null)
>>>    Command=(null)
>>>    WorkDir=/home/cfenoy
>>>
>>> JobId=20213 Name=sbatch
>>>    UserId=cfenoy(1001) GroupId=users(100)
>>>    Priority=4294901756 Account=(null) QOS=(null) WCKey=*
>>>    JobState=RUNNING Reason=None Dependency=(null)
>>>    Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
>>>    RunTime=00:04:39 TimeLimit=UNLIMITED TimeMin=N/A
>>>    SubmitTime=2011-07-01T08:38:33 EligibleTime=2011-07-01T08:38:33
>>>    StartTime=2011-07-01T08:38:33 EndTime=Unknown
>>>    PreemptTime=NO_VAL SuspendTime=None SecsPreSuspend=0
>>>    Partition=projects AllocNode:Sid=bscop134:13583
>>>    ReqNodeList=(null) ExcNodeList=(null)
>>>    NodeList=bscop134
>>>    BatchHost=bscop134
>>>    NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqS:C:T=*:*:*
>>>    MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
>>>    Features=(null) Gres=gpu:2 Reservation=(null)
>>>    Shared=OK Contiguous=0 Licenses=(null) Network=(null)
>>>    Command=(null)
>>>    WorkDir=/home/cfenoy
>>>
>>> JobId=20214 Name=sbatch
>>>    UserId=cfenoy(1001) GroupId=users(100)
>>>    Priority=4294901755 Account=(null) QOS=(null) WCKey=*
>>>    JobState=PENDING Reason=Resources Dependency=(null)
>>>    Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
>>>    RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
>>>    SubmitTime=2011-07-01T08:38:33 EligibleTime=2011-07-01T08:38:33
>>>    StartTime=Unknown EndTime=Unknown
>>>    PreemptTime=NO_VAL SuspendTime=None SecsPreSuspend=0
>>>    Partition=projects AllocNode:Sid=bscop134:13583
>>>    ReqNodeList=(null) ExcNodeList=(null)
>>>    NodeList=(null)
>>>    NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqS:C:T=*:*:*
>>>    MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
>>>    Features=(null) Gres=gpu:2 Reservation=(null)
>>>    Shared=OK Contiguous=0 Licenses=(null) Network=(null)
>>>    Command=(null)
>>>    WorkDir=/home/cfenoy
>>>
>>>
>>> On Thu, Jun 30, 2011 at 7:08 PM, <[email protected]> wrote:
>>>
>>>    It looks like there is a configuration problem. You have a gres
>>>    defined in some places as "gpu" and in other places as "gpus",
>>>    which will result in two separate sets of data structures. In
>>>    slurm v2.3 I see that configuration log a bunch of errors.
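
A consistent pair of files, assuming the resource is meant to be named "gpu" throughout (to match the GresTypes=gpu and Name=gpu lines quoted below), would use the same name in the node definitions as well. This is a sketch based only on the excerpts in this thread, not a tested configuration:

```
# slurm.conf -- gres name must match GresTypes and the Name= entries in gres.conf
GresTypes=gpu
NodeName=DEFAULT RealMemory=12000 Procs=8 TmpDisk=20000 Gres=gpu:2
NodeName=bscop134 NodeAddr=bscop134 Gres=gpu:2

# gres.conf -- unchanged from the original report
Name=gpu File=/dev/nvidia0 CPUs=0-3
Name=gpu File=/dev/nvidia1 CPUs=4-7
```

With mismatched names ("gpus" in the node lines, "gpu" elsewhere), slurmctld tracks two unrelated resource pools, which is consistent with the overallocation described below.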
>>>
>>>    Quoting Carles Fenoy <[email protected]>:
>>>
>>>       Hi all,
>>>
>>>       I've been testing slurm with gres for the last few days for our
>>>       future nvidia machine, and I'm facing some problems with gres
>>>       overallocating resources. I've seen the following error every
>>>       time the controller starts a job:
>>>
>>>       [2011-06-28T14:50:55] error: gres/gpu: job 20206 node bscop134
>>>       overallocated resources by 2
>>>
>>>       The configuration consists of 1 node with 2 gpus. At the end of
>>>       the email you can find the relevant configuration parameters.
>>>
>>>       Is this the expected behavior of scheduling with gres? Is this
>>>       a bug, or is there no way to avoid over-allocating resources?
>>>
>>>       Best regards,
>>>       Carles Fenoy
>>>
>>>       slurm.conf:
>>>       SelectType=select/cons_res
>>>       SelectTypeParameters=CR_CPU
>>>       SchedulerType=sched/backfill
>>>       GresTypes=gpu
>>>       NodeName=DEFAULT RealMemory=12000 Procs=8 TmpDisk=20000 Gres=gpus:2
>>>       NodeName=bscop134 NodeAddr=bscop134 Gres=gpus:2
>>>       PartitionName=projects AllowGroups=ALL Hidden=NO RootOnly=NO
>>>       MaxNodes=UNLIMITED MinNodes=1 MaxTime=UNLIMITED Shared=NO
>>>       State=UP Default=YES Nodes=bscop134
>>>
>>>       gres.conf:
>>>       Name=gpu File=/dev/nvidia0 CPUs=0-3
>>>       Name=gpu File=/dev/nvidia1 CPUs=4-7
>>>
>>>       --
>>>       Carles Fenoy
>>>
>>>    Moe Jette
>>>    SchedMD LLC
>>>
>>> --
>>> Carles Fenoy
>>>
>>> ----- End forwarded message -----
>>>
>>> Moe Jette
>>> SchedMD LLC

--
Carles Fenoy
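
The arithmetic behind the overallocation reported above can be illustrated with a small sketch (plain Python, not SLURM code; the helper name is made up). It tallies the Gres fields shown in the thread's scontrol output: four running jobs each carrying Gres=gpu:2 against a node that offers gpu:2.

```python
def parse_gres(field):
    """Parse a SLURM-style Gres string such as 'gpu:2' or '(null)' into a dict."""
    counts = {}
    if field in ("(null)", ""):
        return counts
    for part in field.split(","):
        name, _, num = part.partition(":")
        counts[name] = int(num) if num else 1
    return counts

# Figures taken from the thread: the node offers gpu:2, yet four jobs,
# each showing Gres=gpu:2, were running on it at once.
node_gres = parse_gres("gpu:2")
running_jobs = [parse_gres("gpu:2")] * 4

allocated = sum(job.get("gpu", 0) for job in running_jobs)
print(allocated, node_gres["gpu"])  # 8 GPUs allocated vs 2 available
```

This matches the "overallocated resources by 2" error per job: each new gpu:2 allocation exceeded the node's total once the first two GPUs were taken.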
