Hi Bill,

Check out Flags=DenyOnLimit in the sacctmgr man page.
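
As a rough sketch, assuming the QOS is still named "long" as in your message, something like the following should set the flag and let you confirm it took effect. As I read the man page, with DenyOnLimit set a job that could never fit within the QOS limits is rejected at submission instead of being left pending, but please check the wording for your Slurm version.

```shell
# Assumption: the QOS is named "long", as in the original message.
# sacctmgr asks for confirmation before committing the change.
sacctmgr modify qos where name=long set Flags=DenyOnLimit

# Confirm that the flag is now set on the QOS.
sacctmgr show qos where name=long format=Name,Flags
```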

Best,
Lyn


On Tue, Feb 18, 2014 at 4:50 AM, Bill Wichser <[email protected]> wrote:

>
> We have activated a few settings in the database for QOS.  For brevity,
> let's just look at one.
>
> sacctmgr add qos long priority=10 GrpCpus=516 MaxCpusPerUser=258
>
> My expectation was that at most 516 cores could be in use under this QOS
> and that the largest job a user could submit would be one requiring 258
> cores.  (This is an SMP machine with 1500+ cores.)
>
> The QOS is assigned in the job_submit.lua script.  But when testing with
> an explicit #SBATCH --qos=long directive, nothing changes.
>
> When I submit a job requiring 522 cores, it is accepted and left pending
> on resources:
>
> # scontrol show job 163
> JobId=163 Name=hello.slurm
>    UserId=bill(14119) GroupId=cses(20121)
>    Priority=1680 Account=all QOS=long
>    JobState=PENDING Reason=Resources Dependency=(null)
>    Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
>    RunTime=00:00:00 TimeLimit=1-17:40:00 TimeMin=N/A
>    SubmitTime=2014-02-18T09:33:27 EligibleTime=2014-02-18T09:33:27
>    StartTime=2014-02-19T09:42:19 EndTime=Unknown
>    PreemptTime=None SuspendTime=None SecsPreSuspend=0
>    Partition=normal AllocNode:Sid=hecate:1104586
>    ReqNodeList=(null) ExcNodeList=(null)
>    NodeList=(null)
>    NumNodes=1 NumCPUs=522 CPUs/Task=1 ReqS:C:T=*:*:*
>    MinCPUsNode=1 MinMemoryCPU=5000M MinTmpDiskNode=0
>    Features=(null) Gres=(null) Reservation=(null)
>    Shared=OK Contiguous=0 Licenses=(null) Network=(null)
>    Command=/home/bill/mpi/hello.slurm
>    WorkDir=/home/bill/mpi
>
>
> I would have expected the job either to be rejected or to show a Reason
> != Resources.
>
> Also, the total number of cores in use under qos=long (by others)
> already exceeds this GrpCpus=516 limit.
>
> Obviously I have missed something here.
>
> My goals would be 1) to reject outright jobs exceeding the QOS limit of
> MaxCpusPerUser (maybe I also need a MaxCpusPerJob?) and 2) to hold jobs
> which would exceed the GrpCpus limit.
>
> Thanks,
> Bill
>
