Hello,

I am relatively new to Slurm.

I have a running cluster of 7 nodes with 36 cores/CPUs each, all nodes in mixed
(shared) mode, one partition, and plenty of RAM; the default RAM allocation per
CPU is total RAM divided by the number of CPUs.

It is generally running well; we are still experimenting.

We just added a new node, and an annoying problem arose: when submitting a
small non-exclusive job requesting 4 CPUs, the job is set to PENDING with
reason Resources.

I would like to find a way to get the full details behind that reason (lack of
CPUs? lack of RAM? an exclusive allocation held by some other job? something
else?), because the nodes are not full: there are free CPUs and free RAM,
although in non-exclusive mode that is not easy to verify.
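A few commands from the stock Slurm CLI help narrow this down (substitute your real job ID and node names). The `%C` field of `sinfo` packs allocated/idle/other/total CPUs into one slash-separated value; the small awk snippet at the end shows how to split it, using a made-up sample line:

```shell
# Diagnostic commands (standard Slurm CLI):
#   scontrol -dd show job <jobid>     # full job record, with extra detail
#   sinfo -N -o "%N %C %e %m"         # per node: CPUs A/I/O/T, free mem, total mem
#   squeue -t PD --start              # scheduler's expected start time per pending job
#   scontrol show node <nodename>     # CfgTRES vs AllocTRES on one node

# %C prints Allocated/Idle/Other/Total CPUs; splitting a sample sinfo line:
sample='node01 20/16/0/36 64000 128000'
echo "$sample" | awk '{ split($2, c, "/");
  printf "alloc=%s idle=%s total=%s\n", c[1], c[2], c[4] }'
# → alloc=20 idle=16 total=36
```

Comparing the idle CPU count and free memory per node against what the pending job requests usually reveals which resource is short.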

Thank you for any tips


Example (`scontrol show job` output, identifying fields blanked):

   JobId=173744 JobName=
   UserId= GroupId=
   Priority=2250 Nice=0 Account=laas QOS=normal
   JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:00:00 TimeLimit=12:00:00 TimeMin=N/A
   SubmitTime=2017-06-21T11:06:18 EligibleTime=2017-06-21T11:06:18
   StartTime=2017-06-22T11:27:46 EndTime=Unknown
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=all AllocNode:Sid=
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null)
   NumNodes=1 NumCPUs=5 CPUs/Task=5 ReqB:S:C:T=0:0:*:*
   TRES=cpu=5,mem=17500,node=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=5 MinMemoryCPU=3500M MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=
   WorkDir=
   StdErr=
   StdIn=
   StdOut=
   Power= SICP=0
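One thing worth checking from this record: with MinMemoryCPU=3500M and 5 CPUs on a single node, a candidate node needs 5 idle cores AND 17.5 GB of free RAM at the same time, which is easy to miss when only counting free CPUs. A quick arithmetic check:

```shell
# From the job record above: per-CPU memory request times CPU count
min_mem_per_cpu=3500   # MB, from MinMemoryCPU=3500M
ncpus=5                # from NumCPUs=5
echo $(( min_mem_per_cpu * ncpus ))   # prints 17500, matching TRES mem=17500
```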
