Hello,
I am relatively new to Slurm. I have a running cluster of 7 nodes with 36 cores/CPUs each, all nodes in mixed mode, one partition, and plenty of RAM; the default RAM allocation per CPU is total RAM divided by the number of CPUs. It is generally running well, and we are still experimenting.

We just added a new node, and an annoying problem has arisen: when submitting a small non-exclusive job requesting a few CPUs, the job is set to PENDING with reason Resources. I would like to find a way to get the full details behind that reason (lack of CPUs? lack of RAM? an exclusive allocation held by some other job? something else?), because the nodes are not full: there are free CPUs and free RAM, although that is not easy to verify in non-exclusive mode.

Thank you for any tips.

Example:

JobId=173744 JobName=
   UserId= GroupId=
   Priority=2250 Nice=0 Account=laas QOS=normal
   JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:00:00 TimeLimit=12:00:00 TimeMin=N/A
   SubmitTime=2017-06-21T11:06:18 EligibleTime=2017-06-21T11:06:18
   StartTime=2017-06-22T11:27:46 EndTime=Unknown
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=all AllocNode:Sid=
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null)
   NumNodes=1 NumCPUs=5 CPUs/Task=5 ReqB:S:C:T=0:0:*:*
   TRES=cpu=5,mem=17500,node=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=5 MinMemoryCPU=3500M MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=
   WorkDir=
   StdErr=
   StdIn=
   StdOut=
   Power= SICP=0
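As a side note, when the record comes back as one flat line it is easier to inspect after splitting it into key=value fields. A few lines of Python are enough (a sketch over a trimmed copy of the record above; only a handful of fields are kept for brevity):

```python
# Sketch: split a flat `scontrol show job` record into key=value fields
# so the scheduling-relevant ones (JobState, Reason, TRES, ...) stand out.
# The record string below is a trimmed excerpt of the job shown above.
record = (
    "JobId=173744 JobState=PENDING Reason=Resources "
    "NumNodes=1 NumCPUs=5 CPUs/Task=5 "
    "TRES=cpu=5,mem=17500,node=1 MinCPUsNode=5 MinMemoryCPU=3500M"
)

# Each whitespace-separated token is KEY=VALUE; split on the first '='
# only, so compound values like TRES=cpu=5,mem=17500,node=1 stay intact.
fields = dict(tok.split("=", 1) for tok in record.split())

for key in ("JobState", "Reason", "NumCPUs", "TRES", "MinMemoryCPU"):
    print(f"{key} = {fields[key]}")
```

This makes it easy to eyeball the resource request (here cpu=5 and 3500M per CPU, i.e. 17500M total) against what the nodes actually have free.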
