Hi, I have an issue with resource allocation.
In my environment the partitions are defined as below:

PartitionName=small_jobs Nodes=Node[17,20] Default=NO MaxTime=INFINITE State=UP Shared=YES Priority=8000
PartitionName=large_jobs Nodes=Node[17,20] Default=NO MaxTime=INFINITE State=UP Shared=YES Priority=100

The node has only a few CPUs allocated and plenty of CPU resources still available:

NodeName=Node17 Arch=x86_64 CoresPerSocket=18
   CPUAlloc=4 CPUErr=0 CPUTot=36 CPULoad=4.09
   AvailableFeatures=K2200
   ActiveFeatures=K2200
   Gres=gpu:2
   NodeAddr=Node1717 NodeHostName=Node17 Version=17.11
   OS=Linux 4.12.14-94.41-default #1 SMP Wed Oct 31 12:25:04 UTC 2018 (3090901)
   RealMemory=1 AllocMem=0 FreeMem=225552 Sockets=2 Boards=1
   State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=small_jobs,large_jobs
   BootTime=2020-03-21T18:56:48 SlurmdStartTime=2020-03-31T09:07:03
   CfgTRES=cpu=36,mem=1M,billing=36
   AllocTRES=cpu=4
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

There are no other jobs in the small_jobs partition, but several jobs are pending in the large_jobs partition. The resources are available, yet those jobs are not starting.
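To put a number on "resources are available": the node record above reports CPUTot=36 and CPUAlloc=4. The following is only a minimal sketch in Python for reading that off a pasted `scontrol show node` record. The helper names `parse_scontrol` and `free_cpus` are made up for illustration and are not part of Slurm, and the sketch assumes every field is a whitespace-separated Key=Value token.

```python
import re

def parse_scontrol(text):
    """Collect whitespace-separated Key=Value tokens into a dict.

    Tokens without '=' (for example parts of the OS string) are skipped.
    """
    return dict(re.findall(r"(\S+?)=(\S*)", text))

def free_cpus(node_text):
    """Return CPUTot - CPUAlloc for one node record (hypothetical helper)."""
    fields = parse_scontrol(node_text)
    return int(fields["CPUTot"]) - int(fields["CPUAlloc"])

# Fields copied from the Node17 record in this post.
sample = (
    "NodeName=Node17 Arch=x86_64 CoresPerSocket=18 "
    "CPUAlloc=4 CPUErr=0 CPUTot=36 CPULoad=4.09"
)

print(free_cpus(sample))  # 36 - 4 = 32
```

So 32 of the 36 CPUs on Node17 are unallocated while the jobs sit in the queue.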
The output for one of the pending jobs is:

scontrol show job 1250258
JobId=1250258 JobName=import_workflow
   UserId=m209767(100468) GroupId=oled(4289) MCS_label=N/A
   Priority=363157 Nice=0 Account=oledgrp QOS=normal
   JobState=PENDING Reason=Priority Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2020-03-28T22:00:13 EligibleTime=2020-03-28T22:00:13
   StartTime=2070-03-19T11:59:09 EndTime=Unknown Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2020-03-31T12:58:48
   Partition=large_jobs AllocNode:Sid=deda1x1466:62260
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null)
   NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=1,node=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   Gres=(null) Reservation=(null)
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)

These are the scheduling settings in my slurm.conf file:

SchedulerType=sched/builtin
#SchedulerParameters=enable_user_top
SelectType=select/cons_res
#SelectTypeParameters=CR_Core_Memory
SelectTypeParameters=CR_Core

Any idea why the job is not starting even though CPU cores are available?

Also, if jobs are running on a particular node and I restart the slurmd service, in what scenario would my job get killed? Generally it should not kill the job.

Regards
Navin.