Hi, we were running 14.03.03 and updated to 14.03.06 yesterday, and since then I've been seeing bizarre figures for NumCPUs in submitted jobs. For example, I submit a simple job as follows:
> sbatch -n 30 slurmtest.script
JobId=431695 Name=slurmtest.script

> scontrol show job 431695
JobId=431695 Name=slurmtest.script
   UserId=kevin(7260) GroupId=glue-staff(8675)
   Priority=40133 Nice=0 Account=bubba QOS=wide-short
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
   RunTime=00:00:10 TimeLimit=00:30:00 TimeMin=N/A
   SubmitTime=2014-07-23T08:34:48 EligibleTime=2014-07-23T08:34:48
   StartTime=2014-07-23T08:34:49 EndTime=2014-07-23T09:04:49
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=standard AllocNode:Sid=deepthought2:21227
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=compute-b25-[24,37]
   BatchHost=compute-b25-24
   NumNodes=2 NumCPUs=280 CPUs/Task=1 ReqB:S:C:T=0:0:*:*    <---- NOTICE NumCPUs here
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=0
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=0 Contiguous=0 Licenses=(null) Network=(null)
   Command=/export/home/dt2-admin/kevin/slurmtest.script
   WorkDir=/home/dt2-admin/kevin
   StdErr=/home/dt2-admin/kevin/slurm-431695.out
   StdIn=/dev/null
   StdOut=/home/dt2-admin/kevin/slurm-431695.out

This cluster is made up of nodes of 20 cores each, so I'd expect NumCPUs to be 40 since the job is exclusive.
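For reference, the arithmetic behind that expectation can be sketched in Python. This is only an illustration: `parse_fields` is a hypothetical helper (not part of Slurm), and the 20 cores per node is taken from the cluster description above rather than queried from the scheduler.

```python
import re

def parse_fields(text):
    """Split scontrol-style 'Key=Value' output into a dict (first occurrence wins)."""
    fields = {}
    for key, value in re.findall(r"(\S+?)=(\S*)", text):
        fields.setdefault(key, value)
    return fields

# The relevant line from the 'scontrol show job' output above.
job_line = "NumNodes=2 NumCPUs=280 CPUs/Task=1 ReqB:S:C:T=0:0:*:*"

# Assumption: every node in this cluster has 20 cores.
CORES_PER_NODE = 20

fields = parse_fields(job_line)
expected = int(fields["NumNodes"]) * CORES_PER_NODE
reported = int(fields["NumCPUs"])

print(f"expected {expected}, reported {reported}")
```

With the values above this gives an expected count of 40 against the reported 280.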
Here are the node records for the nodes that were assigned:

> scontrol show node "compute-b25-[24,37]"
NodeName=compute-b25-24 Arch=x86_64 CoresPerSocket=10
   CPUAlloc=20 CPUErr=0 CPUTot=20 CPULoad=2.97 Features=(null)
   Gres=(null)
   NodeAddr=compute-b25-24 NodeHostName=compute-b25-24 Version=14.03
   OS=Linux RealMemory=128000 AllocMem=0 Sockets=2 Boards=1
   State=ALLOCATED ThreadsPerCore=1 TmpDisk=750000 Weight=1
   BootTime=2014-07-01T09:03:34 SlurmdStartTime=2014-07-23T08:24:44
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

NodeName=compute-b25-37 Arch=x86_64 CoresPerSocket=10
   CPUAlloc=20 CPUErr=0 CPUTot=20 CPULoad=0.01 Features=(null)
   Gres=(null)
   NodeAddr=compute-b25-37 NodeHostName=compute-b25-37 Version=14.03
   OS=Linux RealMemory=128000 AllocMem=0 Sockets=2 Boards=1
   State=ALLOCATED ThreadsPerCore=1 TmpDisk=750000 Weight=1
   BootTime=2014-07-01T09:03:34 SlurmdStartTime=2014-07-23T08:24:47
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

I'm seeing this behavior on two different clusters, both of which were updated to 14.03.06. Was something changed recently that could explain this?

Thanks,
Kevin
