Hi, we were running 14.03.03 and updated to 14.03.06 yesterday. Since then, 
scontrol has been reporting bizarre NumCPUs values for submitted jobs.
For example, I submit a simple job as follows:

> sbatch -n 30 slurmtest.script
JobId=431695 Name=slurmtest.script

> scontrol show job 431695
JobId=431695 Name=slurmtest.script
   UserId=kevin(7260) GroupId=glue-staff(8675)
   Priority=40133 Nice=0 Account=bubba QOS=wide-short
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
   RunTime=00:00:10 TimeLimit=00:30:00 TimeMin=N/A
   SubmitTime=2014-07-23T08:34:48 EligibleTime=2014-07-23T08:34:48
   StartTime=2014-07-23T08:34:49 EndTime=2014-07-23T09:04:49
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=standard AllocNode:Sid=deepthought2:21227
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=compute-b25-[24,37]
   BatchHost=compute-b25-24
   NumNodes=2 NumCPUs=280 CPUs/Task=1 ReqB:S:C:T=0:0:*:*   <---- NOTICE NumCPUs here
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=0
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=0 Contiguous=0 Licenses=(null) Network=(null)
   Command=/export/home/dt2-admin/kevin/slurmtest.script
   WorkDir=/home/dt2-admin/kevin
   StdErr=/home/dt2-admin/kevin/slurm-431695.out
   StdIn=/dev/null
   StdOut=/home/dt2-admin/kevin/slurm-431695.out

Each node in this cluster has 20 cores and the job has its two nodes 
exclusively, so I'd expect NumCPUs to be 2 x 20 = 40, not 280.
Here are the node records for the nodes that were assigned:

> scontrol show node "compute-b25-[24,37]"
NodeName=compute-b25-24 Arch=x86_64 CoresPerSocket=10
   CPUAlloc=20 CPUErr=0 CPUTot=20 CPULoad=2.97 Features=(null)
   Gres=(null)
   NodeAddr=compute-b25-24 NodeHostName=compute-b25-24 Version=14.03
   OS=Linux RealMemory=128000 AllocMem=0 Sockets=2 Boards=1
   State=ALLOCATED ThreadsPerCore=1 TmpDisk=750000 Weight=1
   BootTime=2014-07-01T09:03:34 SlurmdStartTime=2014-07-23T08:24:44
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s


NodeName=compute-b25-37 Arch=x86_64 CoresPerSocket=10
   CPUAlloc=20 CPUErr=0 CPUTot=20 CPULoad=0.01 Features=(null)
   Gres=(null)
   NodeAddr=compute-b25-37 NodeHostName=compute-b25-37 Version=14.03
   OS=Linux RealMemory=128000 AllocMem=0 Sockets=2 Boards=1
   State=ALLOCATED ThreadsPerCore=1 TmpDisk=750000 Weight=1
   BootTime=2014-07-01T09:03:34 SlurmdStartTime=2014-07-23T08:24:47
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
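
For anyone who wants to run the same comparison on other jobs, here is a minimal Python sketch. It pulls NumCPUs out of the job record and sums CPUTot over the node records. The function names and the abbreviated sample strings are illustrative only and are not part of Slurm; the code assumes text in the same Key=Value format as the scontrol output above.

```python
import re


def job_num_cpus(job_text):
    """Return the NumCPUs value found in 'scontrol show job' style text."""
    match = re.search(r"\bNumCPUs=(\d+)", job_text)
    if match is None:
        raise ValueError("no NumCPUs field found")
    return int(match.group(1))


def node_cpu_total(node_text):
    """Sum every CPUTot value found in 'scontrol show node' style text."""
    return sum(int(n) for n in re.findall(r"\bCPUTot=(\d+)", node_text))


# Abbreviated copies of the relevant lines from the output above.
job_text = "   NumNodes=2 NumCPUs=280 CPUs/Task=1 ReqB:S:C:T=0:0:*:*"
node_text = (
    "NodeName=compute-b25-24 Arch=x86_64 CoresPerSocket=10\n"
    "   CPUAlloc=20 CPUErr=0 CPUTot=20 CPULoad=2.97 Features=(null)\n"
    "NodeName=compute-b25-37 Arch=x86_64 CoresPerSocket=10\n"
    "   CPUAlloc=20 CPUErr=0 CPUTot=20 CPULoad=0.01 Features=(null)\n"
)

reported = job_num_cpus(job_text)
available = node_cpu_total(node_text)
print(f"NumCPUs reported: {reported}, sum of CPUTot: {available}")
if reported > available:
    print("Job reports more CPUs than its nodes have in total")
```

On a live system the two strings could instead be filled from the actual scontrol output, for example by capturing it with subprocess.run.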


I'm seeing this behavior on two different clusters, both of which were updated 
to 14.03.06.  Was something changed recently that could explain this?

Thanks,
Kevin
