Hi folks,
I've also recently encountered a similar problem, but in my case, I'm
wondering if this is normal or not.
We have a user submitting MPI jobs with
#SBATCH --nodes=8
#SBATCH --exclusive
#SBATCH --mem-per-cpu=131072
and launching 128 tasks through mpiexec rather than srun.
The jobs land on nodes with 250000 MB of real memory and 16 cores, e.g.:
# sinfo -o %N,%c,%m,%Z
NODELIST,CPUS,MEMORY,THREADS
barcoo004,16,250000,1
# scontrol show job 1678855
JobId=1678855 Name=ls_job
..
JobState=RUNNING Reason=None Dependency=(null)
Requeue=0 Restarts=0 BatchFlag=1 ExitCode=0:0
RunTime=1-21:53:28 TimeLimit=2-12:00:00 TimeMin=N/A
..
NodeList=barcoo[004,051,062,066-070]
BatchHost=barcoo004
NumNodes=8 NumCPUs=8 CPUs/Task=1 ReqS:C:T=*:*:*
MinCPUsNode=1 MinMemoryCPU=128G MinTmpDiskNode=0
Our guess was that with --mem-per-cpu=131072 (MB), only one task can fit on
each node, so Slurm accounting counts only one CPU per node:
# sacct -j 1678855 -o jobid,nnodes,ntasks,alloccpus
       JobID   NNodes   NTasks  AllocCPUS
------------ -------- -------- ----------
1678855             8                   8
1678855.0           7        7          7
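The arithmetic behind that guess can be checked quickly. This is only a sketch of our reasoning, assuming Slurm counts a CPU as allocatable only when a full --mem-per-cpu share fits in the node's RealMemory; the variable names are ours, not Slurm's.

```shell
#!/bin/bash
# Figures taken from the sinfo output and #SBATCH lines above.
real_memory_mb=250000   # per-node memory reported by sinfo
mem_per_cpu_mb=131072   # --mem-per-cpu from the job script
nodes=8                 # --nodes from the job script

# Integer division: how many --mem-per-cpu shares fit on one node.
cpus_by_memory=$(( real_memory_mb / mem_per_cpu_mb ))

# Total across the job, for comparison with AllocCPUS in sacct.
total_cpus=$(( cpus_by_memory * nodes ))

echo "CPUs per node by memory: $cpus_by_memory"
echo "CPUs across the job:     $total_cpus"
```

If the assumption holds, this gives one CPU per node and eight for the job, which is what sacct reports for the allocation.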
However, all 16 CPUs in the node are allocated (because of --exclusive):
# scontrol show node barcoo004
NodeName=barcoo004 Arch=x86_64 CoresPerSocket=8
CPUAlloc=16 CPUErr=0 CPUTot=16 CPULoad=16.00 Features=(null)
Gres=(null)
NodeAddr=barcoo004 NodeHostName=barcoo004
OS=Linux RealMemory=250000 AllocMem=131072 Sockets=2 Boards=1
State=ALLOCATED ThreadsPerCore=1 TmpDisk=0 Weight=2
BootTime=2014-02-25T23:28:52 SlurmdStartTime=2014-05-13T09:45:33
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
Note that CPUAlloc=16 but AllocMem=131072, the latter implying that only one
CPU is allocated. Yet when we look on the node, there are 16 MPI tasks
running (engaging all 16 cores).
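To put numbers on the mismatch, here is a rough sketch comparing what the node record implies with what mpiexec appears to launch. It assumes mpiexec spreads the 128 requested tasks evenly over the 8 nodes, which matches the 16 tasks we see on barcoo004 but is otherwise our assumption; the variable names are ours.

```shell
#!/bin/bash
# Values copied from the job script and "scontrol show node" above.
alloc_mem_mb=131072     # AllocMem on barcoo004
mem_per_cpu_mb=131072   # --mem-per-cpu
real_memory_mb=250000   # RealMemory on barcoo004
mpi_tasks=128           # tasks requested from mpiexec
nodes=8                 # --nodes

# CPUs that the memory accounting implies are allocated on the node.
cpus_implied_by_mem=$(( alloc_mem_mb / mem_per_cpu_mb ))

# Tasks per node, assuming an even spread (our assumption).
tasks_per_node=$(( mpi_tasks / nodes ))

# Memory the node would need if every running task were charged
# a full --mem-per-cpu share.
mem_if_all_charged_mb=$(( tasks_per_node * mem_per_cpu_mb ))

echo "CPUs implied by AllocMem:      $cpus_implied_by_mem"
echo "MPI tasks per node:            $tasks_per_node"
echo "Memory if all charged (MB):    $mem_if_all_charged_mb"
echo "RealMemory on the node (MB):   $real_memory_mb"
```

Charging all 16 tasks would far exceed the node's RealMemory, so the memory accounting cannot be describing what is actually running.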
Is the disparity because mpiexec is called instead of srun? (srun is being
avoided because of performance degradation with it on 2.6.5.)
Regards
Jeff Tan
High Performance Computing Specialist
IBM Research Collaboratory for Life Sciences, Melbourne, Australia