If you notice, the slurm.conf MaxMemPerCPU setting is 16000M and I am asking for 32G.





On Tue, Nov 1, 2016 at 6:22 PM -0500, Carlos Fenoy <[email protected]> wrote:

Hi,

You have set a MaxMemPerCPU lower than what you are asking for. Try changing 
that and check if that solves the issue.
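
Something along these lines in slurm.conf would do it (36864 is just an example value, i.e. 36G expressed in MB; pick whatever ceiling makes sense for your nodes), followed by an "scontrol reconfigure" on the controller:

MaxMemPerCPU=36864   # must be >= the largest --mem-per-cpu you expect jobs to request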

Regards,
Carlos

On Tue, Nov 1, 2016 at 10:27 PM, Chad Cropper <[email protected]> wrote:
sbatch submissions are not using the --mem-per-cpu option for scheduling 
purposes. Also, the AllocMem reported by scontrol show nodes comes out to 
DefMemPerCPU * the number of CPUs per task, not what the job requested.
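For example, with DefMemPerCPU=1024 and -c 4 (both shown below), that works out 
to 1024 * 4 = 4096 MB of AllocMem, rather than the 4 x 36G = 144G the job is 
actually asking for.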

My submit script options:
#!/bin/bash
#SBATCH -M cluster
#SBATCH -A account
#SBATCH --mail-type=end
#SBATCH --mail-user=
#SBATCH -J job1
#SBATCH -e err.log
#SBATCH -o out.log
#SBATCH -p normal
#SBATCH --time=24:00:00
#SBATCH --begin=now
#SBATCH -n 1
#SBATCH -c 4
#SBATCH --mem-per-cpu=36G
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
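
As a sanity check (the job ID below is just a placeholder), this is how I have 
been comparing the request against what the scheduler actually grants:

scontrol show job <jobid> | grep -iE 'mem|tres'
scontrol show node tnode03 | grep -i mem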

My slurm.conf file:
[root@titan-master1 ~]# cat /etc/slurm/slurm.conf | grep -v "#"
ClusterName=titan
SlurmUser=slurm
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/cm/shared/apps/slurm/var/cm/statesave
SlurmdSpoolDir=/cm/local/apps/slurm/var/spool
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
ProctrackType=proctrack/cgroup
CacheGroups=0
ReturnToService=2
TaskPlugin=task/cgroup
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
DefMemPerCPU=1024
MaxMemPerCPU=16000
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
PriorityType=priority/multifactor
PriorityWeightFairshare=0
PriorityWeightAge=1000
PriorityWeightQOS=1000
PriorityMaxAge=3-0
PriorityWeightJobSize=1000
PriorityFavorSmall=NO
PriorityWeightPartition=10000
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurmctld
SlurmdDebug=3
SlurmdLogFile=/var/log/slurmd


JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=task=30,network=30,filesystem=30
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=usfit-hpcc-slurm1.global.internal
AccountingStorageUser=slurm
AccountingStorageLoc=slurm_acct_db
AccountingStoragePass=kol2oja3vCkMAUrB
AccountingStorageEnforce=limits
PrivateData=cloud,nodes,reservations

SchedulerType=sched/backfill
ControlMachine=titan-master1
ControlAddr=titan-master1
NodeName=tnode03  Procs=32 CoresPerSocket=8 RealMemory=515800 Sockets=4 
ThreadsPerCore=1
NodeName=tnode[01,02]  Procs=8 CoresPerSocket=4 RealMemory=96505 Sockets=2 
ThreadsPerCore=1
PartitionName=low Default=YES MinNodes=1 AllowGroups=ALL Priority=1 
DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF 
ReqResv=NO AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO State=UP 
Nodes=tnode[01-03]
PartitionName=normal Default=NO MinNodes=1 AllowGroups=ALL Priority=5 
DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF 
ReqResv=NO AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO 
Nodes=tnode[01-03]
PartitionName=prd Default=NO MinNodes=1 AllowGroups=ALL Priority=15 
DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF 
ReqResv=NO AllowAccounts=picprd,dairyprd,beefprd,sc AllowQos=ALL LLN=NO 
ExclusiveUser=NO State=UP Nodes=tnode[01-03]
PartitionName=long Default=NO MinNodes=1 AllowGroups=ALL Priority=1 
DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF 
ReqResv=NO AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO State=UP 
Nodes=tnode[01-03]
PartitionName=short Default=NO MinNodes=1 AllowGroups=ALL Priority=1 
DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF 
ReqResv=NO AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO State=UP 
Nodes=tnode[01-03]
GresTypes=gpu,mic
PrologSlurmctld=/cm/local/apps/cmd/scripts/prolog-prejob
Prolog=/cm/local/apps/cmd/scripts/prolog
Epilog=/cm/local/apps/cmd/scripts/epilog
FastSchedule=0
SuspendTimeout=30
ResumeTimeout=60
SuspendProgram=/cm/local/apps/cluster-tools/wlm/scripts/slurmpoweroff
ResumeProgram=/cm/local/apps/cluster-tools/wlm/scripts/slurmpoweron
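
In case it matters, a quick way to confirm what the running slurmctld has 
actually loaded for these limits (rather than what is on disk) is:

scontrol show config | grep -i MemPerCPU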






--
Carles Fenoy

