I was about to post a similar query. A Gaussian 09 job is killed when its memory consumption exceeds half the memory available on a node if --mem-per-cpu is used, but the job runs when --mem is used. The relevant lines from slurm.conf are below.

NodeName=node[01-15] RealMemory=48228 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 CPUs=24 State=UNKNOWN TmpDisk=1850000
NodeName=node[16-30] RealMemory=96705 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 CPUs=24 State=UNKNOWN TmpDisk=1850000 Feature=96g

Any suggestion is welcome.

--Semparithi

+++ On 09:41 23 May Paul Edmon wrote:
>
> I have a user that is running a problem which uses 512 GB of memory. She
> request this from SLURM on a node which has this much. However her code
> dies:
>
> slurmd[holy2b09101]: error: Job 6497 exceeded 268435456 KB memory limit,
> being killed
> slurmd[holy2b09101]: error: Exceeded job memory limit
> slurmd[holy2b09101]: error: *** JOB 6497 CANCELLED AT 2013-05-23T00:53:31 ***
>
> This is half of the 512 GB which was requested. Is there something I am
> missing? The nodes in question have:
>
> NodeName=DEFAULT CPUs=64 RealMemory=529247 Sockets=4 CoresPerSocket=8
> ThreadsPerCore=2 State=UNKNOWN
>
> These are AMD Abu Dhabi processors with 8 GB per core, so 512 GB total. She
> is requesting 8 GB per cpu and is asking for 64 cores. Thoughts?
>
> -Paul Edmon-

--
Semparithi Aravindan
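
For reference, a quick arithmetic check of the numbers in the quoted log, as a sketch only. The variable names are made up for illustration. The idea that the limit is being derived from physical cores (CPUs / ThreadsPerCore) rather than from logical CPUs is an assumption on my part, not something confirmed in this thread.

```python
# Values copied from the quoted slurmd log and node definition.
limit_kb = 268435456      # "exceeded 268435456 KB memory limit"
mem_per_cpu_gb = 8        # 8 GB requested per CPU
logical_cpus = 64         # CPUs=64
threads_per_core = 2      # ThreadsPerCore=2

# Convert the reported limit from KB to GB.
limit_gb = limit_kb / (1024 * 1024)

# ASSUMPTION: the per-CPU request is multiplied by physical cores,
# i.e. Sockets * CoresPerSocket = 4 * 8 = 32, not by the 64 logical CPUs.
physical_cores = logical_cpus // threads_per_core

expected_gb = mem_per_cpu_gb * logical_cpus      # what the user expected
per_core_gb = mem_per_cpu_gb * physical_cores    # what the assumption predicts

print(limit_gb, expected_gb, per_core_gb)
```

Under that assumption the per-core product matches the limit in the log, which is half of the expected total. The same ratio would apply to the node[01-15] definition above: 24 CPUs with ThreadsPerCore=2 is 12 cores.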
