I have a user who is running a job that uses 512 GB of memory. She 
requests this from SLURM on a node that has this much. However, her code 
dies:

slurmd[holy2b09101]: error: Job 6497 exceeded 268435456 KB memory limit, being killed
slurmd[holy2b09101]: error: Exceeded job memory limit
slurmd[holy2b09101]: error: *** JOB 6497 CANCELLED AT 2013-05-23T00:53:31 ***

The reported limit of 268435456 KB is 256 GB, which is half of the 512 GB 
that was requested.  Is there something I am missing?  The nodes in question 
have:

NodeName=DEFAULT CPUs=64 RealMemory=529247 Sockets=4 CoresPerSocket=8 ThreadsPerCore=2 State=UNKNOWN

These are AMD Abu Dhabi processors with 8 GB per core, so 512 GB total.  She is 
requesting 8 GB per CPU and is asking for 64 CPUs.  Thoughts?
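
For reference, here is a quick sanity check of the numbers above as a small Python sketch. The values are copied from the log and the NodeName line; the variable names are only illustrative. The last figure is a guess on my part: Sockets * CoresPerSocket gives 32, and 32 * 8 GB happens to equal the limit in the log. I have not confirmed that SLURM actually computes the limit that way.

```python
# Values copied from the slurmd log and the NodeName line above.
LIMIT_KB = 268435456          # "exceeded 268435456 KB memory limit"
REAL_MEMORY_MB = 529247       # RealMemory
SOCKETS = 4
CORES_PER_SOCKET = 8
THREADS_PER_CORE = 2
MEM_PER_CPU_GB = 8            # what the user requested per CPU
CPUS_REQUESTED = 64

KB_PER_GB = 1024 * 1024
MB_PER_GB = 1024

limit_gb = LIMIT_KB / KB_PER_GB                      # limit SLURM enforced
requested_gb = MEM_PER_CPU_GB * CPUS_REQUESTED       # what was asked for
node_gb = REAL_MEMORY_MB / MB_PER_GB                 # what the node advertises

# Assumption (unverified): the limit may have been computed from
# Sockets * CoresPerSocket rather than from CPUs=64.
cores = SOCKETS * CORES_PER_SOCKET
cpus = cores * THREADS_PER_CORE
per_core_total_gb = MEM_PER_CPU_GB * cores

print(f"enforced limit:       {limit_gb:.0f} GB")
print(f"requested:            {requested_gb} GB")
print(f"node RealMemory:      {node_gb:.1f} GB")
print(f"CPUs from topology:   {cpus} ({cores} cores x {THREADS_PER_CORE} threads)")
print(f"8 GB x {cores} cores:     {per_core_total_gb} GB")
```

The request itself fits within RealMemory, so the node size does not look like the problem.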

-Paul Edmon-
