I use 2.5.6.

-Paul Edmon-

On 05/23/2013 01:58 PM, S. Aravindan wrote:
> I guess so. The slurm version I use is 2.5.4. I have attached my
> slurm.conf with this mail.
>
> --Semparithi
>
> +++ On 10:34 23 May Paul Edmon wrote:
>> Hmm, maybe it's the ThreadsPerCore? Perhaps it thinks there are half as
>> many cores as there really are due to the ThreadsPerCore. Thus if you do
>> the --mem-per-cpu it will only give you half, as it only counts cores
>> not threads*cores?
>>
>> -Paul Edmon-
>>
>> On 05/23/2013 01:31 PM, S. Aravindan wrote:
>>> I was about to post a similar query. A Gaussian 09 job is killed when the
>>> memory consumption exceeds half the amount of memory available on a node
>>> when --mem-per-cpu is used, but the job runs when --mem is used. The
>>> relevant lines from slurm.conf are below.
>>>
>>> NodeName=node[01-15] RealMemory=48228 Sockets=2 CoresPerSocket=6
>>> ThreadsPerCore=2 CPUs=24 State=UNKNOWN TmpDisk=1850000
>>> NodeName=node[16-30] RealMemory=96705 Sockets=2 CoresPerSocket=6
>>> ThreadsPerCore=2 CPUs=24 State=UNKNOWN TmpDisk=1850000 Feature=96g
>>>
>>> Any suggestion is welcome.
>>>
>>> --Semparithi
>>>
>>>
>>> +++ On 09:41 23 May Paul Edmon wrote:
>>>> I have a user who is running a problem which uses 512 GB of memory. She
>>>> requests this from SLURM on a node which has this much. However, her code
>>>> dies:
>>>>
>>>> slurmd[holy2b09101]: error: Job 6497 exceeded 268435456 KB memory limit,
>>>> being killed
>>>> slurmd[holy2b09101]: error: Exceeded job memory limit
>>>> slurmd[holy2b09101]: error: *** JOB 6497 CANCELLED AT 2013-05-23T00:53:31
>>>> ***
>>>>
>>>> This is half of the 512 GB which was requested. Is there something I am
>>>> missing? The nodes in question have:
>>>>
>>>> NodeName=DEFAULT CPUs=64 RealMemory=529247 Sockets=4 CoresPerSocket=8
>>>> ThreadsPerCore=2 State=UNKNOWN
>>>>
>>>> These are AMD Abu Dhabi processors with 8 GB per core, so 512 GB total.
>>>> She is requesting 8 GB per cpu and is asking for 64 cores. Thoughts?
>>>>
>>>> -Paul Edmon-
>>> -- Semparithi Aravindan
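
For what it's worth, the arithmetic behind the ThreadsPerCore guess can be checked with a few lines of Python. This is only a sketch of the hypothesis raised in the thread, not a description of how slurmd actually computes the limit: the function name and the count_threads switch are made up for illustration, and the 8192 MB figure assumes "8 GB per cpu" was passed as --mem-per-cpu=8192. The node values are the ones quoted from slurm.conf above.

```python
KB_PER_MB = 1024


def mem_limit_kb(mem_per_cpu_mb, sockets, cores_per_socket,
                 threads_per_core, count_threads):
    """Hypothetical job memory limit in KB for a whole-node request.

    count_threads=True  -> a "cpu" is a hardware thread (what the user expects)
    count_threads=False -> a "cpu" is a physical core (the guess in this thread)
    """
    cores = sockets * cores_per_socket
    cpus = cores * threads_per_core if count_threads else cores
    return mem_per_cpu_mb * cpus * KB_PER_MB


# Node definition quoted above: Sockets=4 CoresPerSocket=8 ThreadsPerCore=2
expected_kb = mem_limit_kb(8192, 4, 8, 2, count_threads=True)
hypothesis_kb = mem_limit_kb(8192, 4, 8, 2, count_threads=False)

print("limit if threads are counted:", expected_kb, "KB")
print("limit if only cores are counted:", hypothesis_kb, "KB")
```

Counting only the 32 cores gives 268435456 KB, which is the limit reported in the slurmd error for job 6497, so the numbers are at least consistent with the guess. That does not confirm the cause; it only shows the halving matches ThreadsPerCore=2.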
