[slurm-dev] Re: Memory Issues

Paul Edmon Thu, 23 May 2013 10:34:17 -0700

Hmm, maybe it's the ThreadsPerCore?  Perhaps it thinks there are half as 
many cores as there really are because of ThreadsPerCore. If so, when you 
use --mem-per-cpu it will only give you half, since it counts only cores 
and not threads*cores?
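
A quick arithmetic check of that guess against the numbers quoted below (8 GB
per cpu, Sockets=4, CoresPerSocket=8, ThreadsPerCore=2). This is only a sketch
of the hypothesis, not how SLURM actually computes the limit; the function
name and the count_threads switch are made up for illustration.

```python
# Hypothetical check: does counting cores instead of threads*cores
# reproduce the limit in the slurmd error message?
KB_PER_GB = 1024 * 1024


def mem_limit_kb(mem_per_cpu_gb, sockets, cores_per_socket,
                 threads_per_core, count_threads):
    """Job memory limit in KB if 'cpu' means thread (True) or core (False)."""
    cores = sockets * cores_per_socket
    cpus = cores * threads_per_core if count_threads else cores
    return mem_per_cpu_gb * cpus * KB_PER_GB


# Node definition and request taken from the quoted message.
expected = mem_limit_kb(8, 4, 8, 2, count_threads=True)   # 64 cpus
observed = mem_limit_kb(8, 4, 8, 2, count_threads=False)  # 32 cores

print(expected)  # 536870912 KB, i.e. 512 GB
print(observed)  # 268435456 KB, the limit reported by slurmd
```

The core-only figure matches the 268435456 KB in the error exactly, which is
consistent with the guess but does not prove it.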


-Paul Edmon-

On 05/23/2013 01:31 PM, S. Aravindan wrote:
> I was about to post a similar query. A Gaussian 09 job is killed when its
> memory consumption exceeds half the amount of memory available on a node
> when --mem-per-cpu is used, but the job runs when --mem is used.  The
> relevant lines from slurm.conf are below.
>
> NodeName=node[01-15] RealMemory=48228 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 CPUs=24 State=UNKNOWN TmpDisk=1850000
> NodeName=node[16-30] RealMemory=96705 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 CPUs=24 State=UNKNOWN TmpDisk=1850000 Feature=96g
>
> Any suggestion is welcome.
>
> --Semparithi
>
>
> +++ On 09:41 23 May Paul Edmon wrote:
>> I have a user who is running a problem which uses 512 GB of memory. She
>> requests this from SLURM on a node which has this much.  However, her code
>> dies:
>>
>> slurmd[holy2b09101]: error: Job 6497 exceeded 268435456 KB memory limit, being killed
>> slurmd[holy2b09101]: error: Exceeded job memory limit
>> slurmd[holy2b09101]: error: *** JOB 6497 CANCELLED AT 2013-05-23T00:53:31 ***
>>
>> This is half of the 512 GB which was requested.  Is there something I am 
>> missing?  The nodes in question have:
>>
>> NodeName=DEFAULT CPUs=64 RealMemory=529247 Sockets=4 CoresPerSocket=8 ThreadsPerCore=2 State=UNKNOWN
>>
>> These are AMD Abu Dhabi processors with 8 GB per core, so 512 GB total.  She 
>> is requesting 8 GB per cpu and is asking for 64 cores.  Thoughts?
>>
>> -Paul Edmon-
> -- Semparithi Aravindan
