I’ve been having the very same problem since I tried to enable Accounting 
(slurmdbd) - so I have now had to disable accounting.

It would seem therefore that this part of the documentation should be updated:
https://slurm.schedmd.com/archive/slurm-15.08-latest/accounting.html
"To enable any limit enforcement you must at least have 
AccountingStorageEnforce=limits in your slurm.conf, otherwise, even if you have 
limits set, they will not be enforced. "
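
For reference, that option would sit in slurm.conf next to the accounting 
storage settings - something like the sketch below (the slurmdbd host name is 
just a placeholder):

AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=slurmdbd-host.example.org
AccountingStorageEnforce=limits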

I did not set that option at all in my slurm.conf and yet memory limits started 
to be enforced - and again I don’t believe the memory estimate was anything 
like correct.
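
If anyone wants to cross-check what Slurm thinks a job actually used, 
something like this should show it once accounting is back up (the job ID is 
a placeholder):

sacct -j <jobid> --format=JobID,MaxRSS,MaxVMSize,State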

In the new year I may try accounting again but with "MemLimitEnforce=no" set 
as well :)
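
That would be roughly this in slurm.conf (untested on my side - just a sketch, 
given that it apparently defaults to yes since 14.11):

MemLimitEnforce=no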


Merlin


--
Merlin Hartley
IT Systems Engineer
MRC Mitochondrial Biology Unit
Cambridge, CB2 0XY
United Kingdom

> On 15 Dec 2016, at 10:32, Uwe Sauter <uwe.sauter...@gmail.com> wrote:
> 
> 
> You are correct. Which version do you run? Do you have cgroups enabled? Can 
> you enable debugging for slurmd on the nodes? The
> output should contain what Slurm calculates as maximum memory for a job.
> 
> One other option is to configure MemLimitEnforce=no (which defaults to yes 
> since 14.11).
> 
> 
> Am 15.12.2016 um 11:26 schrieb Stefan Doerr:
>> But this doesn't answer my question why it reports 10 times as much memory 
>> usage as it is actually using, no?
>> 
>> On Wed, Dec 14, 2016 at 1:00 PM, Uwe Sauter <uwe.sauter...@gmail.com 
>> <mailto:uwe.sauter...@gmail.com>> wrote:
>> 
>> 
>>    There are only two memory related options "--mem" and "--mem-per-cpu".
>> 
>>    --mem tells slurm the memory requirement of the job (if used with sbatch) 
>> or the step (if used with srun). But not the requirement
>>    of each process.
>> 
>>    --mem-per-cpu is used in combination with --ntasks and --cpus-per-task. 
>> If only --mem-per-cpu is used without other options the
>>    memory requirement is calculated using the configured number of cores 
>> (NOT the number of cores requested), as far as I can tell.
>> 
>>    You might want to play a bit more with the additional options.
>> 
>> 
>> 
>>    Am 14.12.2016 um 12:09 schrieb Stefan Doerr:
>>> Hi, I'm running a python batch job on SLURM with following options
>>> 
>>> #!/bin/bash
>>> #
>>> #SBATCH --job-name=metrics
>>> #SBATCH --partition=xxx
>>> #SBATCH --cpus-per-task=6
>>> #SBATCH --mem=20000
>>> #SBATCH --output=slurm.%N.%j.out
>>> #SBATCH --error=slurm.%N.%j.err
>>> 
>>> So as I understand it, each process will have 20GB of RAM dedicated to it.
>>> 
>>> Running it I get:
>>> 
>>> slurmstepd: Job 72475 exceeded memory limit (39532832 > 20480000), being 
>>> killed
>>> slurmstepd: Exceeded job memory limit
>>> 
>>> This however cannot be true. I've run the same script locally and it uses
>>> 1-2GB of RAM. If it was using 40GB I would have gone to swap and
>>> definitely noticed.
>>> 
>>> So I put some prints in my python code to see how much memory is used and 
>>> indeed it shows a max usage of 1.7GB and before the
>>> error 1.2GB usage.
>>> 
>>> What is happening here? I mean I could increase the mem option but then I
>>> will be able to run much fewer jobs on my machines which seems really
>>> limiting.
>> 
>> 
