I’ve been having the very same problem since I tried to enable accounting (slurmdbd), so for now I have had to disable accounting again.

It would seem, therefore, that this part of the documentation should be updated: https://slurm.schedmd.com/archive/slurm-15.08-latest/accounting.html

"To enable any limit enforcement you must at least have AccountingStorageEnforce=limits in your slurm.conf, otherwise, even if you have limits set, they will not be enforced."

I did not set that option at all in my slurm.conf, and yet memory limits started to be enforced - and again, I don't believe the memory estimate was anything like correct. In the new year I may try accounting again, but with "MemLimitEnforce=no" set as well :)
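For reference, this is roughly what I intend to try in slurm.conf when I turn accounting back on. It is only a sketch - the cgroup-related lines are what I assume a cgroup-based setup would look like, and none of it is tested on our cluster yet:

  # store accounting data via slurmdbd
  AccountingStorageType=accounting_storage/slurmdbd
  # per the documentation, limits should only be enforced when this is set
  AccountingStorageEnforce=limits
  # stop slurmd killing jobs based on its own (apparently inflated) memory estimate
  MemLimitEnforce=no
  # gather usage via cgroups instead of summing RSS values read from /proc
  JobAcctGatherType=jobacct_gather/cgroup
  TaskPlugin=task/cgroup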
Merlin

--
Merlin Hartley
IT Systems Engineer
MRC Mitochondrial Biology Unit
Cambridge, CB2 0XY
United Kingdom


> On 15 Dec 2016, at 10:32, Uwe Sauter <uwe.sauter...@gmail.com> wrote:
>
> You are correct. Which version do you run? Do you have cgroups enabled? Can you enable debugging for slurmd on the nodes? The
> output should contain what Slurm calculates as the maximum memory for the job.
>
> One other option is to configure MemLimitEnforce=no (which defaults to yes since 14.11).
>
> On 15.12.2016 at 11:26, Stefan Doerr wrote:
>> But this doesn't answer my question of why it reports ten times as much memory usage as it is actually using, no?
>>
>> On Wed, Dec 14, 2016 at 1:00 PM, Uwe Sauter <uwe.sauter...@gmail.com> wrote:
>>
>> There are only two memory-related options, "--mem" and "--mem-per-cpu".
>>
>> --mem tells Slurm the memory requirement of the job (if used with sbatch) or of the step (if used with srun), but not the
>> requirement of each process.
>>
>> --mem-per-cpu is used in combination with --ntasks and --cpus-per-task. If only --mem-per-cpu is used without the other
>> options, the memory requirement is calculated using the configured number of cores (NOT the number of cores requested),
>> as far as I can tell.
>>
>> You might want to play a bit more with the additional options.
>>
>> On 14.12.2016 at 12:09, Stefan Doerr wrote:
>>> Hi, I'm running a Python batch job on Slurm with the following options:
>>>
>>> #!/bin/bash
>>> #
>>> #SBATCH --job-name=metrics
>>> #SBATCH --partition=xxx
>>> #SBATCH --cpus-per-task=6
>>> #SBATCH --mem=20000
>>> #SBATCH --output=slurm.%N.%j.out
>>> #SBATCH --error=slurm.%N.%j.err
>>>
>>> So, as I understand it, each process will have 20 GB of RAM dedicated to it.
>>>
>>> Running it I get:
>>>
>>> slurmstepd: Job 72475 exceeded memory limit (39532832 > 20480000), being killed
>>> slurmstepd: Exceeded job memory limit
>>>
>>> This, however, cannot be true. I've run the same script locally and it uses 1-2 GB of RAM. If it were using 40 GB I would
>>> have gone to swap and definitely noticed.
>>>
>>> So I put some prints in my Python code to see how much memory is used, and indeed it shows a maximum usage of 1.7 GB,
>>> and 1.2 GB just before the error.
>>>
>>> What is happening here? I could increase the mem option, but then I would be able to run far fewer jobs on my machines,
>>> which seems really limiting.
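P.S. For what it's worth, Uwe's distinction between --mem and --mem-per-cpu would look something like this in Stefan's script. This is only a sketch - the 4000 MB figure and the partition name are placeholders, not values from this thread:

  #!/bin/bash
  #
  #SBATCH --job-name=metrics
  #SBATCH --partition=xxx
  #SBATCH --cpus-per-task=6
  # --mem is a limit for the whole job on the node, shared by ALL of its processes:
  ##SBATCH --mem=20000
  # --mem-per-cpu is multiplied by the allocated CPUs, i.e. 6 x 4000 MB = 24000 MB here:
  #SBATCH --mem-per-cpu=4000
  #SBATCH --output=slurm.%N.%j.out
  #SBATCH --error=slurm.%N.%j.err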