$ sinfo --version
slurm 15.08.11

$ sacct --format="CPUTime,MaxRSS" -j 72491
   CPUTime     MaxRSS
---------- ----------
  00:27:06
  00:27:06  37316236K
I will have to ask the sysadmins about cgroups since I'm just a user here.

On Thu, Dec 15, 2016 at 12:05 PM, Merlin Hartley <merlin-sl...@mrc-mbu.cam.ac.uk> wrote:

> I've been having the very same problem since I tried to enable Accounting
> (slurmdbd) - so I have now had to disable accounting.
>
> It would seem therefore that this part of the documentation should be updated:
> https://slurm.schedmd.com/archive/slurm-15.08-latest/accounting.html
> "To enable any limit enforcement you must at least have
> *AccountingStorageEnforce=limits* in your slurm.conf, otherwise, even if
> you have limits set, they will not be enforced."
>
> I did not set that option at all in my slurm.conf and yet memory limits
> started to be enforced - and again I don't believe the memory estimate was
> anything like correct.
>
> In the new year I may try accounting again, but with "MemLimitEnforce=no"
> set as well :)
>
>
> Merlin
>
> --
> Merlin Hartley
> IT Systems Engineer
> MRC Mitochondrial Biology Unit
> Cambridge, CB2 0XY
> United Kingdom
>
> On 15 Dec 2016, at 10:32, Uwe Sauter <uwe.sauter...@gmail.com> wrote:
>
> You are correct. Which version do you run? Do you have cgroups enabled?
> Can you enable debugging for slurmd on the nodes? The output should
> contain what Slurm calculates as maximum memory for a job.
>
> One other option is to configure MemLimitEnforce=no (which has defaulted
> to yes since 14.11).
>
> On 15.12.2016 at 11:26, Stefan Doerr wrote:
>
> But this doesn't answer my question why it reports 10 times as much
> memory usage as it is actually using, no?
>
> On Wed, Dec 14, 2016 at 1:00 PM, Uwe Sauter <uwe.sauter...@gmail.com> wrote:
>
> There are only two memory-related options, "--mem" and "--mem-per-cpu".
>
> --mem tells Slurm the memory requirement of the job (if used with sbatch)
> or of the step (if used with srun), but not the requirement of each
> process.
>
> --mem-per-cpu is used in combination with --ntasks and --cpus-per-task.
> If only --mem-per-cpu is used without other options, the memory
> requirement is calculated from the configured number of cores (NOT the
> number of cores requested), as far as I can tell.
>
> You might want to play a bit more with the additional options.
>
> On 14.12.2016 at 12:09, Stefan Doerr wrote:
>
> Hi, I'm running a Python batch job on SLURM with the following options:
>
> #!/bin/bash
> #
> #SBATCH --job-name=metrics
> #SBATCH --partition=xxx
> #SBATCH --cpus-per-task=6
> #SBATCH --mem=20000
> #SBATCH --output=slurm.%N.%j.out
> #SBATCH --error=slurm.%N.%j.err
>
> So, as I understand it, each process will have 20GB of RAM dedicated to it.
>
> Running it I get:
>
> slurmstepd: Job 72475 exceeded memory limit (39532832 > 20480000), being
> killed
> slurmstepd: Exceeded job memory limit
>
> This however cannot be true. I've run the same script locally and it uses
> 1-2GB of RAM. If it was using 40GB I would have gone to swap and
> definitely noticed.
>
> So I put some prints in my Python code to see how much memory is used,
> and indeed it shows a maximum usage of 1.7GB and, just before the error,
> 1.2GB.
>
> What is happening here? I mean, I could increase the mem option, but then
> I will be able to run far fewer jobs on my machines, which seems really
> limiting.
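
For reference, the enforcement options mentioned in this thread live in slurm.conf (and, if cgroup-based limiting is used, cgroup.conf). The fragment below is only a minimal sketch of where those options would sit; the combination shown and the commented values are illustrative assumptions, not a tested configuration:

  # slurm.conf (fragment)
  AccountingStorageEnforce=limits         # per the accounting docs, needed before configured limits are enforced
  MemLimitEnforce=no                      # disable enforcement based on accounted memory (defaults to yes since 14.11)
  JobAcctGatherType=jobacct_gather/linux  # mechanism that samples per-job memory usage

  # cgroup.conf (fragment; only relevant if TaskPlugin=task/cgroup is set in slurm.conf)
  ConstrainRAMSpace=yes                   # let the kernel cgroup cap job memory instead of the sampled RSS check
  AllowedRAMSpace=100                     # percent of the requested memory (100 is the default)

Which of the two mechanisms (cgroup limiting or the accounting-based check) actually kills a job depends on how the site is configured, which is presumably what the sysadmins would need to confirm.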
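
Stefan mentions adding prints to the Python job to watch its memory use. A minimal sketch of such a check, using only the standard library (the helper name report_peak_rss is made up here), could look like the following; note that RUSAGE_SELF only covers the current process, so the figure Slurm accounts for the whole job step may differ:

  import resource

  def report_peak_rss(tag):
      # ru_maxrss is the peak resident set size; on Linux it is reported in kilobytes
      peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
      print("%s: peak RSS %.2f GB" % (tag, peak_kb / (1024.0 * 1024.0)))

  report_peak_rss("before processing")
  # ... actual work of the job ...
  report_peak_rss("after processing")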