$ sinfo --version
slurm 15.08.11

$ sacct --format="CPUTime,MaxRSS" -j 72491
   CPUTime     MaxRSS
---------- ----------
  00:27:06
  00:27:06  37316236K
I will have to ask the sysadmins about cgroups since I'm just a user here.

On Thu, Dec 15, 2016 at 12:05 PM, Merlin Hartley <merlin-sl...@mrc-mbu.cam.ac.uk> wrote:

> I've been having the very same problem since I tried to enable Accounting
> (slurmdbd) - so I have now had to disable accounting.
>
> It would seem therefore that this part of the documentation should be updated:
> https://slurm.schedmd.com/archive/slurm-15.08-latest/accounting.html
> "To enable any limit enforcement you must at least have
> *AccountingStorageEnforce=limits* in your slurm.conf, otherwise, even if
> you have limits set, they will not be enforced."
>
> I did not set that option at all in my slurm.conf and yet memory limits
> started to be enforced - and again I don't believe the memory estimate was
> anything like correct.
>
> In the new year I may try accounting again, but with "MemLimitEnforce=no"
> set as well :)
>
>
> Merlin
>
> --
> Merlin Hartley
> IT Systems Engineer
> MRC Mitochondrial Biology Unit
> Cambridge, CB2 0XY
> United Kingdom
>
> On 15 Dec 2016, at 10:32, Uwe Sauter <uwe.sauter...@gmail.com> wrote:
>
> You are correct. Which version do you run? Do you have cgroups enabled?
> Can you enable debugging for slurmd on the nodes? The output should
> contain what Slurm calculates as maximum memory for a job.
>
> One other option is to configure MemLimitEnforce=no (which has defaulted
> to yes since 14.11).
>
> On 15.12.2016 at 11:26, Stefan Doerr wrote:
>
> But this doesn't answer my question why it reports 10 times as much
> memory usage as it is actually using, no?
>
> On Wed, Dec 14, 2016 at 1:00 PM, Uwe Sauter <uwe.sauter...@gmail.com> wrote:
>
> There are only two memory-related options, "--mem" and "--mem-per-cpu".
>
> --mem tells Slurm the memory requirement of the job (if used with sbatch)
> or of the step (if used with srun), but not the requirement of each
> process.
>
> --mem-per-cpu is used in combination with --ntasks and --cpus-per-task.
> If only --mem-per-cpu is used without other options, the memory
> requirement is calculated from the configured number of cores (NOT the
> number of cores requested), as far as I can tell.
>
> You might want to play a bit more with the additional options.
>
> On 14.12.2016 at 12:09, Stefan Doerr wrote:
>
> Hi, I'm running a Python batch job on SLURM with the following options:
>
> #!/bin/bash
> #
> #SBATCH --job-name=metrics
> #SBATCH --partition=xxx
> #SBATCH --cpus-per-task=6
> #SBATCH --mem=20000
> #SBATCH --output=slurm.%N.%j.out
> #SBATCH --error=slurm.%N.%j.err
>
> So, as I understand it, each process will have 20GB of RAM dedicated to it.
>
> Running it I get:
>
> slurmstepd: Job 72475 exceeded memory limit (39532832 > 20480000), being
> killed
> slurmstepd: Exceeded job memory limit
>
> This however cannot be true. I've run the same script locally and it uses
> 1-2GB of RAM. If it was using 40GB I would have gone to swap and
> definitely noticed.
>
> So I put some prints in my Python code to see how much memory is used,
> and indeed it shows a maximum usage of 1.7GB and, just before the error,
> 1.2GB.
>
> What is happening here? I mean, I could increase the mem option, but then
> I will be able to run far fewer jobs on my machines, which seems really
> limiting.
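
For reference, the enforcement options mentioned in this thread live in slurm.conf (and, if cgroup-based limiting is used, cgroup.conf). The fragment below is only a minimal sketch of where those options would sit; the combination shown and the commented values are illustrative assumptions, not a tested configuration:

  # slurm.conf (fragment)
  AccountingStorageEnforce=limits         # per the accounting docs, needed before configured limits are enforced
  MemLimitEnforce=no                      # disable enforcement based on accounted memory (defaults to yes since 14.11)
  JobAcctGatherType=jobacct_gather/linux  # mechanism that samples per-job memory usage

  # cgroup.conf (fragment; only relevant if TaskPlugin=task/cgroup is set in slurm.conf)
  ConstrainRAMSpace=yes                   # let the kernel cgroup cap job memory instead of the sampled RSS check
  AllowedRAMSpace=100                     # percent of the requested memory (100 is the default)

Which of the two mechanisms (cgroup limiting or the accounting-based check) actually kills a job depends on how the site is configured, which is presumably what the sysadmins would need to confirm.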
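
Stefan mentions adding prints to the Python job to watch its memory use. A minimal sketch of such a check, using only the standard library (the helper name report_peak_rss is made up here), could look like the following; note that RUSAGE_SELF only covers the current process, so the figure Slurm accounts for the whole job step may differ:

  import resource

  def report_peak_rss(tag):
      # ru_maxrss is the peak resident set size; on Linux it is reported in kilobytes
      peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
      print("%s: peak RSS %.2f GB" % (tag, peak_kb / (1024.0 * 1024.0)))

  report_peak_rss("before processing")
  # ... actual work of the job ...
  report_peak_rss("after processing")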