I decided to test it locally. So I ran the exact batch script that I run on
SLURM on my own machine and monitored the maximum memory usage with
/usr/bin/time -v.

The first prints that you see are what Python reports as the RSS and
virtual memory currently in use. As you can see, RSS maxes out at about
1.7GB, which agrees perfectly with the "Maximum resident set size (kbytes):
1742836" that time -v reports.
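
For reference, prints in that format can be produced with something like
the following (a minimal sketch using psutil; the two columns are RSS and
virtual memory size in MB):

import psutil

def report_memory():
    # Current RSS and virtual memory size of this process, printed in MB.
    mem = psutil.Process().memory_info()
    print(mem.rss / 1e6, mem.vms / 1e6)

report_memory()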

The only thing I can think of that could confuse SLURM is that my
calculations are parallelized with joblib.Parallel in Python, which forks
the process to run the work in separate worker processes:
https://pythonhosted.org/joblib/parallel.html#using-the-threading-backend
As you can read there:
"By default Parallel uses the Python multiprocessing module to fork
separate Python worker processes to execute tasks concurrently on separate
CPUs. This is a reasonable default for generic Python programs but it
induces some overhead as the input and output data need to be serialized in
a queue for communication with the worker processes."
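
To illustrate the pattern, a minimal joblib.Parallel sketch looks like this
(a stand-in computation, not the actual metric code):

from joblib import Parallel, delayed

def compute(chunk):
    # Stand-in for the per-chunk metric calculation in the real script.
    return sum(chunk)

chunks = [list(range(1000)) for _ in range(10)]

# With the default multiprocessing backend this forks one worker per job.
results = Parallel(n_jobs=6)(delayed(compute)(c) for c in chunks)
print(results)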

If I check in top, it indeed shows all worker processes using the same
amount of memory, presumably because the forked children still share most
of the parent's pages. Hence, if I spawn 10 processes and you simply sum
their reported usages, it looks like 10x the actual memory usage.
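
The over-counting can be illustrated by summing the per-process RSS of the
parent and its workers and comparing it with PSS, which splits shared pages
proportionally (a rough sketch using psutil, assuming Linux, where
memory_full_info() exposes pss, and that it is run in the parent while the
workers are alive):

import psutil

parent = psutil.Process()
procs = [parent] + parent.children(recursive=True)

# Summing RSS counts pages shared between forked workers once per process.
total_rss = sum(p.memory_info().rss for p in procs)

# PSS splits each shared page among the processes that map it, so the sum
# is much closer to the real physical footprint.
total_pss = sum(p.memory_full_info().pss for p in procs)

print(total_rss / 1e6, total_pss / 1e6)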


[a@xxx yyy]$ /usr/bin/time -v bash run.sh
...
128.8192 1033.396224
128.262144 1032.871936
...
1776.664576 2648.588288
472.502272 1368.522752
...
142.925824 1037.533184
135.794688 1037.254656
Command being timed: "bash run.sh"
User time (seconds): 3501.55
System time (seconds): 263.83
Percent of CPU this job got: 1421%
Elapsed (wall clock) time (h:mm:ss or m:ss): 4:24.88
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1742836
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 4
Minor (reclaiming a frame) page faults: 83921772
Voluntary context switches: 111865
Involuntary context switches: 98409
Swaps: 0
File system inputs: 904096
File system outputs: 29224
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

On Thu, Dec 15, 2016 at 2:50 PM, Stefan Doerr <stefdo...@gmail.com> wrote:

> $ sinfo --version
> slurm 15.08.11
>
> $ sacct --format="CPUTime,MaxRSS" -j 72491
>    CPUTime     MaxRSS
> ---------- ----------
>   00:27:06
>   00:27:06  37316236K
>
>
> I will have to ask the sysadms about cgroups since I'm just a user here.
>
> On Thu, Dec 15, 2016 at 12:05 PM, Merlin Hartley
> <merlin-sl...@mrc-mbu.cam.ac.uk> wrote:
>
>> I’ve been having the very same problem since I tried to enable Accounting
>> (slurmdbd) - so I have now had to disable accounting.
>>
>> It would seem therefore that this part of the documentation should be
>> updated:
>> https://slurm.schedmd.com/archive/slurm-15.08-latest/accounting.html
>> "To enable any limit enforcement you must at least have
>> *AccountingStorageEnforce=limits* in your slurm.conf, otherwise, even if
>> you have limits set, they will not be enforced. "
>>
>> I did not set that option at all in my slurm.conf and yet memory limits
>> started to be enforced - and again I don’t believe the memory estimate was
>> anything like correct.
>>
>> In the new year I may try accounting again but with "MemLimitEnforce=no”
>> set as well :)
>>
>>
>> Merlin
>>
>>
>> --
>> Merlin Hartley
>> IT Systems Engineer
>> MRC Mitochondrial Biology Unit
>> Cambridge, CB2 0XY
>> United Kingdom
>>
>> On 15 Dec 2016, at 10:32, Uwe Sauter <uwe.sauter...@gmail.com> wrote:
>>
>>
>> You are correct. Which version do you run? Do you have cgroups enabled?
>> Can you enable debugging for slurmd on the nodes? The
>> output should contain what Slurm calculates as maximum memory for a job.
>>
>> One other option is to configure MemLimitEnforce=no (which defaults to
>> yes since 14.11).
>>
>>
>> On 15.12.2016 at 11:26, Stefan Doerr wrote:
>>
>> But this doesn't answer my question of why it reports 10 times as much
>> memory as it is actually using, no?
>>
>> On Wed, Dec 14, 2016 at 1:00 PM, Uwe Sauter <uwe.sauter...@gmail.com> wrote:
>>
>>
>>    There are only two memory-related options: "--mem" and "--mem-per-cpu".
>>
>>    --mem tells Slurm the memory requirement of the whole job (if used with
>>    sbatch) or of the step (if used with srun), but not the requirement of
>>    each process.
>>
>>    --mem-per-cpu is used in combination with --ntasks and --cpus-per-task.
>>    If only --mem-per-cpu is used without other options, the memory
>>    requirement is calculated using the configured number of cores (NOT the
>>    number of cores requested), as far as I can tell.
>>
>>    You might want to play a bit more with the additional options.
>>
>>
>>
>>    On 14.12.2016 at 12:09, Stefan Doerr wrote:
>>
>> Hi, I'm running a python batch job on SLURM with following options
>>
>> #!/bin/bash
>> #
>> #SBATCH --job-name=metrics
>> #SBATCH --partition=xxx
>> #SBATCH --cpus-per-task=6
>> #SBATCH --mem=20000
>> #SBATCH --output=slurm.%N.%j.out
>> #SBATCH --error=slurm.%N.%j.err
>>
>> So as I understand it, each process will have 20GB of RAM dedicated to it.
>>
>> Running it I get:
>>
>> slurmstepd: Job 72475 exceeded memory limit (39532832 > 20480000), being
>> killed
>> slurmstepd: Exceeded job memory limit
>>
>> This however cannot be true. I've run the same script locally and it uses
>> 1-2GB of RAM. If it was using 40GB I would have gone to swap and
>> definitely noticed.
>>
>> So I put some prints in my Python code to see how much memory is used, and
>> indeed it shows a maximum usage of 1.7GB, and 1.2GB right before the error.
>> error 1.2GB usage.
>>
>> What is happening here? I mean, I could increase the mem option, but then I
>> would be able to run far fewer jobs on my machines, which seems really
>> limiting.
>>
>>
>>
>>
>>
>
