[slurm-dev] Re: Slurm, RHEL6, cgroups and not constraining memory

Christopher Samuel Fri, 18 Jan 2013 14:23:05 -0800

Hiya!

On 18/01/13 19:53, Bjørn-Helge Mevik wrote:


> I don't know if this is the reason in your case, but note that cgroup
> in slurm constrains _resident_ RAM, not _allocated_ ("virtual") RAM.

Hmm, as a sysadmin that doesn't seem very useful. You want it to 
constrain how much memory the application can allocate, so that the 
application learns it has hit a limit when malloc() fails (and hopefully 
reports or recovers gracefully).

We do this in our cpuset-based (relatively old) Torque install by making 
it set RLIMIT_AS instead of RLIMIT_DATA to enforce memory requests. 
Current implementations of malloc() in glibc use mmap() rather than 
brk() for any non-trivial allocation, and mmap() only honours RLIMIT_AS, 
not RLIMIT_DATA.
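
A minimal Python sketch of that behaviour, assuming Linux and CPython. 
The 1 GiB cap, the 2 GiB request and the helper names are illustrative 
assumptions, not anything Torque itself does. A child process is started 
with RLIMIT_AS set, then asks for more memory than the cap allows:

```python
import resource
import subprocess
import sys

LIMIT_BYTES = 1 << 30     # 1 GiB address-space cap (illustrative value)
REQUEST_BYTES = 2 << 30   # 2 GiB request, deliberately above the cap

# The child asks for one large buffer and reports what happened.
CHILD_CODE = f"""
try:
    buf = bytearray({REQUEST_BYTES})
    print("allocated")
except MemoryError:
    print("MemoryError")
"""


def cap_address_space():
    # Runs in the child after fork() and before exec().
    # Only the soft limit is lowered; an existing hard limit is respected.
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    soft = LIMIT_BYTES if hard == resource.RLIM_INFINITY else min(LIMIT_BYTES, hard)
    resource.setrlimit(resource.RLIMIT_AS, (soft, hard))


def run_child(preexec_fn=None):
    # Start a fresh interpreter, optionally under the cap, and return its report.
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        preexec_fn=preexec_fn, capture_output=True, text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_child(cap_address_space))
```

Under the cap the allocation should fail inside the process, which sees 
a MemoryError it can report or recover from, rather than being killed 
from outside.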

That's not perfect though, as the user could launch multiple processes, 
each of which can allocate up to RLIMIT_AS. Hence our interest in 
cgroups and their ability to set a limit on an entire job.
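
The per-process nature of RLIMIT_AS can be sketched as follows, again 
assuming Linux and CPython; the cap, chunk size, process count and the 
run_job() helper are hypothetical values and names chosen for 
illustration. Each child maps a chunk that fits under its own cap, and 
all the mappings are live at the same time:

```python
import resource
import subprocess
import sys

MIB = 1 << 20
CAP = 1024 * MIB    # per-process RLIMIT_AS (illustrative value)
CHUNK = 384 * MIB   # each process maps this much, below the cap
N = 4               # processes in the hypothetical "job"

# Each child maps CHUNK bytes, reports, then holds the mapping until
# the parent closes its stdin.
CHILD_CODE = f"""
import mmap, sys
try:
    region = mmap.mmap(-1, {CHUNK}, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    print("ok", flush=True)
except (OSError, MemoryError):
    print("failed", flush=True)
sys.stdin.read()
"""


def cap_address_space():
    # Runs in each child after fork() and before exec().
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    soft = CAP if hard == resource.RLIM_INFINITY else min(CAP, hard)
    resource.setrlimit(resource.RLIMIT_AS, (soft, hard))


def run_job(n=N):
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", CHILD_CODE],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
            preexec_fn=cap_address_space,
        )
        for _ in range(n)
    ]
    # Once every child has reported, all mappings exist simultaneously.
    results = [p.stdout.readline().strip() for p in procs]
    for p in procs:
        p.stdin.close()   # EOF lets the child exit
        p.wait()
        p.stdout.close()
    return results


if __name__ == "__main__":
    results = run_job()
    print(results, "total MiB mapped:", results.count("ok") * CHUNK // MIB)
```

If every child reports "ok", the job as a whole holds N * CHUNK of 
address space, more than the single-process cap, which is the gap a 
job-wide cgroup limit is meant to close.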

> Try filling the allocated memory with some values, and you will probably
> see that after filling 4 GiB, the job is killed.

But we don't want the job to be killed; we want it to find out that it's 
hit its memory limit. An application should only be able to allocate 
the amount of memory the batch job has requested.
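
The resident-versus-allocated distinction quoted above can be observed 
with a small Linux-only Python sketch. The 64 MiB size and the helper 
names are illustrative assumptions; the figures come from the VmSize and 
VmRSS fields of /proc/self/status. Mapping memory raises the virtual 
size, but resident memory only grows once the pages are written to:

```python
import mmap

SIZE = 64 * (1 << 20)   # 64 MiB region (illustrative value)


def memory_kib():
    """Return (VmSize, VmRSS) of this process in KiB, read from /proc."""
    fields = {}
    with open("/proc/self/status") as status:
        for line in status:
            key, _, value = line.partition(":")
            if key in ("VmSize", "VmRSS"):
                fields[key] = int(value.split()[0])
    return fields["VmSize"], fields["VmRSS"]


def measure():
    before = memory_kib()
    # Allocate: anonymous private mapping, as malloc() uses for large requests.
    region = mmap.mmap(-1, SIZE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    mapped = memory_kib()
    # Fill: write one byte per page so the kernel has to back each page.
    for offset in range(0, SIZE, mmap.PAGESIZE):
        region[offset] = 1
    touched = memory_kib()
    region.close()
    return before, mapped, touched


if __name__ == "__main__":
    for label, (vm_size, vm_rss) in zip(("before", "mapped", "touched"), measure()):
        print(f"{label:8s} VmSize={vm_size} KiB  VmRSS={vm_rss} KiB")
```

A limit on resident memory only comes into play at the "touched" stage, 
whereas an address-space limit is checked at the "mapped" stage, where 
the allocation call itself can fail.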

cheers!
Chris
-- 
  Christopher Samuel        Senior Systems Administrator
  VLSCI - Victorian Life Sciences Computation Initiative
  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
  http://www.vlsci.org.au/      http://twitter.com/vlsci
