Hiya!
On 18/01/13 19:53, Bjørn-Helge Mevik wrote:
> I don't know if this is the reason in your case, but note that cgroup
> in slurm constrains _resident_ RAM, not _allocated_ ("virtual") RAM.
Hmm, as a sysadmin that doesn't seem very useful. You want it to
constrain how much memory the application can allocate, so that the
application learns it has hit a limit when malloc() fails (and hopefully
reports it or recovers gracefully).
We do this in our cpuset-based (relatively old) Torque install by making
it set RLIMIT_AS instead of RLIMIT_DATA to enforce memory requests.
Current implementations of malloc() in glibc use mmap() rather than
brk() for any non-trivial allocation, and mmap() only honours RLIMIT_AS,
not RLIMIT_DATA.
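For anyone who wants to see the RLIMIT_AS behaviour for themselves, here
is a minimal sketch (in Python rather than C, and assuming Linux with
glibc). The helper name try_allocate and the sizes are purely
illustrative, nothing from Torque or Slurm. A child process caps its own
address space and then asks for one large buffer. If the request is over
the limit it should get a MemoryError, which is how Python reports a
failed malloc(), rather than being killed.

```python
import subprocess
import sys

# Script run in a child process so the limit never touches the caller.
CHILD = r"""
import resource, sys

limit = int(sys.argv[1])
request = int(sys.argv[2])

# Never try to exceed a hard limit that is already in place.
_, hard = resource.getrlimit(resource.RLIMIT_AS)
if hard != resource.RLIM_INFINITY:
    limit = min(limit, hard)

# Cap the address space (soft and hard) for this process only.
resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

try:
    buf = bytearray(request)  # one large allocation request
    print("allocated")
except MemoryError:
    # The allocation failed; the process is still alive and can report it.
    print("refused")
"""


def try_allocate(limit_bytes, request_bytes):
    """Hypothetical helper: run CHILD and return 'allocated' or 'refused'."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, str(limit_bytes), str(request_bytes)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    gib = 1 << 30
    print(try_allocate(2 * gib, 8 * gib))   # request far above the limit
    print(try_allocate(2 * gib, 16 << 20))  # request well below the limit
```

With a 2 GiB limit, the 8 GiB request should come back as "refused" and
the 16 MiB one as "allocated". Note that the limit applies to each
process separately, so this only covers the single-process case.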
That's not perfect though, as the user could launch multiple processes,
each of which can allocate up to RLIMIT_AS. Hence our interest in
cgroups and their ability to set a limit on an entire job.
> Try filling the allocated memory with some values, and you will probably
> see that after filling 4 GiB, the job is killed.
But we don't want the job to be killed; we want it to find out that it
has hit its memory limit. An application should only be able to allocate
the amount of memory the batch job has requested.
cheers!
Chris
--
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: [email protected] Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci