Hiya!

On 18/01/13 19:53, Bjørn-Helge Mevik wrote:

> I don't know if this is the reason in your case, but note that cgroup
> in slurm constrains _resident_ RAM, not _allocated_ ("virtual") RAM.

Hmm, as a sysadmin that doesn't seem very useful to me: you want it to constrain how much memory the application can allocate, so that the application learns it has hit a limit when malloc() fails (and hopefully reports the failure or recovers gracefully).

We do this in our cpuset-based (relatively old) Torque install by making it set RLIMIT_AS instead of RLIMIT_DATA to enforce memory requests, because current implementations of malloc() in glibc use mmap() rather than brk() for any non-trivial allocation, and mmap() only honours RLIMIT_AS, not RLIMIT_DATA.

That's not perfect though, as the user could launch multiple processes, each of which can allocate up to RLIMIT_AS. Hence our interest in cgroups and their ability to set a limit on an entire job.

> Try filling the allocated memory with some values, and you will probably
> see that after filling 4 GiB, the job is killed.

But we don't want the job to be killed, we want it to find out that it's hit its memory limit. An application should only be able to allocate the amount of memory the batch job has requested.

cheers!
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au   Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/       http://twitter.com/vlsci