On Mon, 21 Jan 2013 17:13:02 -0700, Christopher Samuel <[email protected]> wrote:
> On 22/01/13 01:35, David Bigagli wrote:
>
> > Perhaps an easy approach is to set RLIMIT_AS in the job itself or
> > in its wrapper, then allow the application to handle ENOMEM error.
>
> This is what we do already in Torque (via a local patch), the only
> wrinkle there being that a job script can launch N processes each of
> which can allocate up to RLIMIT_AS.
>
> We were hoping that Slurm's cgroups support would permit limiting the
> memory allocated by the whole job.
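The RLIMIT_AS approach David suggests can be sketched in Python as below. This is a minimal sketch, assuming Linux; `run_with_as_limit` and the toy `JOB` script are hypothetical names for illustration, not part of Torque or Slurm. The wrapper applies the limit in the child after fork() and before exec(), and the job treats an allocation failure (ENOMEM, which CPython raises as MemoryError) as an ordinary error instead of crashing.

```python
import resource
import subprocess
import sys


def run_with_as_limit(argv, limit_bytes):
    """Run argv in a child process whose RLIMIT_AS is limit_bytes.

    The limit is set between fork() and exec() (via preexec_fn), so the
    job inherits it but the wrapper process itself is unaffected.
    """
    def set_limit():
        _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        # An unprivileged process may lower, but not raise, its hard limit.
        if hard != resource.RLIM_INFINITY:
            limit = min(limit_bytes, hard)
        else:
            limit = limit_bytes
        # Lower both soft and hard limits so the job cannot raise it again.
        resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

    return subprocess.run(argv, preexec_fn=set_limit).returncode


# Hypothetical toy "job": try to allocate argv[1] bytes and exit 1 if the
# allocation fails. In C this would be malloc() returning NULL with
# errno == ENOMEM; CPython reports the same failure as MemoryError.
JOB = """
import sys
try:
    buf = bytearray(int(sys.argv[1]))
except MemoryError:
    sys.exit(1)
sys.exit(0)
"""
```

For example, `run_with_as_limit([sys.executable, "-c", JOB, str(1 << 32)], 1 << 30)` should return 1, because a 4 GiB allocation cannot fit in a 1 GiB address space, while a 1 MiB allocation succeeds. Note that RLIMIT_AS is per process: a job script that starts N such children can still use up to N times the limit in total, which is the wrinkle Chris describes.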
If you want to limit allocations rather than resident memory, you can
disable overcommit (i.e., set /proc/sys/vm/overcommit_memory to 2), but
as far as I know in RHEL6 this is a system-wide setting, not a per-job
one. With overcommit disabled, malloc() fails immediately (returning NULL
with ENOMEM) once the system-wide commit limit is reached, and that limit
counts memory already *committed* by every other process, not just memory
they are actually using. By default the commit limit is only swap plus
50% of physical RAM (vm.overcommit_ratio), so depending on the behavior
of other processes you may end up with a lot less usable RAM than you
expect. If your nodes run a single job at a time, this is less of a
problem.

mark

> cheers,
> Chris
> -- 
> Christopher Samuel        Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: [email protected] Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/      http://twitter.com/vlsci
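To see how much memory strict overcommit actually leaves, a rough sketch like the following compares the CommitLimit and Committed_AS fields of /proc/meminfo. It assumes Linux with vm.overcommit_kbytes left at 0, ignores the small admin and user reserves the kernel also subtracts, and the helper names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: int}.

    Fields with a unit are in kB; HugePages_* fields are page counts.
    """
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def expected_commit_limit_kb(mem_total_kb, swap_total_kb,
                             overcommit_ratio=50, hugetlb_kb=0):
    """Approximate CommitLimit under vm.overcommit_memory = 2:
    overcommit_ratio percent of RAM not reserved for huge pages, plus swap."""
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb


def commit_headroom_kb(fields):
    """Roughly how much more can be committed before malloc() starts
    failing, assuming strict overcommit is in force."""
    return fields["CommitLimit"] - fields["Committed_AS"]
```

With the default ratio of 50, `expected_commit_limit_kb(67108864, 0)` returns 33554432, i.e. a 64 GiB node with no swap can commit only 32 GiB in total, shared by every process on the node. On a live system, pass `open("/proc/meminfo").read()` to `parse_meminfo`.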
