On Mon, 21 Jan 2013 17:13:02 -0700, Christopher Samuel <[email protected]> 
wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 22/01/13 01:35, David Bigagli wrote:
> 
> > Perhaps an easy approach is to set RLIMIT_AS in the job itself or
> > in its wrapper, then allow the application to handle ENOMEM error.
> 
> This is what we do already in Torque (via a local patch), the only
> wrinkle there being that a job script can launch N processes each of
> which can allocate up to RLIMIT_AS.
> 
> We were hoping that Slurm's cgroups support would permit limiting the
> memory allocated by the whole job.

If you want to limit allocations rather than resident memory, you
can disable overcommit (i.e. set /proc/sys/vm/overcommit_memory to 2),
but afaik in RHEL6 this can only be done for the system as a whole,
not per job. With overcommit disabled, malloc() fails immediately once
the memory already committed to all processes on the node reaches the
system's commit limit, so a job may get a lot less memory than you
expect, depending on what the other processes on the system have
allocated. If your nodes run a single job at a time, this is less of
a problem.
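
To see how much room a node has left under strict accounting, something
like the following rough sketch may help. It assumes a Linux node and
reads the CommitLimit and Committed_AS fields from /proc/meminfo; the
function names commit_headroom_kb and report are made up for this
example, not part of Slurm or Torque. In mode 2 the kernel derives
CommitLimit from swap plus overcommit_ratio percent of RAM (50 by
default), so the headroom can be well below the installed memory.

```python
import os


def commit_headroom_kb(meminfo_text):
    """Return (CommitLimit, Committed_AS, headroom) in kB.

    meminfo_text is the content of /proc/meminfo. The helper name is
    hypothetical; only the two field names come from the kernel.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            # The first token after the colon is the numeric value.
            fields[name.strip()] = int(parts[0])
    limit = fields["CommitLimit"]
    committed = fields["Committed_AS"]
    return limit, committed, limit - committed


def report():
    """Print the overcommit mode and the remaining commit headroom."""
    with open("/proc/sys/vm/overcommit_memory") as f:
        mode = f.read().strip()
    with open("/proc/meminfo") as f:
        limit, committed, headroom = commit_headroom_kb(f.read())
    print("overcommit_memory mode:", mode)
    print("CommitLimit: %d kB, Committed_AS: %d kB" % (limit, committed))
    # The limit is only enforced when the mode is 2.
    print("headroom: %d kB" % headroom)


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    report()
```

Running this on a shared node before and after other jobs start should
show the headroom shrinking as they allocate, even if they never touch
the memory.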

mark

 
> cheers,
> Chris
> - -- 
>  Christopher Samuel        Senior Systems Administrator
>  VLSCI - Victorian Life Sciences Computation Initiative
>  Email: [email protected] Phone: +61 (0)3 903 55545
>  http://www.vlsci.org.au/      http://twitter.com/vlsci
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
> 
> iEYEARECAAYFAlD91LIACgkQO2KABBYQAh+nuwCgk9387NjBHv0sb2PHHYKBP4Sw
> XqwAmwXStmfvAyu+XLE258VOre27FK+5
> =6EUk
> -----END PGP SIGNATURE-----
> 
