Committed to master branch.

On 02/05/2015 02:28 AM, Magnus Jonsson wrote:

Hi!

I have attached two patches to the jobacct_gather plugin (common).

The first uses Proportional Set Size (PSS) instead of RSS to determine
the memory footprint of a job.

More information about PSS can be found here:
http://lwn.net/Articles/230975/

Gathering the PSS information is a little more complicated (and more CPU
intensive) than just reading the RSS value, and might be a problem for some
applications.
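
To illustrate the extra work, here is a minimal Python sketch (not the code
from the patch) of how PSS can be collected on Linux. It assumes the usual
/proc/<pid>/smaps layout, where every mapping carries its own "Rss:" and
"Pss:" line in kB, so the values have to be summed over all mappings. The
function names sum_smaps_field and read_pss_kb are hypothetical.

```python
def sum_smaps_field(smaps_text, field):
    """Sum one per-mapping field (in kB) over all mappings in smaps text."""
    prefix = field + ":"
    total_kb = 0
    for line in smaps_text.splitlines():
        # Field lines look like "Pss:                  12 kB".
        # "Pss:" does not match related fields such as "Pss_Anon:".
        if line.startswith(prefix):
            total_kb += int(line.split()[1])
    return total_kb


def read_pss_kb(pid):
    """Return the PSS of a live process in kB.

    Linux only; assumes the caller is allowed to read /proc/<pid>/smaps.
    """
    with open(f"/proc/{pid}/smaps") as f:
        return sum_smaps_field(f.read(), "Pss")
```

Every mapping of every process in the job has to be read and parsed this
way on each sampling interval, which is presumably where the extra CPU
cost comes from.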

We have a subset of jobs that load the dataset in the first process and
then just fork() once per available core and do parallel
computation on the dataset.

This makes the RSS value go sky high as Slurm calculates the sum of all
RSS values of the processes in the job and Slurm then kills the job :-(
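
A rough worked example of the effect, as a Python sketch with made-up
numbers. It assumes the dataset pages stay shared between all processes
after fork() (nobody writes to them) and that each process has some
private memory of its own. The function name job_totals and its
parameters are hypothetical.

```python
def job_totals(shared_mb, private_mb_per_proc, nprocs):
    """Return (sum of RSS, sum of PSS) in MB for a fork()-style job.

    Assumes shared_mb of unmodified pages are mapped by all nprocs
    processes, and each process has private_mb_per_proc of its own.
    """
    # RSS counts every resident page in full for each process mapping it.
    rss_total = nprocs * (shared_mb + private_mb_per_proc)
    # PSS charges each shared page in equal parts to the processes sharing it.
    pss_per_proc = shared_mb / nprocs + private_mb_per_proc
    pss_total = nprocs * pss_per_proc
    return rss_total, pss_total


# 16 processes sharing an 8192 MB dataset, 100 MB private each:
rss_total, pss_total = job_totals(8192, 100, 16)
print(rss_total, pss_total)
```

With these numbers the summed RSS is 132672 MB, while the summed PSS is
9792 MB, which is what the job actually occupies.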


The second patch adds an option not to kill jobs that are over their memory
limit. This works well for those of us who have working cgroup memory limits.

Best regards,
Magnus Jonsson


--

Thanks,
      /David/Bigagli

www.schedmd.com
