Mark,

Thanks! I just upgraded to 8.1.2. Will these patches work with 8.1.2, or
were they intended only for 8.1.1?

Joseph

On 09/10/2012 07:45 AM, Mark Dixon wrote:
Hi,

Way back in May I promised this list a simple integration of gridengine with 
the cgroup functionality found in recent Linux distributions. I'm not quite 
sure what happened to all the time between then and now, but I'm fulfilling 
that promise now.

Please find attached a number of patches that add that functionality. They 
happen to be prepared against SoGE 8.1.1, but should be readily portable to 
other gridengine versions, e.g. 6.2u5.

Notes:

* This is intentionally a naive, but hopefully extendable, piece of work - 
integrating all of the functionality the cgroup feature has to offer is beyond 
the scope of the problem I was attempting to solve. If you missed it, the Open 
Grid Scheduler people have previously announced that they have written and will 
open source a far more comprehensive implementation: please consider this a 
stop-gap measure until then.

* When enabled in your gridengine configuration, this patchset alters the 
behaviour of h_vmem, h_rss and the accounting value vmem. It also introduces 
two new queue parameters - s_as and h_as.

  * h_vmem and vmem will use the actual RAM+swap usage as reported by the
  cgroup memory controller, instead of simply adding up all the address
  space usage by all processes in the job. This should result in a much
  more accurate measurement of host resource usage (as previously
  discussed on this list).

  * h_vmem will no longer set RLIMIT_AS (i.e. the "virtual memory" line
  in bash's ulimit command) for job processes, as it will now be redundant
  in the majority of cases and is the second source of gridengine's
  over-estimate of memory usage by jobs.

  * If you really do wish to set RLIMIT_AS for a job, then s_as will set
  the soft limit and h_as will set the hard limit.

  * Setting h_rss will limit the maximum amount of RAM a job can use. So,
  if the admin wants to allow swapping, h_rss limits RAM usage and
  h_vmem limits RAM+swap usage. The job will only be killed if it
  hits h_vmem, not h_rss.

* Modifications to existing files are licensed under SISSL version 1.2. New 
files are under the LGPL version 3.

* To enable:

  1) Add a "CGROUP_MEMORY=<dir>" parameter to your execd_params
  configuration. "<dir>" should be the path to an existing directory
  under the mount point of the memory cgroup controller, e.g. on RHEL6
  with the libcgroup package, /cgroup/memory would work.

  2) If you have queues from an earlier version of gridengine, you will
  need to edit them and set "s_as" and "h_as" to "INFINITY" (instead of
  the default, "NONE").

* Housekeeping: deletion of the cgroup after a job ends can fail because some 
of its memory is still in use. Possible examples of this: cached I/O and shared 
libraries that were originally loaded by the job but are still in use elsewhere 
on the system.
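
For illustration only, here is a minimal Python sketch of the cgroup side of
the notes above. It is not taken from the patchset. It assumes the cgroup v1
memory controller with swap accounting enabled, a per-job directory under the
CGROUP_MEMORY path, and that a failed deletion shows up as EBUSY from rmdir.
The function names and the retry policy are hypothetical.

```python
import errno
import os
import time


def read_bytes(cgroup_dir, name):
    """Read one integer value from a cgroup control file."""
    with open(os.path.join(cgroup_dir, name)) as f:
        return int(f.read().strip())


def write_bytes(cgroup_dir, name, value):
    """Write one integer value to a cgroup control file."""
    with open(os.path.join(cgroup_dir, name), "w") as f:
        f.write(str(int(value)))


def job_usage(cgroup_dir):
    """Return (ram, ram_plus_swap) in bytes as reported by the controller."""
    ram = read_bytes(cgroup_dir, "memory.usage_in_bytes")
    ram_swap = read_bytes(cgroup_dir, "memory.memsw.usage_in_bytes")
    return ram, ram_swap


def set_limits(cgroup_dir, h_rss, h_vmem):
    """Apply an h_rss-style RAM limit and an h_vmem-style RAM+swap limit."""
    if h_vmem < h_rss:
        raise ValueError("RAM+swap limit must not be below the RAM limit")
    # On a fresh cgroup both limits start unlimited, so the RAM limit is
    # written first; the kernel rejects a RAM limit above the RAM+swap limit.
    write_bytes(cgroup_dir, "memory.limit_in_bytes", h_rss)
    write_bytes(cgroup_dir, "memory.memsw.limit_in_bytes", h_vmem)


def remove_cgroup(cgroup_dir, attempts=5, delay=0.5):
    """rmdir the job's cgroup, retrying while it is reported busy.

    Returns True once the directory is gone, False if it is still busy
    after the last attempt (assumed policy, not the patchset's).
    """
    for attempt in range(attempts):
        try:
            os.rmdir(cgroup_dir)
            return True
        except FileNotFoundError:
            return True
        except OSError as exc:
            if exc.errno != errno.EBUSY:
                raise
            if attempt + 1 < attempts:
                time.sleep(delay)
    return False
```

With limits set this way, a job limited to 2G of RAM and 4G of RAM+swap would
call set_limits(job_dir, 2 * 1024**3, 4 * 1024**3), and job_usage(job_dir)
would give the figures that vmem accounting could be based on.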


If you really don't want the new attributes s_as and h_as, don't bother with 
the 2nd and 8th patches. However, this will also re-enable h_vmem's setting of 
RLIMIT_AS, and I would strongly recommend that you disable it (search for 
RLIMIT_AS in source/daemons/shepherd/setrlimits.c), as it would defeat the whole 
point of using the memory cgroup (from my personal perspective).
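
To make the RLIMIT_AS discussion concrete, here is a small Python sketch of
what setting a soft and hard address space limit amounts to. It uses the
standard "resource" module (Unix only) and is not the shepherd's C code. The
function names, the use of None for INFINITY, and the rule that a missing soft
limit follows the hard limit are all assumptions made for the example.

```python
import resource


def as_limits(s_as=None, h_as=None):
    """Map s_as/h_as byte values (None meaning INFINITY) to an rlimit pair."""
    inf = resource.RLIM_INFINITY
    soft = inf if s_as is None else int(s_as)
    hard = inf if h_as is None else int(h_as)
    # Assumption: with only a hard limit given, the soft limit follows it,
    # because a soft limit may never exceed the hard limit.
    if soft == inf and hard != inf:
        soft = hard
    if hard != inf and soft > hard:
        raise ValueError("s_as must not exceed h_as")
    return soft, hard


def apply_as_limits(s_as=None, h_as=None):
    """Set RLIMIT_AS for the calling process, e.g. in a child before exec."""
    resource.setrlimit(resource.RLIMIT_AS, as_limits(s_as, h_as))
```

The current values can be inspected with resource.getrlimit(resource.RLIMIT_AS),
which corresponds to the "virtual memory" line in bash's ulimit output.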

I had intended to try to write a version of it for non-cgroup-enabled Linux 
systems, which would make a "best effort" but couldn't strictly enforce the 
limit. I didn't get round to it. Sorry.

While I'm saying sorry, in no particular order, I apologise for: the posting of 
a SoGE patch to the main user list (but I did promise a patch here - and it's 
reasonably generally applicable); the use of MIME (but list servers and mail 
clients often munge patches); skipping the typical one-patch-per-mail 
convention (wanted to minimise email to the uninterested) and anything else 
anyone finds irritating :)

If you find it useful, or would find it useful but don't like the license, 
please let me know.

All the best,

Mark


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
