In the message dated: Wed, 23 May 2012 09:09:23 BST,
The pithy ruminations from Mark Dixon on
<Re: [gridengine users] cgroups Integration in OGS/GE 2011.11 update 1> were:
=> On Tue, 22 May 2012, Rayson Ho wrote:
=>
=> > For those who missed the Gompute User Group Meeting:
=> >
=> > http://blogs.scalablelogic.com/2012/05/grid-engine-cgroups-integration.html
=> >
=> > As most of the users are running Linux, it is now time to use a more
=> > modern mechanism for the PDC to track process-job membership. We will
=> > further enhance the cgroups integration beyond the Grid Engine 2011.11
=> > update 1 release - eg. we are planning to support device whitelisting
=> > in a future update.
=> >
=> > Rayson
=> >
=> > P.S. We will create a series of blog postings for the OGS/GE 2011.11
=> > update 1 new features.
=>
=> Hi Rayson,
=>
=> I couldn't agree more, the existing mechanisms are extremely deficient.
=> h_vmem was never a perfect proxy, but in the 64-bit world it's extremely
=> poor.
=>
=> I didn't want to mention it yet, as I'm still knee-deep in qmaster guts,
=> but I'm working on a patchset to make use of the memory cgroup controller.
=> The intention was to start with that only, as it's the most urgent cgroup
=> addition, but in a way that would hopefully allow easy extension to others
=> as appropriate.
=>
=> Intended notable features of the patchset:
=>
=> * Two new resources h_mem and s_mem to limit total memory + swap usage
=> (i.e. not just rss).
Yeah! That would solve a lot of the issues we are facing.
If possible, it would be extremely helpful to express memory limits as a
percentage of available resources, rather than just as a fixed quantity.
Yes, I currently work around this by extracting the swap and RAM sizes
for each server and putting those values into a consumable, but that is
awkward to maintain as servers are updated. From an end-user point of
view I'd much rather do something like:
-l vmem=60G -l swap=75%
to express that my job may use up to 60GB of total memory (in any
combination of RAM + swap that the OS needs to allocate), but would not
be permitted to use over 75% of swap.
In other words, allow users a lot of latitude in running jobs (even accepting
the slowdown of using swap, if needed), but prevent memory starvation that
causes the OOM-killer to activate.
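
To make the request concrete, here is a rough sketch of how a percentage
might be resolved into bytes on each execution host. None of this is
Grid Engine code: parse_meminfo() and resolve_limit() are names I made up
for illustration, the "swap" resource itself is only my proposal, and the
size suffixes follow my reading of sge_types(1) (lower case decimal, upper
case binary).

```python
import re

# Size suffixes as I understand them from sge_types(1); treat as an assumption.
_UNITS = {
    "k": 1000, "K": 1024,
    "m": 1000**2, "M": 1024**2,
    "g": 1000**3, "G": 1024**3,
}


def parse_meminfo(text):
    """Return {field: bytes} for the kB-valued lines of /proc/meminfo text."""
    sizes = {}
    for line in text.splitlines():
        m = re.match(r"(\w+):\s+(\d+)\s+kB", line)
        if m:
            sizes[m.group(1)] = int(m.group(2)) * 1024
    return sizes


def resolve_limit(spec, total_bytes):
    """Turn a request such as '75%' or '60G' into a byte count.

    A percentage is taken relative to total_bytes (e.g. the host's
    SwapTotal); anything else is treated as a fixed quantity.
    """
    spec = spec.strip()
    if spec.endswith("%"):
        return total_bytes * int(spec[:-1]) // 100
    if spec[-1] in _UNITS:
        return int(spec[:-1]) * _UNITS[spec[-1]]
    return int(spec)


if __name__ == "__main__":
    # Hypothetical per-host usage: read the real values and resolve "75%".
    with open("/proc/meminfo") as f:
        host = parse_meminfo(f.read())
    print(resolve_limit("75%", host["SwapTotal"]))
```

The point is that the percentage is resolved against whatever the host
actually has at the time, so nothing needs to be re-entered by hand when a
server gets more RAM or swap.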
Thanks,
Mark
=>
=> Mark
=> --
=> -----------------------------------------------------------------
=> Mark Dixon Email : [email protected]
=> HPC/Grid Systems Support Tel (int): 35429
=> Information Systems Services Tel (ext): +44(0)113 343 5429
=> University of Leeds, LS2 9JT, UK
=> -----------------------------------------------------------------
=> _______________________________________________
=> users mailing list
=> [email protected]
=> https://gridengine.org/mailman/listinfo/users
=>