On Wed, 23 May 2012, Rayson Ho wrote:
> On Wed, May 23, 2012 at 4:09 AM, Mark Dixon <[email protected]> wrote:
>> Intended notable features of the patchset:
>>
>> * Two new resources h_mem and s_mem to limit total memory + swap usage
>>   (i.e. not just rss).
> In my implementation, I did not add any new queue resource limits.
> If you are adding a new resource limit, you will need to change the
> spooling structure a bit. I am just mapping existing queue limits to
> the cgroup limits.
Yup, I've updated the spooling structure. I hit a speed bump yesterday,
until I realised that the spooling code was in a dlopen'd library and I
was using the old version instead of the new. Sorted now :)
How are you mapping existing queue limits to cgroup limits?
memory.limit_in_bytes maps nicely onto h_rss (thanks for the suggestion,
William), but the crucially important memory.memsw.limit_in_bytes
(rss+swap) doesn't seem to correspond to any existing limit. Unless
you're hijacking h_vmem?
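For the archives, here's a rough sketch of the mapping I have in mind: one
cgroup per job under the memory controller, with h_rss driving
memory.limit_in_bytes and a combined rss+swap figure driving
memory.memsw.limit_in_bytes. The paths, job id and values are purely
illustrative (not actual SGE code), and CGROUP_ROOT would normally be the
memory controller mount point rather than a scratch directory:

```shell
# Point CGROUP_ROOT at a scratch dir so this can be run harmlessly; on a
# real node it would be e.g. /sys/fs/cgroup/memory (and need root).
CGROUP_ROOT=${CGROUP_ROOT:-$(mktemp -d)}
JOB_ID=1234                               # hypothetical job id
CG="$CGROUP_ROOT/sge/job_$JOB_ID"         # one cgroup per job

mkdir -p "$CG"
# h_rss -> rss cap (4 GiB here, purely illustrative)
echo $((4 * 1024 * 1024 * 1024)) > "$CG/memory.limit_in_bytes"
# combined rss+swap cap (6 GiB here) -> memsw limit
echo $((6 * 1024 * 1024 * 1024)) > "$CG/memory.memsw.limit_in_bytes"
# attach a process (here, this shell) to the cgroup
echo $$ > "$CG/tasks"
```

On a real system the memsw file only exists if the kernel was built with
(and booted with) swap accounting enabled, which is another wrinkle.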
...
> Not sure how you are implementing it... In our implementation we
> either use cgroups to tag processes or we switch back to the old
> additional-GID PDC completely. Since every single release of Grid
> Engine distributed by us (Open Grid Scheduler) is fully compatible
> with SGE 6.2u5, if one upgrades from SGE 6.2u5 to Grid Engine 2011.11
> update 1, the execd & shepherd pair would still behave like SGE
> 6.2u5. (The way to enable the cgroups PDC is by a config switch.)
...
I'm keeping it compatible too, but by extending the existing PDC. A
PDC-enforced h_mem would be useful on systems without cgroup support, and
I also figured that extending would avoid a major rewrite to reproduce a
lot of existing behaviour.
(I've added a CGROUPS_MEMORY flag to execd_params, to let the admin enable
the functionality by telling it where the memory cgroup controller is
mounted.)
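So the admin-facing side would look something like the fragment below. To
be clear, the exact parameter syntax is my guess based on the description
above, not a released interface:

```
# Hypothetical execd_params entry in the global/host configuration
# (qconf -mconf), pointing at the memory cgroup controller mount:
execd_params    CGROUPS_MEMORY=/sys/fs/cgroup/memory
```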
Of course, I may yet look at the PDC and decide it's simpler to have a
completely separate code path after all, but that's a decision for later
this week. (I've been in that code before - I was holding off until some
of the other stuff was done before diving into it again! ;)
>> How far along with your solution are you? Am I just duplicating work
>> someone else has already done?
> We are less than 1 month away from shipping, and we are already
> running the cgroups code in our test cluster.
>
> Also, the qstat, qmod, and cat output capture was from a real session
> (see:
> http://blogs.scalablelogic.com/2012/05/grid-engine-cgroups-integration.html
> ). It was for the Gompute User Group Meeting presentation that took
> place earlier this month (May 8th and 9th).
...
When were you thinking of open-sourcing it?
I've got a new cluster coming online, so now is my main chance to change
user behaviour before the next machine comes along. Sadly, I suspect that
the ScalableLogic code will be too late for it.
Is there some way we can collaborate on this one?
All the best,
Mark
--
-----------------------------------------------------------------
Mark Dixon Email : [email protected]
HPC/Grid Systems Support Tel (int): 35429
Information Systems Services Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users