On Thu, 31 May 2012, Rayson Ho wrote:
...
> Correct me if I am wrong, I thought your main concern was related to
> setting address space limit with memsw. The biggest difference is that
> with RLIMIT, malloc would return NULL vs in the memsw case the job
> would be killed by the kernel OOM killer, and that's not the behavior
> you want.
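
For anyone following along, here is a minimal sketch of the RLIMIT side
of that difference. It is an illustration only, not anything gridengine
does: it assumes Linux and Python 3, and the helper name
run_with_as_limit is made up for this example. A child process is
started under a soft RLIMIT_AS and asked to allocate a buffer; when the
request exceeds the limit, the allocation fails inside the process
(Python reports this as MemoryError) and the process carries on running.
The memsw case is not shown, since there the kernel OOM killer ends the
job from outside.

```python
import resource
import subprocess
import sys

# Code run in the child: try to allocate N bytes and report what happened.
CHILD = r"""
import sys
try:
    buf = bytearray(int(sys.argv[1]))
    print("allocated")
except MemoryError:
    print("MemoryError")
"""

def run_with_as_limit(limit_bytes, request_bytes):
    """Run a child under a soft RLIMIT_AS and return its one-line report.

    Hypothetical helper for illustration; assumes Linux semantics.
    """
    def set_limit():
        # Runs in the child just before exec: lower only the soft limit.
        soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        if hard == resource.RLIM_INFINITY:
            new_soft = limit_bytes
        else:
            new_soft = min(limit_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))

    result = subprocess.run(
        [sys.executable, "-c", CHILD, str(request_bytes)],
        preexec_fn=set_limit,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

Calling run_with_as_limit(2 << 30, 8 << 30) asks for 8 GiB under a 2 GiB
address space limit, so the child should report the failed allocation
rather than being killed.
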
My underlying concern is that sometimes it is appropriate to set an
address space limit and sometimes it isn't, for the reasons we both put
forward previously in this thread. Users should therefore have some
control over it.
I hope we agree on this much?
> So what I've asked Ron to do is to check with the kernel guys and see
> if they can provide us the semantics of setrlimit() with
> memory.memsw.limit_in_bytes, or add a new limit in the memory cgroup
> controller.
Ah-ha, that's the missing piece that's been confusing me!
The options here seem to be:
1) Ask the kernel people nicely to give us a per-cgroup address space
limit. I don't think they will see much point in this.
2) As well as using setrlimit, enforce a per-cgroup address space limit by
the PDC periodically polling just the processes in that cgroup. Do
s_rss, s_stack, etc. do anything in gridengine these days? Do you already
have such a poll loop to deliver that functionality?
3) Bring the definitions of h_vmem / s_vmem into line with the likes of
h_stack, h_rss, etc. - interpret them in terms of setrlimit only and make
no attempt to enforce per-job limits.
Even if successful, I agree that (1) sounds like a major headache. (2)
gives the greatest backwards compatibility. If you don't already have a
poll loop and want to avoid putting one in, (3) should be sufficient to
avoid loss of functionality.
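
As a rough illustration of what one pass of the poll in (2) might look
like, here is a sketch under stated assumptions. It is not how the PDC
actually works: the function names are hypothetical, and it assumes the
job's cgroup directory exposes a cgroup.procs file and that each
process's address space can be read from the VmSize line of
/proc/<pid>/status. Summing VmSize counts shared mappings once per
process, so the total is an upper bound rather than an exact figure.

```python
import os

def read_cgroup_pids(cgroup_dir):
    """Return the PIDs listed in <cgroup_dir>/cgroup.procs."""
    with open(os.path.join(cgroup_dir, "cgroup.procs")) as f:
        return [int(line) for line in f if line.strip()]

def vmsize_bytes(pid, proc_root="/proc"):
    """Return a process's VmSize in bytes, or 0 if it cannot be read."""
    try:
        with open(os.path.join(proc_root, str(pid), "status")) as f:
            for line in f:
                if line.startswith("VmSize:"):
                    # Format is "VmSize:   12345 kB"; kB here means 1024 bytes.
                    return int(line.split()[1]) * 1024
    except (FileNotFoundError, ProcessLookupError):
        # The process exited between listing the cgroup and reading /proc.
        pass
    return 0

def cgroup_over_limit(cgroup_dir, limit_bytes, proc_root="/proc"):
    """Sum VmSize over the cgroup's processes and compare with the limit.

    Returns (over_limit, total_bytes). Hypothetical helper: a real poll
    loop would call this periodically and decide what to do on overrun.
    """
    total = sum(vmsize_bytes(pid, proc_root)
                for pid in read_cgroup_pids(cgroup_dir))
    return total > limit_bytes, total
```

The proc_root parameter is only there so the check can be pointed at a
fake /proc tree for testing; the caller would decide what action, if
any, to take when the first element of the result is True.
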
> (Maybe I should have clarified the above point in my previous email -
> but I have been really busy these days, working on the GE2011.11u1
> release, handling user support outside of the mailing list, and talking
> to hardware vendors, etc...)
Thanks for continuing this conversation, I appreciate (and apologise for)
the time you're putting into it. I've obviously not done a very good job
at being clear and concise.
All the best,
Mark
--
-----------------------------------------------------------------
Mark Dixon Email : [email protected]
HPC/Grid Systems Support Tel (int): 35429
Information Systems Services Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------