On Thu, 24 May 2012, Rayson Ho wrote:
...
How are you mapping existing queue limits to cgroup limits?
memory.limit_in_bytes fits nicely onto h_rss (thanks for the suggestion
William), but the crucially important memory.memsw.limit_in_bytes (rss+swap)
doesn't seem to have an existing concept. Unless you're hijacking h_vmem?

It's not really hijacking h_vmem - in the end, your memsw limit is the
virtual size limit of the process/job.

Hi Rayson,

My concern about this mainly centres around the fact that "virtual memory size" already has a very specific meaning. It does not mean RAM usage + swap and so isn't the same as memsw.
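To illustrate the distinction, here is a minimal sketch that compares a process's virtual size with its resident-plus-swap footprint, using the VmSize, VmRSS and VmSwap fields that Linux reports in /proc/<pid>/status. The function names and the sample numbers are invented for illustration. The per-process sum is only a rough proxy for what memsw charges, since the cgroup counter covers the whole group and also includes page cache.

```python
def parse_status_kb(text):
    """Return the 'kB' fields of a /proc/<pid>/status dump as {name: int}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        # Only keep lines of the form "Name:   12345 kB".
        if len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def vsize_and_rss_plus_swap(text):
    """Return (virtual size, RSS + swap) in kB for one process."""
    fields = parse_status_kb(text)
    vsize = fields.get("VmSize", 0)
    rss_plus_swap = fields.get("VmRSS", 0) + fields.get("VmSwap", 0)
    return vsize, rss_plus_swap


# Invented sample: a process that has mapped far more than it has touched.
SAMPLE_STATUS = (
    "Name:\tdemo\n"
    "VmSize:\t  204800 kB\n"
    "VmRSS:\t   10240 kB\n"
    "VmSwap:\t       0 kB\n"
)

if __name__ == "__main__":
    print(vsize_and_rss_plus_swap(SAMPLE_STATUS))
```

On a Linux host, passing the contents of /proc/self/status instead of the sample shows the same gap for a live process.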

This does impact on things:

1) It's not a "drop in" replacement. If upgrading gridengine on an existing system, activating your cgroup code will cause an immediate change in behaviour of jobs, without the user altering their submission flags. People don't tend to like that sort of thing.

2) It's removing functionality. The old behaviour allows you to ensure that a job will fail if it mallocs something that's too big. Either it runs or it doesn't - and the failed malloc provides a decent error code you can handle rather than the job just dying. That could be important to some people. In contrast, the new behaviour will permit the malloc, but then stop the job at some less predictable point in the future, when it actually uses more than the permitted amount of memory.

3) We've all got lots of users already using the old code. No matter what we say to them, when presented with a new system most will ignore any documentation written by admins and just copy their old job scripts across and keep using them. The memory usage as measured by the cgroup PDC is likely to be much lower (but more accurate) than that measured by the traditional PDC, so their old requests will be larger than they need to be. Unless we force users to read the documentation and reassess their memory needs (e.g. with a JSV rejecting the use of h_vmem unless you also have a "-l yes_I_really_mean_h_vmem"), we don't get an immediate big improvement in throughput.
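The "either it runs or it doesn't" behaviour in point (2) can be sketched as follows, assuming the traditional limit behaves like an address-space rlimit (RLIMIT_AS) on Linux. The helper name and the sizes are illustrative only. The limit is applied in a child process so that the caller's own limits are left untouched.

```python
import subprocess
import sys
import textwrap

# Code run in the child: cap the address space, then try one big allocation.
CHILD = textwrap.dedent("""
    import resource, sys
    limit, request = int(sys.argv[1]), int(sys.argv[2])
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        limit = min(limit, hard)  # never try to exceed an existing hard limit
    resource.setrlimit(resource.RLIMIT_AS, (limit, hard))
    try:
        block = bytearray(request)  # fails up front if it cannot be mapped
    except MemoryError:
        print("refused")            # a clean error the program can handle
    else:
        print("allocated")
""")


def allocation_refused(limit_bytes, request_bytes):
    """True if a child capped at limit_bytes is refused request_bytes."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, str(limit_bytes), str(request_bytes)],
        capture_output=True, text=True, timeout=60,
    )
    return result.stdout.strip() == "refused"


if __name__ == "__main__":
    # 1 GiB address-space cap, 2 GiB request.
    print(allocation_refused(1 << 30, 2 << 30))
```

Under a cgroup memory limit the same allocation would normally be granted, and the job would only be stopped later, once enough of the pages had actually been touched.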

Personally, my concern is mainly centred around (3).
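The JSV check mentioned in (3) amounts to a simple decision rule. The sketch below models only that rule in plain Python; it does not speak the real JSV protocol, and the function name, the message text and the dictionary representation of the "-l" requests are assumptions made for illustration.

```python
# Guard attribute taken from the example in point (3); purely illustrative.
GUARD = "yes_I_really_mean_h_vmem"


def check_hard_requests(requests):
    """Decide whether a job's hard resource requests should be accepted.

    `requests` is a hypothetical dict of the job's "-l" hard requests,
    e.g. {"h_vmem": "4G"}. Returns (accepted, message).
    """
    if "h_vmem" in requests and GUARD not in requests:
        return False, (
            "h_vmem is measured differently on this system; reassess your "
            "memory request, or add -l " + GUARD + " to keep using it"
        )
    return True, "ok"


if __name__ == "__main__":
    print(check_hard_requests({"h_vmem": "4G"}))
    print(check_hard_requests({"h_vmem": "4G", GUARD: "true"}))
```

A real JSV would apply the same test to the job's hard resource list and reject the submission with the message.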

In my view, using a new set of attributes (I don't care what they're called), rather than overloading old ones, avoids all of these issues.

However, I freely admit that it makes the decision about what to do about the accounting file somewhat less obvious.


...
Open Grid Scheduler is "commercial open source", so when we ship GE
2011.11 update 1, you will get the source. We are only selling
*optional* support, we don't sell our code under a commercial license,
see:

http://www.scalablelogic.com/scalable-grid-engine-support

That is fantastic news (sorry, I keep losing track of people's commercial models): my sincere thanks for this.


...
Is there some way we can collaborate on this one?

There are 2 issues that we need to solve first:

1) Copyright assignment - like any other open source projects, we do
need to own the rights or else it is not safe for ISVs to use our
code. So far, the external contributions are smaller and quite
straightforward (in terms of the code change - the debugging behind
that is often times much more complicated - eg. Brooks Davis'
BeyondTrust AD fix in shepherd)... For larger contributions, we need
to audit the code.

Let me start another thread to follow up with this specific topic.

Sure :)


2) As we are shipping in less than 1 month, what do you plan to
change?? We are only bug fixing the cgroups integration code now, and
we plan to add enhancements only in later update releases.
...

You clearly have a more complete and advanced implementation than what I've done or was intending to do. You therefore have priority and I presumably have little to offer you (believe it or not, this is great news for me!)

At this stage, for this specific feature, I'm hoping there can be a discussion about how best to present it to the end user (once I was sure I was capable of doing a cgroup feature, my next port of call was going to be this list to start that conversation).

To my mind, this in particular means the attribute names and the accounting file. I hope that it could be open to consensus - after all, we all have a stake in this.

All the best,

Mark

PS Forgot to say in my previous emails (where are my manners?) - congratulations to all on the imminent release and the development work that has gone into it, and thanks again: it's very pleasing to see this sort of thing done under an open source model.

--
-----------------------------------------------------------------
Mark Dixon                       Email    : [email protected]
HPC/Grid Systems Support         Tel (int): 35429
Information Systems Services     Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------