On Wed, 1 May 2013, Jake Carroll wrote:
Mark,
Thanks for the response. This is opening up a bunch of cool ideas for us.
We're trying to get our heads around how the scaling factor stuff actually
"works" however.
For example, if a host policy says scale factor for mem = 1.0, but we
could perhaps set it to 0.50, what does that actually *mean*? How does it
change the "scale" factor and what impact does it have on the way the
scheduler works to utilise memory on that node? Trying to get a
better handle on the semantics of this thing.
The scheduler keeps track of the "usage" generated by a particular job. This is
what is injected into the share tree and, presumably, the functional tree.
It is just a number (visible from within qmon). What's important is the
relative usage between your users (for functional) or relative
cumulative usage (for share tree).
Usage is defined as a weighted sum of cpu/mem/io usage, in the units I
mentioned in my last email:
usage = (wcpu * cpu) + (wmem * mem) + (wio * io)
If you have wcpu = 0.5, wmem = 0.5, wio = 0 and you have a job on 4 slots,
lasting for 1 day and consuming 2GB of RAM per slot, it would generate a
usage of:
usage = (0.5 * 4*1*24*60*60) + (0.5 * 2*4*1*24*60*60) = 172800 + 345600 = 518400
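As an aside, those weights live in the scheduler configuration as the
usage_weight_list entry, which you can edit with "qconf -msconf". A sketch
of the relevant line, assuming the 0.5/0.5/0 split used above:
...
usage_weight_list                cpu=0.500000,mem=0.500000,io=0.000000
...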
What the usage actually "means" in practice depends on the weights you
plug into the usage calculation. The weight numbers can be defined very
precisely and so it can be difficult to decide on _exact_ values to put
in, particularly if you have an inquisitive user or manager looking over
your shoulder asking questions.
Personally, I put together a simple spreadsheet to play around and
investigate the usage generated by different types of job and weights. I
then came up with a simple model which gave a precise answer for the
weights. I don't really care about the number of decimal places in the
answer, but it means I can point to the spreadsheet if I'm challenged :)
It also means anyone who wants it changed has to come up with a better
model first :)
For example, we have a "small" node and a "large" node in the same queue,
like so:
...
complex_values virtual_free=92G,h_vmem=92G
...
complex_values virtual_free=373G,h_vmem=373G
...
So - how does the scale factor etc. actually impact the scheduler's use of
the node?
...
usage_scaling allows you to tell the scheduler that not all slots or RAM
in the cluster should be considered equal.
For example, if you think that it is effective _occupancy_ of nodes that
is important, you might want to scale the memory usage value before it
feeds into the main usage calculation, so that you generate the same usage
if you fill up a node, no matter how much memory that node has. In the
case above, your second node has roughly four times as much memory as the
first, so you might want to use a usage_scaling of mem=0.25 for the
second node.
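The usage_scaling attribute lives in the execution host configuration,
alongside the complex_values lines above, so you can set it with
"qconf -me <hostname>". A sketch for the larger node (hostname invented
for illustration, other fields snipped):
...
hostname              bignode01
usage_scaling         cpu=1.000000,mem=0.250000,io=1.000000
complex_values        virtual_free=373G,h_vmem=373G
...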
Alternatively, if you have a number of clusters and don't want to bother
mucking around with working out good values for your usage weightings all
the time, you can use usage_scaling to normalise all your node memory
sizes to the same value and then use the same weightings on all clusters.
Or, if you have a mixture of generally available nodes that share tree
calculations should be done for, and nodes dedicated to specific users
that shouldn't (e.g. they bought them), you can just stop the dedicated
nodes from contributing to a particular user's usage via a usage_scaling
of cpu=0.000000,mem=0.000000,io=0.000000.
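As a sketch, such a dedicated host's entry (as shown by "qconf -se
<hostname>") would then contain:
...
usage_scaling         cpu=0.000000,mem=0.000000,io=0.000000
...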
All the best,
Mark
--
-----------------------------------------------------------------
Mark Dixon Email : [email protected]
HPC/Grid Systems Support Tel (int): 35429
Information Systems Services Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------