On Tue, 15 Mar 2011, Reuti wrote:
...
I've got to the stage where I think cpu=0.48,mem=0.52,io=0.0 will do
what I want and am about to start testing it on a development instance
of GE.
This looks like just the average of the two extreme cases: a) using all
CPUs but no memory, and b) using all memory but no CPUs. If both are
equally weighted, you end up with 0.5 as the multiplier for both.
Thanks for biting, Reuti; I really appreciate it :)
No, they're the numbers I got after going through the process outlined
below. I'm just trying to get the best fit for a simple model (in step 2).
Am I wildly off here? What do others set usage_weight_list to?
All sounds fine. There was an issue where you also had to set
ACCT_RESERVED_USAGE=TRUE to get it enabled.
Happily, that doesn't seem to be the case anymore :)
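For the archives, the relevant settings live in two places; values here are
illustrative and the commands are from memory, so double-check on your own
version:

  # scheduler configuration (qconf -msconf):
  usage_weight_list    cpu=0.480000,mem=0.520000,io=0.000000

  # global configuration (qconf -mconf), only needed on versions with
  # the issue Reuti mentions:
  execd_params         ACCT_RESERVED_USAGE=TRUE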
I calculated my cpu/mem values by:
1) Calculating the average amount of memory per core in the cluster
2) Drawing a stepped graph (memory vs. slots) showing:
slots = const * roundup(memory / average memory per core)
(const is initially 1.0)
3) Calculating a line of best fit
4) Tweaking const until the intercept + gradient = 1.0
Which gradient? Steps 1-3 are clear to me, but how do you get another
gradient there?
Step 4 is an attempt to introduce the additional constraint that cpu, mem
and io have to sum to 1.0 (io here is always 0.0).
It iteratively alters "const" in step 2 until the intercept and gradient
of the line in step 3 sum to 1.0. There's a button on my spreadsheet that
does the grunt work (William - stop laughing, I'm not proud of it).
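Since the spreadsheet button is a bit opaque, here's roughly the same
procedure as a Python sketch; the memory range and per-core average are
made-up illustrations, and it leans on numpy's polyfit for step 3:

  import numpy as np

  avg_mem_per_core = 2.0              # step 1: GB per core (illustrative)
  mem = np.arange(0.5, 24.5, 0.5)     # memory axis of the stepped graph, GB

  def fit_line(const):
      # step 2: stepped graph, slots = const * roundup(mem / avg per core)
      slots = const * np.ceil(mem / avg_mem_per_core)
      # step 3: least-squares line of best fit, slots ~= gradient*mem + intercept
      gradient, intercept = np.polyfit(mem, slots, 1)
      return gradient, intercept

  # step 4: bisect const until intercept + gradient = 1.0
  # (the sum scales linearly with const, so this converges)
  lo, hi = 0.0, 2.0
  for _ in range(50):
      const = (lo + hi) / 2.0
      gradient, intercept = fit_line(const)
      if gradient + intercept > 1.0:
          hi = const
      else:
          lo = const

  print("cpu=%.2f,mem=%.2f" % (intercept, gradient))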
I fear it's not working as intended. The problem is that it's just
adding 2 integrals (GBs and JOBs) instead of computing a two-dimensional
integral. A job using exactly half of the CPU and memory will be charged
the same way as one using either resource completely, despite the fact
that another job of this kind could still run on the node.
I don't understand your 3rd sentence. Summarising the config I outlined in
my last message:
usage (per second) = slots * [ cpu + mem * h_vmem (in G) ]
("cpu" and "mem" being the numbers in usage_weight_list)
E.g. imagine a modern cluster of 12-slot, 24G hosts; my method gives
cpu=0.51, mem=0.49. Usage rates for a selection of jobs:
* 1 slot, 24G/slot (full node) = 1*(0.51 + 0.49*24) = 12.27 usage per sec
* 12 slots, 1G/slot (full node) = 12*(0.51 + 0.49*1) = 12.00 usage per sec
* 6 slots, 2G/slot (half a node) = 6*(0.51 + 0.49*2) = 8.94 usage per sec
(1G is our default memory allotment)
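Those three rates fall out of a couple of lines of Python, for anyone who
wants to play with other job shapes (weights as above, purely illustrative):

  cpu, mem = 0.51, 0.49    # illustrative usage_weight_list values

  def usage_per_sec(slots, h_vmem_gb):
      # usage (per second) = slots * (cpu + mem * h_vmem in G)
      return slots * (cpu + mem * h_vmem_gb)

  print(usage_per_sec(1, 24))   # 12.27, full node by memory
  print(usage_per_sec(12, 1))   # 12.00, full node by slots
  print(usage_per_sec(6, 2))    #  8.94, half a node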
I'm not wildly happy that the half node case is being over-charged (going
to have to think about that), but I am happy that the numbers for two full
nodes are roughly the same. Of course you can pick lots of examples where
the numbers do whatever you want :(
It would need some internal adjustment to compute it the way you need.
I completely agree that the usage calculation will be wrong in most
situations, but I'm hoping that it will average out over the long term.
As far as I can see, cpu=1 and mem=0 is accurate for small h_vmem
requests, and cpu=0 and mem=1 is accurate for large h_vmem requests. I
felt we needed some sort of quantitative model to think through what the
appropriate middle ground is (to start with).
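To make "wrong in most situations" concrete, here's a throwaway comparison
against a baseline I'm inventing purely for illustration: the slot-equivalents
a job actually blocks on one of those 12-slot/24G nodes, i.e. whichever of
cores or memory it exhausts first:

  cpu, mem = 0.51, 0.49
  slots_per_node, gb_per_node = 12.0, 24.0

  def model_charge(slots, h_vmem_gb):
      return slots * (cpu + mem * h_vmem_gb)

  def blocked_slots(slots, h_vmem_gb):
      # slot-equivalents the job really takes out of the node
      return max(slots, slots * h_vmem_gb * slots_per_node / gb_per_node)

  for slots, h_vmem in [(1, 0.1), (1, 1.0), (1, 24.0), (6, 2.0), (12, 1.0)]:
      print(slots, h_vmem, model_charge(slots, h_vmem),
            blocked_slots(slots, h_vmem))

The pattern matches what I said above: the extremes agree, memory-light
requests are a bit under-charged, and middling ones over-charged.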
[Hmm, maybe I ought to do this a different way: comparing past job
requests against the distribution of slots/memory across hosts. No, I
think that would be over-thinking it for such a blunt tool.]
Mark
--
-----------------------------------------------------------------
Mark Dixon Email : [email protected]
HPC/Grid Systems Support Tel (int): 35429
Information Systems Services Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users