On Tue, 15 Mar 2011, Reuti wrote:
...
I've got to the stage where I think cpu=0.48,mem=0.52,io=0.0 will do
what I want and am about to start testing it on a development instance
of GE.
This looks like just the average of the two extreme cases: a) using all
CPUs but no memory, and b) using all memory but no CPUs. If both are
equally weighted, you end up with 0.5 as the multiplier for both.
Thanks for biting, Reuti; I really appreciate it :)
No, they're the numbers I got after going through the process outlined
below. I'm just trying to get the best fit for a simple model (in step 2).
Am I wildly off here? What do others set usage_weight_list to?
All sounds fine. There was an issue where you also had to set
ACCT_RESERVED_USAGE=TRUE to get it enabled.
Happily, that doesn't seem to be the case anymore :)
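For the archives, the relevant settings live in two places; values here are
illustrative and the commands are from memory, so double-check on your own
version:

  # scheduler configuration (qconf -msconf):
  usage_weight_list    cpu=0.480000,mem=0.520000,io=0.000000

  # global configuration (qconf -mconf), only needed on versions with
  # the issue Reuti mentions:
  execd_params         ACCT_RESERVED_USAGE=TRUE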
I calculated my cpu/mem values by:
1) Calculating the average amount of memory per core in the cluster
2) Drawing a stepped graph (memory vs. slots) showing:
slots = const * roundup(memory / average memory per core)
(const is initially 1.0)
3) Calculating a line of best fit
4) Tweaking const until the intercept + gradient = 1.0
Which gradient? Steps 1-3 are clear to me, but how do you get another
gradient there?
Step 4 is an attempt to introduce the additional constraint that cpu, mem
and io have to sum to 1.0 (io here is always 0.0).
It iteratively alters "const" in step 2 until the intercept and gradient
of the line in step 3 sum to 1.0. There's a button on my spreadsheet that
does the grunt work (William - stop laughing, I'm not proud of it).
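Since the spreadsheet button is a bit opaque, here's roughly the same
procedure as a Python sketch; the memory range and per-core average are
made-up illustrations, and it leans on numpy's polyfit for step 3:

  import numpy as np

  avg_mem_per_core = 2.0              # step 1: GB per core (illustrative)
  mem = np.arange(0.5, 24.5, 0.5)     # memory axis of the stepped graph, GB

  def fit_line(const):
      # step 2: stepped graph, slots = const * roundup(mem / avg per core)
      slots = const * np.ceil(mem / avg_mem_per_core)
      # step 3: least-squares line of best fit, slots ~= gradient*mem + intercept
      gradient, intercept = np.polyfit(mem, slots, 1)
      return gradient, intercept

  # step 4: bisect const until intercept + gradient = 1.0
  # (the sum scales linearly with const, so this converges)
  lo, hi = 0.0, 2.0
  for _ in range(50):
      const = (lo + hi) / 2.0
      gradient, intercept = fit_line(const)
      if gradient + intercept > 1.0:
          hi = const
      else:
          lo = const

  print("cpu=%.2f,mem=%.2f" % (intercept, gradient))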
I fear it's not working as intended. The problem is that it's just
adding 2 integrals (GBs and JOBs) instead of computing a two-dimensional
integral. A job using exactly half of the CPU and memory will be charged
the same way as one using either resource completely, despite the fact
that another job of this kind could still run on the node.
I don't understand your 3rd sentence. Summarising the config I outlined in
my last message:
usage (per second) = slots * [ cpu + mem * h_vmem (in G) ]
("cpu" and "mem" being the numbers in usage_weight_list)
E.g. imagine a modern cluster of 12-slot, 24G hosts; my method gives
cpu=0.51, mem=0.49. Usage rates for a selection of jobs:
* 1 slot, 24G/slot (full node) = 1*(0.51 + 0.49*24) = 12.27 usage per sec
* 12 slots, 1G/slot (full node) = 12*(0.51 + 0.49*1) = 12.00 usage per sec
* 6 slots, 2G/slot (half a node) = 6*(0.51 + 0.49*2) = 8.94 usage per sec
(1G is our default memory allotment)
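Those three rates fall out of a couple of lines of Python, for anyone who
wants to play with other job shapes (weights as above, purely illustrative):

  cpu, mem = 0.51, 0.49    # illustrative usage_weight_list values

  def usage_per_sec(slots, h_vmem_gb):
      # usage (per second) = slots * (cpu + mem * h_vmem in G)
      return slots * (cpu + mem * h_vmem_gb)

  print(usage_per_sec(1, 24))   # 12.27, full node by memory
  print(usage_per_sec(12, 1))   # 12.00, full node by slots
  print(usage_per_sec(6, 2))    #  8.94, half a node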
I'm not wildly happy that the half node case is being over-charged (going
to have to think about that), but I am happy that the numbers for two full
nodes are roughly the same. Of course you can pick lots of examples where
the numbers do whatever you want :(
It would need some internal adjustment to compute it the way you need.
I completely agree that the usage calculation will be wrong in most
situations, but I'm hoping that it will average out over the long term.
As far as I can see, cpu=1 and mem=0 is accurate for small h_vmem
requests, and cpu=0 and mem=1 is accurate for large h_vmem requests. I
felt we needed some sort of quantitative model to think through what the
appropriate middle ground is (to start with).
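To make "wrong in most situations" concrete, here's a throwaway comparison
against a baseline I'm inventing purely for illustration: the slot-equivalents
a job actually blocks on one of those 12-slot/24G nodes, i.e. whichever of
cores or memory it exhausts first:

  cpu, mem = 0.51, 0.49
  slots_per_node, gb_per_node = 12.0, 24.0

  def model_charge(slots, h_vmem_gb):
      return slots * (cpu + mem * h_vmem_gb)

  def blocked_slots(slots, h_vmem_gb):
      # slot-equivalents the job really takes out of the node
      return max(slots, slots * h_vmem_gb * slots_per_node / gb_per_node)

  for slots, h_vmem in [(1, 0.1), (1, 1.0), (1, 24.0), (6, 2.0), (12, 1.0)]:
      print(slots, h_vmem, model_charge(slots, h_vmem),
            blocked_slots(slots, h_vmem))

The pattern matches what I said above: the extremes agree, memory-light
requests are a bit under-charged, and middling ones over-charged.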
[Hmm, maybe I ought to do this a different way: comparing past job
requests against the distribution of slots/memory across hosts. No, I
think that would be over-thinking it for such a blunt tool.]
Mark
--
-----------------------------------------------------------------
Mark Dixon Email : [email protected]
HPC/Grid Systems Support Tel (int): 35429
Information Systems Services Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users