On 15.03.2011, at 13:47, Mark Dixon wrote:
> On Tue, 15 Mar 2011, Reuti wrote:
> ...
>>> I've got to the stage where I think cpu=0.48,mem=0.52,io=0.0 will do what I
>>> want and am about to start testing it on a development instance of GE.
>>
>> this looks like just the average of case a) using all CPUs but no memory,
>> and case b) using all memory but no CPUs: if both cases should be weighted
>> equally, you end up with 0.5 as the multiplier for both.
>
> Thanks for biting, Reuti, I really appreciate it :)
>
> No, they're the numbers I got after going through the process outlined below.
> I'm just trying to get the best fit for a simple model (in step 2).
>
> Am I wildly off here? What do others set usage_weight_list to?
We use a fair-share functional policy based only on slots, hence the default
works for us. Most of the time the slot count is the limiting factor, not the
memory (exceptions apply).
>> All sounds fine. There was an issue where you had to set
>> ACCT_RESERVED_USAGE=TRUE in addition to get it enabled.
>
> Happily, that doesn't seem to be the case anymore :)
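For reference, the two settings under discussion live in different
configuration objects. A rough illustration only - the weight values are the
ones from this thread; see sched_conf(5) and sge_conf(5) for the exact syntax:

    # scheduler configuration (qconf -msconf)
    usage_weight_list              cpu=0.480000,mem=0.520000,io=0.000000

    # global/host configuration (qconf -mconf), as mentioned above
    execd_params                   ACCT_RESERVED_USAGE=TRUE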
>
>
>>> I calculated my cpu/mem values as follows:
>>>
>>> 1) Calculated the average amount of memory per core in the cluster
>>>
>>> 2) Drew a stepped graph (memory vs. slots) showing:
>>>
>>> slots = const * roundup(memory / average memory per core)
>>>
>>> (const is originally 1.0)
>>>
>>> 3) Calculated a line of best fit
>>>
>>> 4) Tweaked const until the intercept + gradient = 1.0
>>
>> Which gradient? Steps 1-3 are clear to me, but how do you get another
>> gradient there?
>
> Step 4 is an attempt to introduce the additional constraint that cpu, mem and
> io have to add to 1.0 (io here is always 0.0).
>
> It iteratively alters "const" in step 2 until the intercept and gradient of
> the line in step 3 add to 1.0. There's a button on my spreadsheet that does
> the grunt work (William - stop laughing, I'm not proud of it).
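For anyone wanting to replay the procedure without the spreadsheet, here is a
rough Python sketch of steps 1-4. The host list, the memory grid and the
closed-form rescaling of "const" are illustrative assumptions, not the actual
spreadsheet:

    import math

    # Step 1: average memory per core across the cluster.
    # Hosts are assumed (slots, memory in G) -- replace with real values.
    hosts = [(12, 24.0)] * 10
    avg_mem_per_core = sum(m for s, m in hosts) / sum(s for s, m in hosts)

    # Assumed range of per-job memory requests to fit over: 0.1G .. 24G.
    mem_grid = [g / 10.0 for g in range(1, 241)]

    def fit(const):
        # Step 2: stepped "equivalent slots" curve,
        #   slots = const * roundup(memory / average memory per core)
        ys = [const * math.ceil(m / avg_mem_per_core) for m in mem_grid]
        # Step 3: least-squares line of best fit through the steps.
        n = len(mem_grid)
        mx = sum(mem_grid) / n
        my = sum(ys) / n
        gradient = (sum((x - mx) * (y - my) for x, y in zip(mem_grid, ys))
                    / sum((x - mx) ** 2 for x in mem_grid))
        intercept = my - gradient * mx
        return intercept, gradient

    # Step 4: choose const so that intercept + gradient = 1.0.  Both scale
    # linearly with const, so one rescaling replaces the iterative tweak.
    i0, g0 = fit(1.0)
    intercept, gradient = fit(1.0 / (i0 + g0))

    # Matching usage-per-slot = cpu + mem * h_vmem against the fitted line
    # suggests cpu ~ intercept and mem ~ gradient.
    print("cpu=%.2f, mem=%.2f" % (intercept, gradient))

With these assumptions the split lands near 0.5/0.5 rather than exactly at the
figures quoted here, since the result depends on the memory range used for the
fit.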
>
>
>> I fear it's not working as intended. The problem is that it's just adding
>> two integrals (GBs and JOBs) instead of computing a two-dimensional
>> integral. A job using exactly half of the CPU and memory will be charged
>> the same as one using either resource completely, despite the fact that
>> another job of this kind could still run on the node.
>
> I don't understand your 3rd sentence. Summarising the config I outlined in my
> last message:
I mean exactly what you describe below for the 3rd case.
> usage (per second) = slots * [ cpu + mem * h_vmem (in G) ]
>
> ("cpu" and "mem" being the numbers in usage_weight_list)
>
> e.g. imagine a modern cluster of 12-slot, 24G hosts: my method gives
> cpu=0.51, mem=0.49. Usage rates for a selection of jobs (see the sketch
> after the list):
>
> * 1slot 24G/slot (full node) = 1*(0.51 + 0.49*24) = 12.27 usage per sec
> * 12slot 1G/slot (full node) = 12*(0.51 + 0.49*1) = 12.00 usage per sec
Whether the user requests 1G, 100M or 2G per slot - the job is blocking a
complete node with its 12 slots (why should it be more expensive when it uses
even more memory than 1G?) and should be charged in the same way as a single
job requesting 24G. Maybe a reverse approach would do: check what's left for
other jobs and let that reduce the charge from the maximum value. If 23G but 0
slots are left, you have to pay the full price, just as when 0G but 11 slots
are left.
> * 6slot 2G/slot (half a node) = 6*(0.51 + 0.49*2) = 8.94 usage per sec
> (1G is our default memory allotment)
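A minimal Python check of the three rates above, using the formula and the
cpu=0.51 / mem=0.49 weights from the 12-slot/24G example:

    # usage per second = slots * (cpu + mem * h_vmem in G)
    CPU_W, MEM_W = 0.51, 0.49

    def usage_per_sec(slots, h_vmem_per_slot_g):
        return slots * (CPU_W + MEM_W * h_vmem_per_slot_g)

    for slots, gb in [(1, 24), (12, 1), (6, 2)]:
        print("%2d slot(s) x %2dG/slot: %.2f usage per sec"
              % (slots, gb, usage_per_sec(slots, gb)))

    # -> 12.27, 12.00 and 8.94, matching the figures above.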
>
> I'm not wildly happy that the half node case is being over-charged (going to
> have to think about that),
Yep, this is what I meant by saying that you could run two of them.
-- Reuti
> but I am happy that the numbers for two full nodes are roughly the same. Of
> course you can pick lots of examples where the numbers do whatever you want :(
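One way to read the "check what's left for other jobs" suggestion above is to
charge each job by the largest fraction of the node it blocks (slots or
memory), scaled to the full-node charge. A minimal sketch, again assuming the
12-slot/24G node; the helper name is made up and this is not an SGE feature:

    # Charge by the dominant blocked fraction of a node -- one possible
    # reading of the "what's left for other jobs" idea.
    NODE_SLOTS, NODE_MEM_G, FULL_NODE_CHARGE = 12, 24.0, 12.0

    def blocked_charge(slots, h_vmem_per_slot_g):
        blocked = max(slots / float(NODE_SLOTS),
                      slots * h_vmem_per_slot_g / NODE_MEM_G)
        return blocked * FULL_NODE_CHARGE

    print(blocked_charge(1, 24))   # 12.0 -- memory exhausted, full price
    print(blocked_charge(12, 1))   # 12.0 -- slots exhausted, full price
    print(blocked_charge(6, 2))    #  6.0 -- half a node; two such jobs fit

The half-node case then costs exactly half a node, which a single linear
cpu/mem weighting cannot reproduce for every job shape - hence the "internal
adjustment" remark below.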
>
>
>> It would need some internal adjustment to get it computed in the way you
>> need it.
>
> I completely agree that the usage calculation will be wrong in most
> situations, but I'm hoping that it will average-out over the long term.
>
> As far as I can see, cpu=1 and mem=0 is accurate for small h_vmem requests,
> and cpu=0 and mem=1 is accurate for large h_vmem requests. I felt we needed
> some sort of quantitative model to think through what the appropriate middle
> ground is (to start with).
>
> [Hmm, maybe I ought to do this a different way: comparing past job requests
> against the distribution of slots/memory across hosts. No, I think this would
> be thinking about it too much, for such a blunt tool.]
> Mark
> --
> -----------------------------------------------------------------
> Mark Dixon Email : [email protected]
> HPC/Grid Systems Support Tel (int): 35429
> Information Systems Services Tel (ext): +44(0)113 343 5429
> University of Leeds, LS2 9JT, UK
> -----------------------------------------------------------------