Hi,

On 14.03.2011, at 16:40, Mark Dixon wrote:

> Anyone using non-default settings for usage_weight_list in "qconf -ssconf"?
> 
> Our share-tree policy has recently become unfair. This is due to some users 
> submitting many jobs requesting large amounts of memory (e.g. perhaps 4x or 
> more of the memory per slot that we actually have), who are therefore being 
> "undercharged".
> 
> I've got to the stage where I think cpu=0.48,mem=0.52,io=0.0 will do what I 
> want and am about to start testing it on a development instance of GE.

this looks like just the average of the two extreme cases: a) a job uses all 
the CPUs but no memory, and b) a job uses all the memory but no CPUs. Both 
should be weighted equally, and you end up with 0.5 as the multiplier for both.
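
Just to put numbers on that -- a minimal sketch in Python, assuming the
usage_weight_list entries are simply a linear weighting of the per-resource
usage values, and using a made-up 12-core/12 GB node with one-hour jobs:

    # equal weights, as in the averaging argument above
    w_cpu, w_mem = 0.5, 0.5

    def combined_usage(cpu_core_secs, mem_gb_secs):
        # weighted combination of the two usage components
        return w_cpu * cpu_core_secs + w_mem * mem_gb_secs

    hour = 3600
    print(combined_usage(12 * hour, 0))   # a) all 12 cores, no memory -> 21600.0
    print(combined_usage(0, 12 * hour))   # b) all 12 GB, no cores     -> 21600.0

Both extreme cases accrue the same usage, i.e. they are "equally weighted".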


> However, although I'm happy that the above numbers are reasonable, I'm less 
> happy with how I arrived at them (see below).
> 
> Can anyone else who has gone through this comment, please?
> 
> 
> My initial constraints:
> 
> * I'm not bothering with the "io" portion of the usage calculation, at least 
> currently. I suspect the fact that we have several tiers of storage (fast 
> local disk, shared parallel filesystem, shared home directories) means this 
> isn't much use to me.
> 
> * We use h_vmem to control memory usage on hosts. We don't overcommit memory 
> (i.e. a 12 GB host has 12 GB of h_vmem resource). Default is 1 GB/slot.
> 
> * Our queues have 1 slot per core.
> 
> * We treat use of the cluster as a zero-sum game, so use the execd_params 
> setting of "SHARETREE_RESERVED_USAGE=true" to make usage related to what a 
> job has prevented other jobs from using, i.e. mem equals h_vmem and time is 
> in wallclock seconds.
> 
> * Playing with a development instance of GE, I see that the memory component 
> of the calculation is measured in GB * seconds.

That all sounds fine. Note that there was an issue where you have to set 
ACCT_RESERVED_USAGE=TRUE in addition to get it enabled.
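
With both parameters set, the accounting is then (as far as I understand it)
just "what the job reserved times how long it ran", in the units you observed.
A tiny sketch with made-up numbers:

    # made-up job: reserves 4 slots and 8 GB in total for 2 hours of wallclock
    slots, reserved_gb, wallclock = 4, 8.0, 2 * 3600

    cpu_usage = slots * wallclock         # slot * seconds, independent of actual CPU time
    mem_usage = reserved_gb * wallclock   # GB * seconds, independent of actual RSS

    print(cpu_usage, mem_usage)           # 28800 57600.0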


> I calculated my cpu/mem values by:
> 
> 1) Calculated the average amount of memory per core in the cluster
> 
> 2) Drew a stepped graph (memory vs. slots) showing:
> 
>  slots = const * roundup(memory / average memory per core)
> 
>  (const is originally 1.0)
> 
> 3) Calculated a line of best fit
> 
> 4) Tweaked const until the intercept + gradient = 1.0

Which gradient? Steps 1-3 are clear to me, but how do you get another gradient 
there?


> 5) Took the intercept as my "cpu" value, and gradient as my "mem" value.
> 
> Clearly, this model is pretty crude. But it seems a pretty crude knob to 
> tweak.
> 
> Any comments? Is what I'm doing way too complicated? Or way too simplistic?!
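
For concreteness, this is how I read your steps 1-5 -- only a sketch: the
sampling range of memory requests is made up, and normalising memory into
units of the step-1 "average memory per core" is my assumption:

    import numpy as np

    # memory requests (made up), in units of "average memory per core"
    x = np.linspace(0.01, 4.0, 400)

    def fit(const):
        y = const * np.ceil(x)                      # step 2: stepped slots-vs-memory curve
        gradient, intercept = np.polyfit(x, y, 1)   # step 3: line of best fit
        return intercept, gradient

    # step 4: tweak const until intercept + gradient = 1.0 (the fit scales
    # linearly with const, so a single rescale is enough)
    intercept, gradient = fit(1.0)
    intercept, gradient = fit(1.0 / (intercept + gradient))

    # step 5: intercept -> cpu weight, gradient -> mem weight
    print("cpu=%.2f,mem=%.2f,io=0.0" % (intercept, gradient))

With this made-up range the weights come out differently from your 0.48/0.52;
the result clearly depends on the spread of memory requests you sampled.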

I fear it's not working as intended. The problem is that it just adds two 
one-dimensional integrals (memory over time and CPU over time) instead of 
computing a two-dimensional integral. A job using exactly half of the CPUs and 
half of the memory is charged the same as a job using either resource 
completely, even though another job of the same kind could still run on the 
node.
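
To put numbers on it (same made-up 12-core/12 GB node and the same linear
weighting as in my earlier sketch):

    w_cpu, w_mem, hour = 0.5, 0.5, 3600

    half_and_half = w_cpu * (6 * hour) + w_mem * (6 * hour)   # 6 cores + 6 GB reserved
    all_cpu       = w_cpu * (12 * hour) + w_mem * 0           # all 12 cores, no memory

    print(half_and_half, all_cpu)   # 21600.0 21600.0 -- charged identically,
                                    # although a second 6-core/6 GB job would still fit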

It would need some internal adjustment to get it computed in the way you need 
it.

-- Reuti


> Mark
> -- 
> -----------------------------------------------------------------
> Mark Dixon                       Email    : [email protected]
> HPC/Grid Systems Support         Tel (int): 35429
> Information Systems Services     Tel (ext): +44(0)113 343 5429
> University of Leeds, LS2 9JT, UK
> -----------------------------------------------------------------


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
