[gridengine users] Changing usage_weight_list cpu=1.000000, mem=0.000000, io=0.000000

Mark Dixon Mon, 14 Mar 2011 08:41:01 -0700

Anyone using non-default settings for usage_weight_list in "qconf -ssconf"?

Our share-tree policy has recently become unfair. Some users are submitting many jobs that request large amounts of memory (e.g. perhaps 4x or more of the memory per slot that we actually have) and are therefore being "undercharged".

I've got to the stage where I think cpu=0.48,mem=0.52,io=0.0 will do what I want and am about to start testing it on a development instance of GE.

However, although I'm happy that the above numbers are reasonable, I'm less happy with how I arrived at them (see below).


Can anyone else who has gone through this comment, please?


My initial constraints:

* I'm not bothering with the "io" portion of the usage calculation, at least currently. I suspect the fact that we have several tiers of storage (fast local disk, shared parallel filesystem, shared home directories) means this isn't much use to me.

* We use h_vmem to control memory usage on hosts. We don't overcommit memory (i.e. a 12GB host has 12G of h_vmem resource). Default is 1G/slot.


* Our queues have 1 slot per core.

* We treat use of the cluster as a zero-sum game, so we use the execd_params setting of "SHARETREE_RESERVED_USAGE=true" to make usage related to what a job has prevented other jobs from using, i.e. mem equals h_vmem and time is in wallclock seconds.

* Playing with a development instance of GE, I see that the memory component of the calculation is measured in GB * seconds.
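
To make the "undercharging" concrete, here is a rough Python sketch of the charge per job under these constraints. It assumes the combined usage is simply the weighted sum of the cpu and mem terms (io ignored), with cpu in slot-seconds and mem in GB-seconds of reserved h_vmem. The function name and the example jobs are made up for illustration; this is not GE code.

```python
# Hypothetical sketch: combined usage as a weighted sum of reserved cpu and mem.
# Assumes SHARETREE_RESERVED_USAGE=true, so usage is what was reserved, not used.

def charge(slots, reserved_mem_gb, wallclock_s, w_cpu, w_mem):
    cpu = slots * wallclock_s            # slot-seconds
    mem = reserved_mem_gb * wallclock_s  # GB-seconds of h_vmem
    return w_cpu * cpu + w_mem * mem

# Two one-slot, one-hour jobs: the default 1G request and a 4G request.
for w_cpu, w_mem in [(1.0, 0.0), (0.48, 0.52)]:
    small = charge(1, 1.0, 3600, w_cpu, w_mem)
    large = charge(1, 4.0, 3600, w_cpu, w_mem)
    print(w_cpu, w_mem, "large/small charge ratio:", large / small)
```

With the default weights both jobs are charged the same, even though the 4G job blocks four slots' worth of memory. With cpu=0.48,mem=0.52 the 4G job is charged roughly 2.56 times as much in this simplified model.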



I calculated my cpu/mem values as follows:

1) Calculated the average amount of memory per core in the cluster

2) Drew a stepped graph (memory vs. slots) showing:

  slots = const * roundup(memory / average memory per core)

  (const is originally 1.0)

3) Calculated a line of best fit

4) Tweaked const until the intercept + gradient = 1.0

5) Took the intercept as my "cpu" value, and gradient as my "mem" value.
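
The five steps above can be sketched in Python roughly as follows. The average memory per core, the range of memory requests the line is fitted over, and the sampling step are all assumptions here (none are stated above), and the result depends on them, so this will not necessarily reproduce 0.48/0.52. Because a least-squares fit scales linearly with const, step 4 is done in one go by dividing by (intercept + gradient) rather than by tweaking.

```python
# Sketch only: function names and the example inputs are hypothetical.

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (intercept, gradient)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    return mean_y - gradient * mean_x, gradient

def usage_weights(avg_mem_per_core_mb, max_mem_mb, step_mb):
    # Step 2: stepped graph, slots = roundup(memory / average memory per core).
    # Integer MB keeps the round-up exact at the step boundaries.
    mems_mb = range(step_mb, max_mem_mb + 1, step_mb)
    slots = [-(-m // avg_mem_per_core_mb) for m in mems_mb]
    mems_gb = [m / 1024 for m in mems_mb]  # mem usage is counted in GB
    # Step 3: line of best fit with const = 1.0
    intercept, gradient = fit_line(mems_gb, slots)
    # Step 4: choose const so that intercept + gradient = 1.0
    const = 1.0 / (intercept + gradient)
    # Step 5: intercept is the "cpu" weight, gradient is the "mem" weight
    return const * intercept, const * gradient, const

# Assumed example: 2G per core on average, requests up to 16G, 256M steps.
w_cpu, w_mem, const = usage_weights(2048, 16384, 256)
print("cpu=%.2f mem=%.2f (const=%.3f)" % (w_cpu, w_mem, const))
```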

Clearly, this model is pretty crude. But it seems a pretty crude knob to tweak.

Any comments? Is what I'm doing way too complicated? Or way too simplistic?!


Mark
--
-----------------------------------------------------------------
Mark Dixon                       Email    : [email protected]
HPC/Grid Systems Support         Tel (int): 35429
Information Systems Services     Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------
