Anyone using non-default settings for usage_weight_list in "qconf -ssconf"?

Our share-tree policy has recently become unfair. This is because some users are submitting many jobs that request large amounts of memory (e.g. 4x or more the memory per slot that we actually have), and those users are therefore being "undercharged".

I've got to the stage where I think cpu=0.48,mem=0.52,io=0.0 will do what I want and am about to start testing it on a development instance of GE.
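For reference, the sched_conf(5) man page describes the usage value as a weighted sum: the cpu, mem and io components are each multiplied by their weight from usage_weight_list and the products are added up, with the three weights expected to sum to 1. A minimal Python sketch of that combination follows. The function names and the example job are my own illustration, not anything taken from Grid Engine.

```python
"""Sketch: how a usage_weight_list string combines per-job usage into one value.

Assumes the behaviour described in sched_conf(5): each usage component is
multiplied by its weight and the products are summed. Function names here
are hypothetical, not part of Grid Engine.
"""
from __future__ import annotations


def parse_weights(spec: str) -> dict[str, float]:
    """Parse a string such as 'cpu=0.48,mem=0.52,io=0.0' into a dict."""
    weights = {}
    for item in spec.split(","):
        name, value = item.split("=")
        weights[name.strip()] = float(value)
    return weights


def combined_usage(weights: dict[str, float], cpu: float, mem: float, io: float) -> float:
    """Weighted sum of the cpu, mem and io usage components."""
    return weights["cpu"] * cpu + weights["mem"] * mem + weights["io"] * io


if __name__ == "__main__":
    w = parse_weights("cpu=0.48,mem=0.52,io=0.0")
    # sched_conf(5) says the three weights should add up to 1.
    assert abs(sum(w.values()) - 1.0) < 1e-9
    # Hypothetical job: one slot, 1G of h_vmem, one hour of wallclock,
    # so cpu = 3600 slot-seconds and mem = 1G * 3600 s.
    print(combined_usage(w, cpu=3600.0, mem=1.0 * 3600.0, io=0.0))
```

With these weights a default 1G/slot job is charged the same as it would be under cpu=1.0 alone, because the weights sum to 1.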

However, although I'm happy that the above numbers are reasonable, I'm less happy with how I arrived at them (see below).

Could anyone who has been through this please comment?


My initial constraints:

* I'm not bothering with the "io" portion of the usage calculation, at least currently. I suspect the fact that we have several tiers of storage (fast local disk, shared parallel filesystem, shared home directories) means this isn't much use to me.

* We use h_vmem to control memory usage on hosts. We don't overcommit memory (i.e. a 12G host has 12G of h_vmem resource). The default is 1G/slot.

* Our queues have 1 slot per core.

* We treat use of the cluster as a zero-sum game, so we use the execd_params setting "SHARETREE_RESERVED_USAGE=true" to make usage reflect what a job has prevented other jobs from using, i.e. mem equals h_vmem and time is in wallclock seconds.

* Playing with a development instance of GE, I see that the memory component of the calculation is measured in G * seconds.
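To illustrate the undercharging, here is a rough Python sketch of the reserved usage charged to a job. It assumes that with SHARETREE_RESERVED_USAGE=true "cpu" is slots * wallclock seconds and "mem" is slots * per-slot h_vmem (in G) * wallclock seconds, and that the default weights are cpu=1.0,mem=0.0. The 2G-per-core figure, both jobs and the cores_blocked() helper are hypothetical.

```python
"""Sketch: reserved usage for a job, and how cpu-only weights undercharge
high-memory jobs.

Assumptions: with SHARETREE_RESERVED_USAGE=true, cpu = slots * wallclock
seconds and mem = slots * per-slot h_vmem (G) * wallclock seconds.
All helper names are hypothetical.
"""
import math

DEFAULT_WEIGHTS = (1.0, 0.0)  # assumed default: cpu=1.0,mem=0.0 (io ignored)


def reserved_usage(slots, h_vmem_per_slot_g, wallclock_s):
    """Return (cpu, mem) usage in slot-seconds and G-seconds."""
    cpu = slots * wallclock_s
    mem = slots * h_vmem_per_slot_g * wallclock_s
    return cpu, mem


def charge(cpu, mem, weights):
    """Weighted usage for a (cpu weight, mem weight) pair."""
    w_cpu, w_mem = weights
    return w_cpu * cpu + w_mem * mem


def cores_blocked(slots, h_vmem_per_slot_g, mem_per_core_g):
    """Cores a job keeps from others: its slots or its memory, whichever is larger."""
    return max(slots, math.ceil(slots * h_vmem_per_slot_g / mem_per_core_g))


if __name__ == "__main__":
    mem_per_core = 2.0  # hypothetical G of memory per core
    wallclock = 3600.0  # one hour
    # Two hypothetical jobs that each tie up four cores' worth of the cluster.
    for slots, h_vmem in [(1, 8.0), (4, 2.0)]:
        cpu, mem = reserved_usage(slots, h_vmem, wallclock)
        print(slots, h_vmem, cores_blocked(slots, h_vmem, mem_per_core),
              charge(cpu, mem, DEFAULT_WEIGHTS))
```

Both jobs block four cores, but with cpu-only weights the single-slot 8G job is charged a quarter of what the 4-slot job pays.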


I calculated my cpu/mem values as follows:

1) Calculated the average amount of memory per core in the cluster

2) Drew a stepped graph (memory vs. slots) showing:

  slots = const * roundup(memory / average memory per core)

  (const is originally 1.0)

3) Calculated a line of best fit

4) Tweaked const until the intercept + gradient = 1.0

5) Took the intercept as my "cpu" value, and gradient as my "mem" value.
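The steps above can be sketched in Python roughly as follows. The host list and the sampled memory range are placeholders, not our real cluster, and the fitted weights depend heavily on which memory range you sample. One simplification: ordinary least squares is linear in the y values, so multiplying every step by const scales the intercept and gradient by the same factor, and const can be picked directly as 1 / (intercept + gradient) instead of tweaked by hand.

```python
"""Sketch of the cpu/mem weight-fitting procedure (steps 1-5).

Hypothetical inputs: host list and sampled memory range are placeholders.
Memory is in G, so the gradient (mem weight) is in slots per G.
"""
import math


def avg_mem_per_core(hosts):
    """Step 1: total memory / total cores; hosts is a list of (cores, mem_g)."""
    total_cores = sum(cores for cores, _ in hosts)
    total_mem = sum(mem for _, mem in hosts)
    return total_mem / total_cores


def stepped_slots(mem_g, avg, const=1.0):
    """Step 2: slots = const * roundup(memory / average memory per core)."""
    return const * math.ceil(mem_g / avg)


def fit_line(xs, ys):
    """Step 3: ordinary least-squares line y = intercept + gradient * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)  # needs at least two distinct xs
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return intercept, gradient


def fit_weights(hosts, mems):
    """Steps 1-5: return (const, cpu weight, mem weight)."""
    avg = avg_mem_per_core(hosts)
    xs = list(mems)
    ys = [stepped_slots(m, avg) for m in xs]  # const = 1.0 to start with
    intercept, gradient = fit_line(xs, ys)
    # Step 4: scaling const scales both coefficients, so solve for it directly.
    const = 1.0 / (intercept + gradient)
    # Step 5: intercept -> cpu weight, gradient -> mem weight.
    return const, const * intercept, const * gradient


if __name__ == "__main__":
    hosts = [(12, 24.0), (16, 64.0)]  # hypothetical (cores, memory in G) per host
    mems = [0.5 * i for i in range(1, 49)]  # hypothetical sample: 0.5G to 24G
    const, w_cpu, w_mem = fit_weights(hosts, mems)
    print(f"const={const:.3f} usage_weight_list cpu={w_cpu:.2f},mem={w_mem:.2f},io=0.0")
```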

Clearly, this model is pretty crude. But then, usage_weight_list seems a pretty crude knob to tweak.

Any comments? Is what I'm doing way too complicated? Or way too simplistic?!

Mark
--
-----------------------------------------------------------------
Mark Dixon                       Email    : [email protected]
HPC/Grid Systems Support         Tel (int): 35429
Information Systems Services     Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
