Mark,
Thanks for the response. This is opening up a bunch of cool ideas for us.
We're still trying to get our heads around how the scaling factor actually
works, though.
For example, if a host's policy has the scale factor for mem at 1.0 and we
set it to 0.50 instead, what does that actually *mean*? What does the
factor change, and what effect does it have on the way the scheduler
utilises memory on that node? We're trying to get a better handle on the
semantics of this thing.
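To make the question concrete, here's roughly what we imagine doing on the
big node, i.e. running "qconf -me compute-1-0" and changing usage_scaling
from NONE to something like this (the 0.50 is just a number we plucked out
of the air, and we're assuming the exec host's usage_scaling is the knob
you're referring to):

usage_scaling          mem=0.50,cpu=1.0,io=1.0

Does that mean a gigabyte*second of memory consumed on that host gets
charged to the owning user at half its face value when the share tree
totals up usage, or is it something else entirely?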
For example, we have a "small" node and a "large" node in the same queue,
like so:
[root@cluster ~]# qconf -se compute-0-0
hostname compute-0-0.local
load_scaling NONE
complex_values virtual_free=92G,h_vmem=92G
load_values arch=lx26-amd64,num_proc=24,mem_total=96865.863281M, \
swap_total=0.000000M,virtual_total=96865.863281M, \
load_avg=0.150000,load_short=0.000000, \
load_medium=0.150000,load_long=0.490000, \
mem_free=95567.398438M,swap_free=0.000000M, \
virtual_free=95567.398438M,mem_used=1298.464844M, \
swap_used=0.000000M,virtual_used=1298.464844M, \
cpu=0.000000, \
m_topology=SCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTT, \
m_topology_inuse=SCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTT, \
m_socket=2,m_core=12,np_load_avg=0.006250, \
np_load_short=0.000000,np_load_medium=0.006250, \
np_load_long=0.020417
processors 24
user_lists NONE
xuser_lists NONE
projects NONE
xprojects NONE
usage_scaling NONE
report_variables NONE
[root@cluster ~]# qconf -se compute-1-0
hostname compute-1-0.local
load_scaling NONE
complex_values virtual_free=373G,h_vmem=373G
load_values arch=lx26-amd64,num_proc=80,mem_total=387739.152344M, \
swap_total=0.000000M,virtual_total=387739.152344M, \
load_avg=2.000000,load_short=2.000000, \
load_medium=2.000000,load_long=2.000000, \
mem_free=298652.855469M,swap_free=0.000000M, \
virtual_free=298652.855469M,mem_used=89086.296875M, \
swap_used=0.000000M,virtual_used=89086.296875M, \
cpu=2.500000, \
m_topology=SCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTT, \
m_topology_inuse=SCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTT, \
m_socket=4,m_core=40,np_load_avg=0.025000, \
np_load_short=0.025000,np_load_medium=0.025000, \
np_load_long=0.025000
processors 80
user_lists NONE
xuser_lists NONE
projects NONE
xprojects NONE
usage_scaling NONE
report_variables NONE
So, concretely: how does the scale factor actually affect the scheduler's
use of each of these nodes?
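To put rough numbers on our (possibly wrong) mental model: say a job on
compute-1-0 holds 100G of RAM for 10 hours. With the default scaling of
1.0 that would add 100 * 36000 = 3,600,000 gigabyte*seconds of memory
usage to the owner's accumulated usage; with mem scaled to 0.50 it would
only add 1,800,000. Meanwhile the consumables (virtual_free, h_vmem) and
load values would still govern where jobs can actually be dispatched, so
the scaling would only change how heavily past memory use counts against a
user, not what fits on the node. Is that roughly the right picture, or
does the scale factor feed into the dispatch decision itself?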
Thanks.
--JC
On 30/04/13 9:12 PM, "Mark Dixon" <[email protected]> wrote:
>On Fri, 26 Apr 2013, Jake Carroll wrote:
>...
>> Anyway. What I would really like to know is, if it's possible to weight
>> and "fair-share" based on something other than slots utilisation. Can a
>> user weight on memory utilisation for example? What I'd really like to
>> be able to do is prioritise and weight users down who slam the HPC
>> environment with big high memory jobs, such that they are
>>de-prioritised
>> once their jobs have run, so it gives other users a fair swing at the
>> lovely DIMM modules too.
>...
>
>We use the share tree here, rather than the functional policy, so this
>might not be applicable.
>
>By default, the "usage" of a job is wholly based on slots*seconds. You
>can
>introduce memory (in gigabytes*seconds) by editing the
>"usage_weight_list"
>parameter in "qconf -ssconf". We certainly did :)
>
>See the sched_conf man page for more details.
>
>If you don't have the same amount of RAM everywhere, you might also want
>to play with "usage_scaling" parameters in the execd host definitions.
>
>Good luck :)
>
>Mark
>--
>-----------------------------------------------------------------
>Mark Dixon Email : [email protected]
>HPC/Grid Systems Support Tel (int): 35429
>Information Systems Services Tel (ext): +44(0)113 343 5429
>University of Leeds, LS2 9JT, UK
>-----------------------------------------------------------------