Jake Carroll <[email protected]> writes:

> Hi Grid-Engine gurus of the grid engine list.

I'll answer anyway.

> I have a question about the fair-share policy and the subsequent 
> algorithm/ratios it uses to manage user workload.
>
> Currently, we've set our grid engine up for fair-share in a very traditional 
> way:
>
> enforce_user                 auto
> auto_user_fshare             100
>
> And.
>
> weight_tickets_functional         10000
>
> Now, to the best of my understanding, this is one of the "clean" ways
> to use the fair-share scheduler policy to weight utilisation
> appropriately so that the more a user uses CPU slots, the less run
> time priority they are given compared to users who use very
> little/have not used much for quite a period of time.

The functional policy doesn't consider past usage at all -- use the
share tree policy if you want usage history to decay priority.
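For reference, a minimal share-tree setup might look something like
this (values illustrative, not a recommendation; see sched_conf(5),
sge_conf(5) and share_tree(5)):

```
# Give the share-tree policy some tickets (qconf -msconf):
weight_tickets_share   10000

# Decay accumulated usage with a half-life, set in the global
# cluster configuration (qconf -mconf); 168 hours = one week:
halftime               168

# A flat tree giving every user equal shares (qconf -astree);
# the special "default" leaf covers all users automatically:
id=0
name=Root
type=0
shares=1
childnodes=1

id=1
name=default
type=0
shares=100
childnodes=NONE
```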

> Anyway. What I would really like to know is, if it's possible to
> weight and "fair-share" based on something other than slots
> utilisation. 

Is there something that suggests usage is calculated in terms of slots
specifically?  See sched_conf(5) for the usage weighting.
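The relevant knob is usage_weight_list in the scheduler configuration
(qconf -msconf).  By default all the weight is on CPU time, which is
probably where the "slots" impression comes from; shifting weight onto
mem makes accumulated usage -- and hence the share-tree penalty --
track memory consumption (integrated over run time) instead.  Roughly:

```
# Stock default -- usage is CPU time only:
usage_weight_list   cpu=1.000000,mem=0.000000,io=0.000000

# Weighted toward memory (illustrative split):
usage_weight_list   cpu=0.400000,mem=0.600000,io=0.000000
```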

> Can a user weight on memory utilisation for example? What I'd really like to 
> be able to do is prioritise and weight users down who slam the HPC 
> environment with big high memory jobs, such that they are de-prioritised once 
> their jobs have run, so it gives other users a fair swing at the lovely DIMM 
> modules too.
>
> I've never seen it done/don't know if it's possible. Just a thought. I
> guess what I'd really like to see is a way to "punish" or "smack"
> users who request huge gobs of RAM continually such that I can then
> deprioritise them so that users who've been nice and play by the rules
> get a fair run, immediately after.

People who actually disobey the rules should lose access.  That said,
there's no fundamental problem with managing memory requests any more
than any other resource, and normally no reason to prevent jobs from
requesting it.  It's not clear what the real problem here is.
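If the concern is jobs grabbing unaccounted-for RAM, the usual
approach is to make memory a consumable complex with a per-slot
default, and cap it per host so the scheduler does the accounting
(host name and values below are illustrative):

```
# qconf -mc -- make h_vmem requestable and consumable, with a
# default charged to jobs that don't request it explicitly:
#name    shortcut  type    relop  requestable  consumable  default  urgency
h_vmem   h_vmem    MEMORY  <=     YES          YES         2G       0

# Cap the consumable on each execution host:
qconf -aattr exechost complex_values h_vmem=64G node001
```

With that in place, heavy-memory jobs simply wait until the resource
is free, rather than needing punishment after the fact.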

> It might be one of those fun/classic NP-hard insolvable problems or
> NP-complete evil things that seem to come up whenever we talk about
> this kind of thing on this list. If so, that's cool/fine. I'll just
> need to buy a pile of more CPU cores and a lot higher density RAM
> modules ;).

Scheduling is generally like that; the management side is the hard
part.  However, if your cluster is overburdened and you can simply buy
more nodes, you're fortunate to be in a special case.

-- 
Community Grid Engine:  http://arc.liv.ac.uk/SGE/
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users