On Tue, May 29, 2012 at 03:50:00AM -0400, Jake Carroll wrote:
Hi all.
Haven't posted much here yet, as I'm new to the list/learning.
Welcome.
Irrespective, I have a question for the community.
One of the things I've always liked the idea of is being able to dynamically allocate
"slots", or the number of jobs that can run for a single user, based upon the
global load of the cluster/nodes, so that the cluster is used as efficiently
as possible overall.
An example might be a cluster sitting idle. Let's say that it's got 1024 "slots" doing
nothing and a user jumps on. The conventional wisdom/queue semantics we use is a parameter that says
the user cannot "take" more than 400 or 500 slots at a time.
Reasonable.
That's great, but it leaves the rest of the cluster sitting cold.
Yes...
What I'd really like to be able to do is dynamically balance load such that if the
cluster is idle, a user can take up a significant portion of it. If a cluster is
fairly heavily subscribed, a user gets less of a slice as they log in, but a
"fairshare" policy means they get the next-best swing at it if another user
takes up a great many slots for a long time.
Check.
I am aware that schedulers such as PBS Pro have some form of
technique/complexes for addressing things like this.
Not familiar with them, but the idea of balancing like this is (I think)
well known.
I appreciate it's fairly abstract, but just wondering if SGE/OGE have any
similar semantics?
You actually have two problems here, not one: how to allow a user to
use "all/most of the cluster when idle," and how to "be fair when the
cluster is not idle." They are related, but somewhat distinct.
SGE has at least three features that may help, either in whole or
in part. The first two are purely scheduler based, and the third involves
"resource quotas."
SGE has two distinct scheduling options, known as "functional shares"
and the "share tree." Both methods rely on tickets or shares. If you
have 9 tickets, and I have 3 tickets, SGE will try to balance things
such that you get 75% of the resources, and I get 25%.
The main difference (among others) is that the share tree tracks
historical usage, whereas functional shares do not. Functional shares
only look at current usage. Given the 3/9 functional ticket allocation
I mentioned above, I could use 100% of a cluster when it is idle, but as
soon as you start to use it, SGE will preferentially give resources to
you to balance things based on the number of tickets we each have.
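
To make the ticket arithmetic above concrete, here is a minimal sketch in
Python. It is a toy model, not SGE's scheduler: the function name and the
idea of passing in a list of "active" users are my own assumptions, and it
only shows how targets follow the ticket ratio among whoever currently has
jobs.

```python
def functional_targets(tickets, active_users):
    """Split the cluster among currently active users in proportion to tickets.

    Toy model only: ignores history, job priorities, and everything else
    the real SGE scheduler considers.
    """
    active_total = sum(tickets[u] for u in active_users)
    if active_total == 0:
        return {u: 0.0 for u in active_users}
    return {u: tickets[u] / active_total for u in active_users}


tickets = {"you": 9, "me": 3}

# Only I have jobs: nothing to balance against, so I can fill the cluster.
print(functional_targets(tickets, ["me"]))         # {'me': 1.0}

# You start submitting: targets follow the 9:3 ticket ratio.
print(functional_targets(tickets, ["you", "me"]))  # {'you': 0.75, 'me': 0.25}
```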
Share trees work similarly, except that underusage is compensated for a
period of time to help those users get their proportional share of
allocated resources. Same cluster, same ticket allocation, but you have
been using the cluster at close to 100% for a solid week, and I haven't
used it at all. When I submit jobs, SGE will overallocate resources to
me, beyond 25%, to try to achieve a 25/75% balance over some set period
of time (even if the short term is unbalanced).
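
The compensation idea can be sketched the same way. Again this is only a toy
model under loud assumptions: usage is a plain running sum measured in
arbitrary units (say, cluster-hours), whereas the real share tree decays old
usage over time and weighs several resources. The function name and
parameters are hypothetical.

```python
def sharetree_targets(tickets, past_usage, upcoming):
    """Toy share-tree model: split the next `upcoming` units of capacity so
    that cumulative usage drifts toward the ticket ratio.

    Assumption: usage is a plain sum. Real SGE decays old usage over time
    and weighs CPU, memory, and I/O; none of that is modelled here.
    """
    total_tickets = sum(tickets.values())
    total = sum(past_usage.values()) + upcoming
    # How far each user is below their ticket-proportional share of `total`.
    deficit = {
        u: max(0.0, total * tickets[u] / total_tickets - past_usage[u])
        for u in tickets
    }
    owed = sum(deficit.values())
    if owed == 0:
        # Nobody is behind their share: fall back to the ticket ratio.
        return {u: tickets[u] / total_tickets for u in tickets}
    return {u: deficit[u] / owed for u in tickets}


tickets = {"you": 9, "me": 3}

# You used the whole cluster for a week (168 cluster-hours); I used nothing.
# For the next 24 cluster-hours I am owed far more than my nominal 25%.
print(sharetree_targets(tickets, {"you": 168.0, "me": 0.0}, upcoming=24.0))
```

With these numbers the model hands me the entire next day, since even that
leaves me below 25% of the cumulative total; in practice the scheduler's
decay settings decide how strong and how long-lived the compensation is.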
Pictures help:
http://www.gridengine.info/2005/09/30/pretty-pictures-explain-functional-vs-sharetree-scheduling/
The third trick SGE has is something called "resource quotas". These
permit you to set policies like "user A can use only 50 slots", or
"project X--regardless of which user--can use only 100 slots". It's
quite powerful.
Where it may be useful in your situation is to have an external program
that watches the overall load on the cluster, and dynamically sets
resource quotas based on some sort of formula. For example, you may
want to keep 10% of the 1,024-slot cluster free, so you set a resource
quota for any single user at 920 slots. 30 minutes later, when there is
heavy usage, the script automatically adjusts that down to, say, 400
slots for a single user.
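
A rough sketch of what such a script might compute, in Python. The
thresholds, function names, and rule name are all hypothetical, and the
"formula" is just a two-level step; the only SGE-specific part is the
resource quota set text, which follows the format I remember from the
sge_resource_quota man page, so check it against your version.

```python
def per_user_slot_limit(total_slots, used_slots,
                        reserve_pct=10, busy_pct=50, busy_limit=400):
    """Pick a per-user slot cap from current cluster utilisation.

    All thresholds are hypothetical: below `busy_pct` percent utilisation a
    user may take everything except the reserve; above it, `busy_limit`.
    """
    if used_slots * 100 >= total_slots * busy_pct:
        return busy_limit
    return total_slots * (100 - reserve_pct) // 100


def render_rqs(limit, name="max_user_slots"):
    """Render a resource quota set capping every user at `limit` slots."""
    return (
        "{\n"
        f"   name         {name}\n"
        "   enabled      TRUE\n"
        f"   limit        users {{*}} to slots={limit}\n"
        "}\n"
    )


print(per_user_slot_limit(1024, used_slots=100))  # 921
print(per_user_slot_limit(1024, used_slots=900))  # 400
print(render_rqs(per_user_slot_limit(1024, used_slots=900)))
```

The idle-cluster value comes out at 921, i.e. roughly the 920 mentioned
above. Measuring the used slots (by parsing qstat output, for instance) and
loading the rendered text back into SGE with qconf are left out; I believe
"qconf -Mrqs <file>" is the relevant option, but consult the qconf man page
before relying on that.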
One final, and important, note: once a job starts running, SGE doesn't
(usually) do anything else to it, and the job is left to finish on its
own. The exceptions are subordinate queues, which pause jobs (via SIGSTOP),
and checkpointing, which SGE supports provided the program itself implements
it (SGE cannot magically make your programs checkpoint).
Hope this helps.
--
Jesse Becker
NHGRI Linux support (Digicon Contractor)
Specialization is for insects. -- R.A.Heinlein