We currently run a moderately sized (5000+ cores) cluster using SGE. We're looking to move to Slurm and have a test setup, but I have some questions about how best to implement (and improve on) our current setup.

Our setup is a co-op model. We have users who "own shares" of the cluster as well as non-contributing users. We try to guarantee contributing users access to their "share" of the cluster while also maximizing utilization via the following setup:

 o There are 3 queues on each node, and each queue has a number of
   slots equal to the number of real cores on the node (nodes with
   hyperthreading have that feature turned on).  A first stab at the
   matching Slurm partition layout follows this list.

 o Our "lab" queue is for contributing users.  Jobs in this queue run
   un-niced, and each lab has a number of slots in this queue equal to
   its share of the cluster.  A sketch of an accounting-based
   equivalent also follows the list.

 o Our "long" queue is for all users.  Jobs in this queue run "nice -19".

 o We also have a "short" queue for quick jobs.  These jobs run at "nice
   -10" and are limited to 30 minutes.

 o We use np_load_avg on the queues to control oversubscription.  A node
   full of lab queue jobs will not launch jobs in the other queues.
   However, a node full of long queue jobs can still launch lab queue
   jobs, up until both lab and long queues on that node are full.
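
To make the question a bit more concrete, here's roughly the partition layout I have in mind, with all three partitions overlapping on the same nodes the way our three SGE queues do now.  It's only a sketch: node names and core counts are placeholders, and I'm guessing that PriorityTier is the right stand-in for our nice levels.

  SelectType=select/cons_res
  SelectTypeParameters=CR_Core

  NodeName=node[01-64] CPUs=32 State=UNKNOWN

  # One partition per old SGE queue, all seeing every node.  PriorityTier
  # stands in for the nice levels (lab > short > long), and I believe
  # OverSubscribe=FORCE:1 is needed for the suspend/gang setup below.
  PartitionName=lab   Nodes=node[01-64] PriorityTier=3 MaxTime=INFINITE OverSubscribe=FORCE:1
  PartitionName=short Nodes=node[01-64] PriorityTier=2 MaxTime=00:30:00 OverSubscribe=FORCE:1
  PartitionName=long  Nodes=node[01-64] PriorityTier=1 MaxTime=INFINITE OverSubscribe=FORCE:1 Default=YES

The 30 minute MaxTime on "short" is meant to replace the SGE limit on that queue.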
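
For the per-lab guarantees, I'm also wondering whether accounting limits are a better fit than per-node slot counts.  Something like the following is what I had in mind (account names and numbers are made up, and I believe it needs AccountingStorageEnforce=limits in slurm.conf):

  # Each contributing lab gets an account sized to its share of the cluster
  sacctmgr add account labA Fairshare=512
  sacctmgr add account labB Fairshare=256
  sacctmgr add user alice Account=labA
  # Cap the non-contributing users so the labs' shares stay available
  sacctmgr add account freeusers Fairshare=1
  sacctmgr modify account freeusers set GrpTRES=cpu=1024

That obviously doesn't reproduce the per-node slot behaviour exactly, so I'd be glad to hear whether people handle shares this way or stick to partition limits.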

As a starting point for our new setup, I'm trying to replicate this at least approximately. Is gang scheduling what I'm looking for? Do folks have issues with jobs continually being suspended and resumed?
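
Concretely, something like this is what I had in mind for the np_load_avg part, though again it's only a sketch and I don't yet have a feel for how well suspend/resume behaves in practice:

  # Jobs in a higher PriorityTier partition suspend jobs from lower
  # ones, so lab jobs can still land on a node full of long/short jobs,
  # and the gang scheduler resumes the suspended jobs as cores free up
  PreemptType=preempt/partition_prio
  PreemptMode=SUSPEND,GANG
  # The gang scheduler's time slice, in seconds
  SchedulerTimeSlice=60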

Any pointers or hints would be much appreciated. And feel free to ask for clarification and/or tell me I'm on the completely wrong track. Thanks.

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF
