On 06/10/15 00:18, Dr. Markus Stöhr wrote:

> If such a bunch of jobs has highest priority, nearly all of them might
> start simultaneously, not allowing the start of jobs of other users.

Our solution to this is:

1) all jobs go into backfill (defer)
2) backfill can only start 5 jobs per user at a time (bf_max_job_user=5)
3) go through the whole queue (bf_max_job_start=10000)
4) continue backfill where you left off (bf_continue)
5) we limit the number of cores an account can use on a cluster (grpcpus)

The first 4 are all SchedulerParameters options in slurm.conf; the last
is set on accounts via sacctmgr.
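
As a rough sketch, the five settings above might be written as follows.
The parameter values are the ones from the list; the account name
"exampleacct", the cluster name "examplecluster" and the limit of 512
cores are placeholders for illustration only, not our real settings.
This assumes the backfill scheduler is in use and that
AccountingStorageEnforce includes "limits", otherwise the per-account
limit is not enforced.

```
# slurm.conf: items 1-4 as a single comma-separated SchedulerParameters line
SchedulerType=sched/backfill
SchedulerParameters=defer,bf_max_job_user=5,bf_max_job_start=10000,bf_continue

# Shell, run as a Slurm administrator: item 5, cap the cores one account
# can use on one cluster (placeholder account, cluster and value)
sacctmgr modify account where name=exampleacct cluster=examplecluster set GrpCPUs=512
```

On newer Slurm releases the same limit is expressed as GrpTRES=cpu=512
rather than GrpCPUs=512.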

How's that?

All the best,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: [email protected] Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci
