On 06/10/15 00:18, Dr. Markus Stöhr wrote:

> If such a bunch of jobs has highest priority, nearly all of them might
> start simultaneously, not allowing the start of jobs of other users.

Our solution to this is:

1) all jobs go into backfill (defer)
2) backfill can only start 5 of any one user's jobs at a time (bf_max_job_user=5)
3) go through the whole queue (bf_max_job_start=10000)
4) continue backfill where you left off (bf_continue)
5) we limit the number of cores an account can use on a cluster (grpcpus)

The first 4 are all SchedulerParameters in slurm.conf; the last is set
on accounts via sacctmgr.

How's that?

All the best,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: [email protected] Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci
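
As a rough sketch, the four SchedulerParameters from the list might be combined on one line in slurm.conf like this. The SchedulerType line is an assumption (the bf_* options only apply when the backfill scheduler is in use); the parameter values are the ones quoted in the list.

```
# slurm.conf (illustrative fragment)
# Assumption: the backfill scheduler plugin is enabled.
SchedulerType=sched/backfill
# defer: do not try to schedule each job at submit time
# bf_max_job_user=5: at most 5 jobs per user per backfill cycle
# bf_max_job_start=10000, bf_continue: values as given in the list
SchedulerParameters=defer,bf_max_job_user=5,bf_max_job_start=10000,bf_continue
```

The per-account core limit would then be set with sacctmgr. The account name, cluster name and the figure 512 below are made-up placeholders, not real values from any site.

```
# Hypothetical account, cluster and limit; adjust to your site.
sacctmgr modify account where name=example_acct cluster=example_cluster set GrpCPUs=512
```

Note that newer Slurm releases express the same limit as GrpTRES=cpu=N rather than GrpCPUs, so check the sacctmgr man page for the version you run.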
