Chris (et al.),

This is what we tried, but we found that it doesn't handle well the case where one or more of the bf_max_job_user jobs being considered is an array job.
In that case Slurm starts array tasks until the user hits a QOS-enforced resource limit (like GrpCPUs or MaxCPUsPerUser). With array jobs it's still possible (within the resource limits) for one or more users to consume more than their expected share of resources when a 'job' is the unit of measure.

-- Trevor

> On Oct 6, 2015, at 8:08 PM, Christopher Samuel <[email protected]> wrote:
>
> On 06/10/15 00:18, Dr. Markus Stöhr wrote:
>
>> If such a bunch of jobs has highest priority, nearly all of them might
>> start simultaneously, not allowing the start of jobs of other users.
>
> Our solution to this is:
>
> 1) all jobs go into backfill (defer)
> 2) backfill can only start 5 jobs per user at a time (bf_max_job_user=5)
> 3) go through the whole queue (bf_max_job_start=10000)
> 4) continue backfill where you left off (bf_continue)
> 5) we limit the number of cores an account can use on a cluster (grpcpus)
>
> The first 4 are all SchedulerParameters in slurm.conf, the last
> is set on accounts via sacctmgr.
>
> How's that?
>
> All the best,
> Chris
> --
> Christopher Samuel Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: [email protected] Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/ http://twitter.com/vlsci
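
For anyone following along, the setup Chris describes might look roughly like the fragment below. This is only a sketch: the four scheduler options are taken from his list, while the account name and core count in the sacctmgr line are placeholders that do not come from this thread. It also assumes limit enforcement is turned on in accounting (AccountingStorageEnforce including "limits"); otherwise the account cap has no effect.

```
# slurm.conf (fragment): the four scheduler options from Chris's list
SchedulerParameters=defer,bf_max_job_user=5,bf_max_job_start=10000,bf_continue

# The per-account core cap is set through sacctmgr, not slurm.conf.
# "myaccount" and 512 are placeholder values, not from the thread:
#   sacctmgr modify account where name=myaccount set GrpCPUs=512
```

Note that nothing in this fragment counts array tasks separately, so the array-job behaviour described at the top of this message still applies: only the resource limits bound how many tasks of one array can start.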
