Dear Chris,

We will keep this in mind. Backfilling is currently enabled, but without any parameters. Some of our users specify correct time limits, but at least as many specify none and so get the maximum time of the QOS they are running in. This might be a problem when all jobs are handled via backfilling, because backfill relies on those time limits to plan job starts. We also want to be sure that users with high priority get their jobs started as soon as possible. This is more of a psychological problem, as we have quite a number of users from different institutions/universities who should all be treated equally.

br
Markus


On 10/07/2015 05:07 AM, Christopher Samuel wrote:

On 06/10/15 00:18, Dr. Markus Stöhr wrote:

If such a bunch of jobs has the highest priority, nearly all of them might
start simultaneously, preventing other users' jobs from starting.

Our solution to this is:

1) all jobs go into backfill (defer)
2) backfill can only start 5 jobs per user at a time (bf_max_job_user=5)
3) go through the whole queue (bf_max_job_start=10000)
4) continue backfill where you left off (bf_continue)
5) we limit the number of cores an account can use on a cluster (grpcpus)

The first 4 are all SchedulerParameters in slurm.conf, the last
is set on accounts via sacctmgr.
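
As a rough sketch only, the five points above might look like this when
written out. The option names are taken from the list as given; the account
name "physics", the cluster name "mycluster", and the limit of 256 cores are
hypothetical placeholders, not values from this thread.

```
# slurm.conf (fragment): points 1-4, assuming the backfill plugin is in use
SchedulerType=sched/backfill
SchedulerParameters=defer,bf_max_job_user=5,bf_max_job_start=10000,bf_continue
# Note: in the Slurm documentation, bf_max_job_test is the option that sets
# how many queued jobs backfill examines, while bf_max_job_start caps how many
# it starts per cycle. Check which one matches the intent of point 3.

# Point 5, run once by a Slurm administrator (placeholder names and value):
sacctmgr modify account where name=physics cluster=mycluster set GrpCPUs=256
```

Changes to SchedulerParameters normally take effect after
"scontrol reconfigure"; the account limit is applied by sacctmgr directly.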

How's that?

All the best,
Chris



--
=====================================================
Dr. Markus Stöhr
Zentraler Informatikdienst BOKU Wien / TU Wien
Wiedner Hauptstraße 8-10
1040 Wien

Tel. +43-1-58801-420754
Fax  +43-1-58801-9420754

Email: [email protected]
=====================================================
