Dear Chris,
we will keep this in mind. Currently, we have backfilling enabled, but
without any parameters. We have users who specify correct time limits,
but at least as many do not specify any time limit and so get the
maximum time of the QOS they are running in. This might be a problem
when all jobs are handled via backfilling. We also want to be sure that
users with high priorities get their jobs started as soon as possible.
This is more of a psychological problem, as we have quite a number of
users from different institutions/universities who should all be
treated equally.
br
Markus
On 10/07/2015 05:07 AM, Christopher Samuel wrote:
On 06/10/15 00:18, Dr. Markus Stöhr wrote:
If such a bunch of jobs has highest priority, nearly all of them might
start simultaneously, not allowing the start of jobs of other users.
Our solution to this is:
1) all jobs go into backfill (defer)
2) backfill can only start 5 jobs per user at a time (bf_max_job_user=5)
3) go through the whole queue (bf_max_job_start=10000)
4) continue backfill where you left off (bf_continue)
5) we limit the number of cores an account can use on a cluster (grpcpus)
The first 4 are all SchedulerParameters in slurm.conf, the last
is set on accounts via sacctmgr.
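
As a rough sketch, the five settings above might be written like this.
The account name "myaccount" and the core count 512 are placeholders,
not values from this thread, and the SchedulerType line assumes the
backfill plugin is the scheduler in use. Account limits are normally
only enforced when AccountingStorageEnforce includes "limits".

```
# slurm.conf (fragment); assumes the backfill scheduler plugin
SchedulerType=sched/backfill
SchedulerParameters=defer,bf_max_job_user=5,bf_max_job_start=10000,bf_continue

# Per-account core cap, set with sacctmgr rather than in slurm.conf.
# "myaccount" and 512 are placeholder values:
#   sacctmgr modify account name=myaccount set GrpCPUs=512
```

Newer Slurm releases express the same per-account cap as
GrpTRES=cpu=<n>, so the exact spelling depends on the installed version.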
How's that?
All the best,
Chris
--
=====================================================
Dr. Markus Stöhr
Zentraler Informatikdienst BOKU Wien / TU Wien
Wiedner Hauptstraße 8-10
1040 Wien
Tel. +43-1-58801-420754
Fax +43-1-58801-9420754
Email: [email protected]
=====================================================