Historically, as a site, we haven't liked preemption. Looking over the
configuration details for preemption, we would presumably set
PreemptType=preempt/qos, and then we appear to have a limited choice of
PreemptMode values. I think the only acceptable one is REQUEUE, as I
don't like the option of cancelling user jobs (even with a 12-hour grace
time), and not all of our user jobs are able to checkpoint. That is
still not great, though, as some jobs won't be requeueable. Is the
GraceTime option applied when a job that is not requeueable is chosen
to be cancelled instead?
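For concreteness, a minimal sketch of the setup I have in mind is below.
It is untested and only illustrates the options named above; the QOS
names "high" and "normal" are hypothetical placeholders, and the 12-hour
grace time is just the figure mentioned earlier.

```
# slurm.conf fragment (sketch only, not a tested configuration)
PreemptType=preempt/qos   # preemption relationships are defined between QOSs
PreemptMode=REQUEUE       # preempted jobs are requeued where possible
JobRequeue=1              # jobs are requeueable unless submitted with --no-requeue

# QOS side, set with sacctmgr (QOS names here are hypothetical):
#   sacctmgr modify qos high set Preempt=normal
#   sacctmgr modify qos normal set GraceTime=43200   # 12 hours, in seconds
```

This does not answer the question about jobs that cannot be requeued; it
only shows where each setting would live.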
I think I'm still looking for a mechanism where I can have a set of
nodes that, for most users, only accept jobs which run for less than
12 hours, but which allow longer-running jobs from users associated
with the high-priority QOS.
Cheers,
Steve.
On 08/04/16 13:16, Rémi Palancher wrote:
On 08/04/2016 04:08, Steven Young wrote:
[...]
Failing the possibility of these time-floating reservations being able
to "automatically" meet our requirement, does anyone have any other
thoughts about how we might meet our "high priority" requirement with
"guaranteed" start times?
Have you considered using preemption? Check out this link for details:
http://slurm.schedmd.com/preempt.html
This is specifically designed for this use-case.
Best,
Rémi
--
Steven Young, Advanced Research Computing http://www.arc.ox.ac.uk
University of Oxford IT Services http://www.it.ox.ac.uk