Historically, as a site, we haven't liked preemption. Looking over the configuration details for preemption, we would presumably set PreemptType=preempt/qos, which leaves a limited choice of PreemptMode. I think the only acceptable option is REQUEUE: I don't like the option of cancelling user jobs (even with a 12-hour grace time), and not all of our user jobs are able to checkpoint. It's not great, though, as some jobs won't be requeueable. Is the GraceTime option applied if a job is not requeueable and is chosen to be cancelled instead?
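
To make the question concrete, the configuration I am describing would look roughly like the fragment below. This is only a sketch: the QOS names "high" and "normal" are placeholders rather than our real QOS names, and JobRequeue=1 is my assumption about how the requeue default would be set.

```
# slurm.conf (sketch only, not a tested production configuration)
PreemptType=preempt/qos     # preemption decided by QOS
PreemptMode=REQUEUE         # preempted jobs are requeued where possible

# Assumption: batch jobs are requeueable by default unless the user
# submits with --no-requeue.
JobRequeue=1

# Assumption: the high-priority QOS would also need to list the QOS it
# may preempt, and GraceTime is set per QOS; both are done with sacctmgr,
# not in slurm.conf. "high" and "normal" are placeholder names.
```

The open question above is what happens to a job submitted with --no-requeue under this setup, and whether GraceTime still applies to it.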

I think I'm still looking for a mechanism that gives me a set of nodes which accept only jobs shorter than 12 hours from most users, but accept longer-running jobs from users associated with the high-priority QOS.

Cheers,
Steve.

On 08/04/16 13:16, Rémi Palancher wrote:

On 08/04/2016 04:08, Steven Young wrote:
[...]
Failing the possibility of these time-floating reservations being able
to "automatically" meet our requirement, does anyone have any other
thoughts about how we might meet our "high priority" requirement with
"guaranteed" start times?

Have you considered using preemption? Check out this link for details:
http://slurm.schedmd.com/preempt.html

This is specifically designed for this use-case.

Best,
Rémi

--
Steven Young, Advanced Research Computing http://www.arc.ox.ac.uk
         University of Oxford IT Services http://www.it.ox.ac.uk
