Can someone help with this behavior we are seeing?
Slurm - 14.3.8
Linux - SL 6.6
We are trying to set up preemption via QOS.
/etc/slurm/slurm.conf:
PreemptMode=REQUEUE
PreemptType=preempt/qos
The QOS settings are as follows:
Name       Priority  GraceTime  Preempt    PreemptMode  UsageFactor  Limits set
---------  --------  ---------  ---------  -----------  -----------  --------------
normal            0  00:00:00              cluster      1.000000
lr_debug      10000  00:00:00   pr_normal  cluster      1.000000     4, 00:30:00
lr_normal      1000  00:00:00   pr_normal  cluster      1.000000     64, 3-00:00:00
c_serial       1000  00:00:00   pr_normal  cluster      1.000000     7, 1
pr_normal         0  00:00:00              requeue      1.000000     3, 3-00:00:00
(All other columns, Flags through MaxSubmitPU, are empty except for the limit values shown in the last column.)
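The full QOS listing wraps badly in mail, so a narrower query limited to the preemption-related columns may be easier to compare. This is a minimal sketch, assuming the `format=` option of `sacctmgr show qos` and the field names from the header above. The `QOS_FIELDS` variable is only an illustrative name.

```shell
#!/bin/bash
# Preemption-related QOS fields, using the column names from the listing above.
QOS_FIELDS="Name,Priority,GraceTime,Preempt,PreemptMode"

# Only run the query where the Slurm client tools are installed.
if command -v sacctmgr >/dev/null 2>&1; then
    sacctmgr show qos format="$QOS_FIELDS"
else
    echo "sacctmgr not found; run this on a host with the Slurm client tools" >&2
fi
```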
Jobs are being preempted by a queued lr_normal job, but instead of being
requeued they are cancelled.
[2015-07-15T14:56:17.888] job_signal 9 of running job 93 successful 0x8008
[2015-07-15T14:56:17.888] preempted job 93 had to be killed
[2015-07-15T14:56:17.940] completing job 93 status 15
How does Slurm decide whether REQUEUE will cancel or requeue a job, and can a
user request requeue-only behavior from sbatch or srun?
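As a concrete illustration of the second question, a per-job request could look roughly like the batch script below. It is only a sketch: `--qos`, `--time` and `--requeue` are existing sbatch options, but the time limit and the workload are placeholders, and whether `--requeue` by itself changes what happens on preemption is exactly the open question.

```shell
#!/bin/bash
#SBATCH --qos=pr_normal      # the preemptable QOS from the listing above
#SBATCH --time=00:10:00      # placeholder time limit
#SBATCH --requeue            # ask Slurm to treat this batch job as requeueable

# Placeholder workload: replace with the real job steps.
# SLURM_JOB_ID is set by Slurm inside a job; the fallback covers runs outside it.
msg="job ${SLURM_JOB_ID:-unknown} running on ${HOSTNAME:-unknown}"
echo "$msg"
```

Submitted with `sbatch`, the `#SBATCH` lines are read as options. Run directly under bash they are plain comments, so the script can be checked outside the cluster.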
Thanks
Jackie