Hey Jackie, only batch jobs can be requeued. Any other preempted job gets
canceled instead. If you have logging turned on at the "debug" level, you
will see a message like "Job-requeue can only be done for batch jobs" for
non-batch jobs right before the "had to be killed" message.
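
For reference, a minimal sketch of turning on that log level. This assumes the message is written by slurmctld (the "had to be killed" lines look like controller log entries), so the relevant setting would be SlurmctldDebug in slurm.conf:

```
# slurm.conf fragment (assumption: the message comes from the slurmctld log)
SlurmctldDebug=debug
```

After editing the file, `scontrol reconfigure` should pick up the change. `scontrol setdebug debug` is another way to raise the level on a running controller without touching the file.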
Are you seeing this for batch jobs as well? If so, make sure your
slurm.conf doesn't have JobRequeue=0. If it does, each sbatch submission
needs the --requeue option to allow requeueing.
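
A quick way to check this could look like the sketch below. `jobrequeue_setting` is a hypothetical helper written for this example, not a Slurm command, and `job.sh` is a placeholder script name. The slurm.conf path is the one from the original message.

```shell
#!/bin/sh
# Sketch: report the JobRequeue value found in a slurm.conf-style file.
# jobrequeue_setting is a hypothetical helper, not part of Slurm.
jobrequeue_setting() {
    # Keep only uncommented JobRequeue lines (key matched case-insensitively),
    # take the last one, then strip the key, any trailing comment, and whitespace.
    value=$(grep -i '^[[:space:]]*JobRequeue[[:space:]]*=' "$1" \
        | tail -n 1 \
        | sed -e 's/^[^=]*=//' -e 's/#.*$//' \
        | tr -d '[:space:]')
    # Print "unset" when the parameter is absent, so the Slurm default applies.
    echo "${value:-unset}"
}

# Example (path taken from the thread):
#   jobrequeue_setting /etc/slurm/slurm.conf
# If this prints 0, either set JobRequeue=1 or request requeueing per job:
#   sbatch --requeue job.sh
# or inside the batch script itself:
#   #SBATCH --requeue
```

If the helper prints "unset" or 1, requeueing is not being disabled by slurm.conf, and the job type (batch versus non-batch) is the more likely cause.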
Let me know if that helps or not.
On 07/15/15 15:32, Jacqueline Scoggins wrote:
requeue with preemption not working
Can someone help with this behavior we are seeing?
Slurm - 14.3.8
Linux - SL 6.6
Trying to set up preemption via qos
/etc/slurm/slurm.conf -
PreemptMode=REQUEUE
PreemptType=preempt/qos
qos settings are as follows:
Columns: Name Priority GraceTime Preempt PreemptMode Flags UsageThres UsageFactor
GrpCPUs GrpCPUMins GrpCPURunMins GrpJobs GrpMem GrpNodes GrpSubmit GrpWall
MaxCPUs MaxCPUMins MaxNodes MaxWall MaxCPUsPU MaxJobsPU MaxNodesPU MaxSubmitPU

Name       Priority  GraceTime  Preempt    PreemptMode  UsageFactor  Grp*/Max* values (column unknown)
---------  --------  ---------  ---------  -----------  -----------  ---------------------------------
normal     0         00:00:00              cluster      1.000000
lr_debug   10000     00:00:00   pr_normal  cluster      1.000000     4  00:30:00
lr_normal  1000      00:00:00   pr_normal  cluster      1.000000     64  3-00:00:00
c_serial   1000      00:00:00   pr_normal  cluster      1.000000     7  1
pr_normal  0         00:00:00              requeue      1.000000     3  3-00:00:00
Jobs are being preempted by a queued lr_normal job, but instead of being
requeued they are cancelled.
[2015-07-15T14:56:17.888] job_signal 9 of running job 93 successful 0x8008
[2015-07-15T14:56:17.888] preempted job 93 had to be killed
[2015-07-15T14:56:17.940] completing job 93 status 15
How does Slurm decide whether REQUEUE will cancel or requeue a job, and
can a user request requeue-only behavior from within sbatch or srun?
Thanks
Jackie