Hey Jackie, only batch jobs can be requeued; non-batch jobs are canceled instead. With logging at the "debug" level, you would see a message like "Job-requeue can only be done for batch jobs" for a non-batch job right before the "had to be killed" message.

Are you seeing this for batch jobs as well? If so, make sure your slurm.conf doesn't set JobRequeue=0; otherwise each sbatch submission needs the --requeue option to allow requeueing.
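
A minimal sketch of that check, assuming the config lives at /etc/slurm/slurm.conf as in the quoted message. The helper name requeue_disabled and the script name job.sh are hypothetical placeholders; only grep and the sbatch --requeue option are relied on.

```shell
#!/bin/sh
# Hypothetical helper: exits 0 if the given slurm.conf explicitly sets
# JobRequeue=0 on an uncommented line, non-zero otherwise.
requeue_disabled() {
    conf="${1:-/etc/slurm/slurm.conf}"   # assumed default path
    grep -Eiq '^[[:space:]]*JobRequeue[[:space:]]*=[[:space:]]*0([[:space:]#]|$)' "$conf"
}

# Example use (commented out so the sketch has no side effects):
# if requeue_disabled /etc/slurm/slurm.conf; then
#     # Requeue is off by default, so request it per job.
#     sbatch --requeue job.sh    # job.sh is a placeholder batch script
# else
#     sbatch job.sh
# fi
```

The same request can be made inside the batch script itself with a "#SBATCH --requeue" directive line.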

Let me know if that helps or not.

On 07/15/15 15:32, Jacqueline Scoggins wrote:
requeue with preemption not working
Can someone help with this behavior we are seeing?

Slurm - 14.3.8
Linux - SL 6.6

Trying to set up preemption via QOS.
/etc/slurm/slurm.conf -
   PreemptMode=REQUEUE
   PreemptType=preempt/qos


qos settings are as follows:

Name Priority GraceTime Preempt PreemptMode Flags UsageThres UsageFactor GrpCPUs GrpCPUMins GrpCPURunMins GrpJobs GrpMem GrpNodes GrpSubmit GrpWall MaxCPUs MaxCPUMins MaxNodes MaxWall MaxCPUsPU MaxJobsPU MaxNodesPU MaxSubmitPU
---------- ---------- ---------- ---------- ----------- ---------------------------------------- ---------- ----------- -------- ----------- ------------- ------- ------- -------- --------- ----------- -------- ----------- -------- ----------- --------- --------- ---------- -----------
normal 0 00:00:00 cluster 1.000000
lr_debug 10000 00:00:00 pr_normal cluster 1.000000 4 00:30:00
lr_normal 1000 00:00:00 pr_normal cluster 1.000000 64 3-00:00:00
c_serial 1000 00:00:00 pr_normal cluster 1.000000 7 1
pr_normal 0 00:00:00 requeue 1.000000 3 3-00:00:00


Jobs are being preempted by a queued lr_normal job, but instead of being requeued they are cancelled.

[2015-07-15T14:56:17.888] job_signal 9 of running job 93 successful 0x8008
[2015-07-15T14:56:17.888] preempted job 93 had to be killed
[2015-07-15T14:56:17.940] completing job 93 status 15

How does Slurm decide whether PreemptMode=REQUEUE will cancel or requeue a job, and can a user request requeue-only behavior from within sbatch or srun?


Thanks


Jackie

