Danny, I am doing this with batch jobs using sbatch. I also had JobRequeue=1 in slurm.conf and we used the --requeue option and it still did not work.
So that is why we were asking if it is a configuration issue or a bug. I can run debug level higher tomorrow and send you the output if you want me to.

Thanks

Jackie

On Wed, Jul 15, 2015 at 3:54 PM, Danny Auble <[email protected]> wrote:

> Hey Jackie, only batch jobs can be requeued. Otherwise they get
> canceled. If you have level "debug" debugging on you would get a message
> like "Job-requeue can only be done for batch jobs" for non-batch jobs right
> before the "had to be killed" message.
>
> Are you seeing this for batch jobs as well? If so make sure your
> slurm.conf doesn't have JobRequeue=0 or you will have to have each sbatch
> have the --requeue option to allow requeueing.
>
> Let me know if that helps or not
>
> On 07/15/15 15:32, Jacqueline Scoggins wrote:
>
>> requeue with preemption not working
>>
>> Can someone help assist with this behavior we are seeing?
>>
>> Slurm - 14.3.8
>> Linux - SL 6.6
>>
>> trying to setup preemption via qos
>> /etc/slurm/slurm.conf -
>> PreemptMode=REQUEUE
>> PreemptType=preempt/qos
>>
>>
>> qos settings are as follows:
>>
>> Name Priority GraceTime Preempt PreemptMode Flags UsageThres
>> UsageFactor GrpCPUs GrpCPUMins GrpCPURunMins GrpJobs GrpMem GrpNodes
>> GrpSubmit GrpWall MaxCPUs MaxCPUMins MaxNodes MaxWall MaxCPUsPU
>> MaxJobsPU MaxNodesPU MaxSubmitPU
>>
>> ---------- ---------- ---------- ---------- -----------
>> ---------------------------------------- ---------- ----------- --------
>> ----------- ------------- ------- ------- -------- --------- -----------
>> -------- ----------- -------- ----------- --------- --------- ----------
>> -----------
>>
>> normal 0 00:00:00 cluster
>> 1.000000
>>
>> lr_debug 10000 00:00:00 pr_normal cluster
>> 1.000000 4 00:30:00
>>
>> lr_normal 1000 00:00:00 pr_normal cluster
>> 1.000000 64 3-00:00:00
>>
>> c_serial 1000 00:00:00 pr_normal cluster
>> 1.000000 7 1
>>
>> pr_normal 0 00:00:00 requeue
>> 1.000000 3 3-00:00:00
>>
>>
>> Jobs are being preempted by lr_normal queued job but instead of being
>> requeued they are cancelled.
>>
>> [2015-07-15T14:56:17.888] job_signal 9 of running job 93 successful 0x8008
>>
>> [2015-07-15T14:56:17.888] preempted job 93 had to be killed
>>
>> [2015-07-15T14:56:17.940] completing job 93 status 15
>>
>> How does slurm decide if REQUEUE will cancel or requeue a job and can a
>> user specify to only do a requeue within sbatch or srun?
>>
>>
>> Thanks
>>
>>
>> Jackie
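
For reference, the per-job side of what is discussed above (a batch job submitted with --requeue into the preemptable pr_normal QOS) might look roughly like the sketch below. This is only an illustration under assumptions, not the script used in this thread: the job name and the echo line are hypothetical, and SLURM_RESTART_COUNT is assumed to be exported by the Slurm version in use.

```shell
#!/bin/bash
# Hypothetical batch script; the job name below is illustrative only.
#SBATCH --job-name=requeue_test
# Submit into the preemptable QOS named in the thread.
#SBATCH --qos=pr_normal
# Ask Slurm to requeue this job on preemption instead of cancelling it.
#SBATCH --requeue

# Assumption: Slurm sets SLURM_RESTART_COUNT once a job has been requeued
# and restarted. It is unset on the first run, so default to 0.
restart_count="${SLURM_RESTART_COUNT:-0}"
echo "restart count: ${restart_count}"
```

Submitted with sbatch, a job that really was requeued should print a non-zero restart count on its second run. The cluster-wide default can be checked with `scontrol show config | grep -i JobRequeue`.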
