Danny,

I am doing this with batch jobs submitted via sbatch. I also had JobRequeue=1
in slurm.conf, and we used the --requeue option, but it still did not work.

So that is why we were asking if it is a configuration issue or a bug.

I can run with a higher debug level tomorrow and send you the output if you
want me to.

Thanks

Jackie
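
One way to confirm that the running controller actually sees the settings mentioned above (rather than only the file on disk) is to filter the output of `scontrol show config`. The helper below is a minimal sketch; the function name `preempt_settings` is made up for illustration, and it assumes the usual "Key = value" layout of that command's output.

```shell
# Hypothetical helper (not part of Slurm): filters `scontrol show config`
# output, read from stdin, down to the three settings discussed in this thread.
preempt_settings() {
    # Keep only "Key = value" lines for the keys of interest;
    # any amount of whitespace around "=" is tolerated.
    grep -E '^[[:space:]]*(JobRequeue|PreemptMode|PreemptType)[[:space:]]*='
}
```

Usage would look like `scontrol show config | preempt_settings`. If JobRequeue shows 0 there while slurm.conf says 1, the controller probably has not re-read its configuration.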


On Wed, Jul 15, 2015 at 3:54 PM, Danny Auble wrote:

>
> Hey Jackie, only batch jobs can be requeued.  Otherwise they get
> canceled.  If you have level "debug" debugging on you would get a message
> like "Job-requeue can only be done for batch jobs" for non-batch jobs right
> before the "had to be killed" message.
>
> Are you seeing this for batch jobs as well?  If so, make sure your
> slurm.conf doesn't set JobRequeue=0; otherwise each sbatch submission
> needs the --requeue option to allow requeueing.
>
> Let me know if that helps or not
>
> On 07/15/15 15:32, Jacqueline Scoggins wrote:
>
>> requeue with preemption not working
>>
>> Can someone help with this behavior we are seeing?
>>
>> Slurm - 14.3.8
>> Linux - SL 6.6
>>
>> Trying to set up preemption via qos
>> /etc/slurm/slurm.conf -
>>    PreemptMode=REQUEUE
>>    PreemptType=preempt/qos
>>
>>
>> qos settings are as follows:
>>
>> Name       Priority  GraceTime  Preempt    PreemptMode  UsageFactor  Other limits
>> ---------  --------  ---------  ---------  -----------  -----------  --------------
>> normal            0  00:00:00              cluster      1.000000
>> lr_debug      10000  00:00:00   pr_normal  cluster      1.000000     4, 00:30:00
>> lr_normal      1000  00:00:00   pr_normal  cluster      1.000000     64, 3-00:00:00
>> c_serial       1000  00:00:00   pr_normal  cluster      1.000000     7, 1
>> pr_normal         0  00:00:00              requeue      1.000000     3, 3-00:00:00
>>
>> (Flags and UsageThres are empty for every QOS. "Other limits" collects the
>> values that fell under the remaining columns: GrpCPUs GrpCPUMins
>> GrpCPURunMins GrpJobs GrpMem GrpNodes GrpSubmit GrpWall MaxCPUs MaxCPUMins
>> MaxNodes MaxWall MaxCPUsPU MaxJobsPU MaxNodesPU MaxSubmitPU. The wrapped
>> output does not show which column each value belongs to.)
>>
>>
>>
>> Jobs are being preempted by a queued lr_normal job, but instead of being
>> requeued they are cancelled.
>>
>> [2015-07-15T14:56:17.888] job_signal 9 of running job 93 successful 0x8008
>>
>> [2015-07-15T14:56:17.888] preempted job 93 had to be killed
>>
>> [2015-07-15T14:56:17.940] completing job 93 status 15
>>
>> How does Slurm decide whether PreemptMode=REQUEUE cancels or requeues a
>> job, and can a user ask for requeue-only behavior from sbatch or srun?
>>
>>
>> Thanks
>>
>>
>> Jackie
>>
>>
>>
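
The rule Danny describes above (only batch jobs with requeueing enabled get requeued; everything else is cancelled) can be checked per job while it is still running. The sketch below is a hypothetical helper, not a Slurm tool: the name `requeue_eligible` is invented here, and it assumes that `scontrol show job` prints `BatchFlag=` and `Requeue=` fields, with 0 meaning "not a batch job" and "requeue disabled" respectively.

```shell
# Hypothetical helper (not part of Slurm): reads `scontrol show job <jobid>`
# output on stdin and applies the rule quoted above.
requeue_eligible() {
    job_info=$(cat)
    # Extract the first BatchFlag= and Requeue= values, if present.
    batch=$(printf '%s\n' "$job_info" | sed -n 's/.*BatchFlag=\([0-9][0-9]*\).*/\1/p' | head -n 1)
    requeue=$(printf '%s\n' "$job_info" | sed -n 's/.*Requeue=\([0-9][0-9]*\).*/\1/p' | head -n 1)

    # Assumption: both fields appear in the output; bail out if they do not.
    if [ -z "$batch" ] || [ -z "$requeue" ]; then
        echo "unknown: BatchFlag or Requeue field not found"
        return 2
    fi
    # Non-batch jobs (e.g. srun/salloc) cannot be requeued.
    if [ "$batch" = "0" ]; then
        echo "not a batch job: preemption would cancel it"
        return 1
    fi
    # Batch job, but neither JobRequeue nor --requeue enabled requeueing.
    if [ "$requeue" = "0" ]; then
        echo "batch job with requeue disabled: preemption would cancel it"
        return 1
    fi
    echo "requeue-eligible"
}
```

Usage would look like `scontrol show job <jobid> | requeue_eligible`, run against one of the pr_normal jobs before it is preempted. If it reports "requeue-eligible" and the job is still killed, that points away from a per-job setting and toward the configuration or a bug.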
