Hi,

I have set the configuration parameter JobRequeue to zero, so failed
jobs should not automatically requeue and rerun:
# scontrol show config|grep -i requeue
JobRequeue              = 0
#

But jobs are still rerun:
[2013-10-18T13:53:25.556] sched: Allocate JobId=4451116 NodeList=q4 #CPUs=8
[2013-10-18T16:39:08.952] Batch JobId=4451116 missing from node 0
[2013-10-18T16:39:08.952] completing job 4451116
[2013-10-18T16:39:08.952] Job 4451116 cancelled from interactive user
[2013-10-18T16:39:08.957] Requeue JobId=4451116 due to node failure
[2013-10-18T16:39:08.957] sched: job_complete for JobId=4451116 successful, exit code=4294967294
[2013-10-18T16:39:08.958] Node q4 unexpectedly rebooted
[2013-10-20T07:02:30.080] sched: Allocate JobId=4451116 NodeList=q3 #CPUs=8
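
In case it helps anyone check their own logs for the same behaviour, here is a minimal sketch in Python that picks out the requeued job IDs. It assumes the slurmctld message format is exactly as in the excerpt above; the names REQUEUE_RE and find_requeued_jobs are made up for this example and are not part of SLURM.

```python
import re

# Assumed line format, taken from the log excerpt above:
# [2013-10-18T16:39:08.957] Requeue JobId=4451116 due to node failure
REQUEUE_RE = re.compile(
    r"^\[(?P<time>[^\]]+)\]\s+Requeue JobId=(?P<jobid>\d+) due to node failure"
)


def find_requeued_jobs(lines):
    """Return (timestamp, job_id) pairs for every requeue-after-node-failure line."""
    hits = []
    for line in lines:
        match = REQUEUE_RE.match(line.strip())
        if match:
            hits.append((match.group("time"), int(match.group("jobid"))))
    return hits


if __name__ == "__main__":
    # Sample lines copied from the excerpt above; in practice one would
    # iterate over the slurmctld log file instead.
    sample = [
        "[2013-10-18T16:39:08.952] completing job 4451116",
        "[2013-10-18T16:39:08.957] Requeue JobId=4451116 due to node failure",
        "[2013-10-18T16:39:08.958] Node q4 unexpectedly rebooted",
    ]
    for when, jobid in find_requeued_jobs(sample):
        print(when, jobid)
```

This only reports which jobs were requeued; it does nothing to prevent the requeue itself.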


How can I stop this from happening? (Most of the time the "node failure"
occurs because the job exceeded its memory limits, and it will do so again
on the next try.)

Cheers,
-- Lennart Karlsson, UPPMAX, Uppsala University, Sweden
   http://www.uppmax.uu.se
