Hi,
I have set the configuration parameter JobRequeue to zero, so failed jobs should not be automatically requeued and rerun:

# scontrol show config|grep -i requeue
JobRequeue = 0
#

But jobs are still rerun:

[2013-10-18T13:53:25.556] sched: Allocate JobId=4451116 NodeList=q4 #CPUs=8
[2013-10-18T16:39:08.952] Batch JobId=4451116 missing from node 0
[2013-10-18T16:39:08.952] completing job 4451116
[2013-10-18T16:39:08.952] Job 4451116 cancelled from interactive user
[2013-10-18T16:39:08.957] Requeue JobId=4451116 due to node failure
[2013-10-18T16:39:08.957] sched: job_complete for JobId=4451116 successful, exit code=4294967294
[2013-10-18T16:39:08.958] Node q4 unexpectedly rebooted
[2013-10-20T07:02:30.080] sched: Allocate JobId=4451116 NodeList=q3 #CPUs=8

How can I stop this from happening? (Most of the time, the "node failure" is caused by the job exceeding its memory limits, and it will do so again on the next try.)

Cheers,
--
Lennart Karlsson, UPPMAX, Uppsala University, Sweden
http://www.uppmax.uu.se
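A minimal Python sketch for spotting this pattern in the slurmctld log: it flags any job that gets a new "sched: Allocate" line after a "Requeue JobId=... due to ..." line. It assumes the log line format shown in the excerpt in the message, which may differ between Slurm versions. The function `find_reruns` and the two regular expressions are hypothetical helpers for illustration, not part of Slurm.

```python
import re

# slurmctld log excerpt from the message. The line format is assumed
# from this excerpt only and may differ between Slurm versions.
LOG = """\
[2013-10-18T13:53:25.556] sched: Allocate JobId=4451116 NodeList=q4 #CPUs=8
[2013-10-18T16:39:08.952] Batch JobId=4451116 missing from node 0
[2013-10-18T16:39:08.952] completing job 4451116
[2013-10-18T16:39:08.952] Job 4451116 cancelled from interactive user
[2013-10-18T16:39:08.957] Requeue JobId=4451116 due to node failure
[2013-10-18T16:39:08.957] sched: job_complete for JobId=4451116 successful, exit code=4294967294
[2013-10-18T16:39:08.958] Node q4 unexpectedly rebooted
[2013-10-20T07:02:30.080] sched: Allocate JobId=4451116 NodeList=q3 #CPUs=8
"""

# Hypothetical patterns matching the "Allocate" and "Requeue" lines above.
ALLOC_RE = re.compile(r"^\[[^\]]+\] sched: Allocate JobId=(?P<job>\d+) NodeList=(?P<nodes>\S+)")
REQUEUE_RE = re.compile(r"^\[[^\]]+\] Requeue JobId=(?P<job>\d+) due to (?P<reason>.+)$")


def find_reruns(lines):
    """Map each job ID allocated again after a requeue to all its node lists, in log order."""
    requeued = set()   # job IDs seen in a "Requeue" line so far
    allocations = {}   # job ID -> node lists from every "Allocate" line
    reruns = {}
    for line in lines:
        m = REQUEUE_RE.match(line)
        if m:
            requeued.add(m["job"])
            continue
        m = ALLOC_RE.match(line)
        if m:
            job = m["job"]
            allocations.setdefault(job, []).append(m["nodes"])
            if job in requeued:
                # An allocation after a requeue means the job is being rerun.
                reruns[job] = list(allocations[job])
    return reruns


if __name__ == "__main__":
    for job, nodes in find_reruns(LOG.splitlines()).items():
        print(f"JobId={job} rerun after requeue on: {' -> '.join(nodes)}")
```

Run on the excerpt, it prints `JobId=4451116 rerun after requeue on: q4 -> q3`, i.e. the job was started again on q3 even though JobRequeue = 0.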
