One user has recently started to see their jobs killed after roughly 40 minutes, even though they have asked for four hours.
Forty minutes is the partition's default time limit, but this user has #SBATCH --time=04:00:00 in their sbatch file.

I have found this: https://bugs.schedmd.com/show_bug.cgi?id=2353 and we are running the affected 16.05.0, but I haven't run scontrol reconfigure for a while, and we don't run nhc. I'm confused.

This is from the slurmctld log:

[2017-05-22T16:51:53.577] _slurm_rpc_submit_batch_job JobId=723118 usec=303
[2017-05-22T16:51:54.271] sched: Allocate JobID=723118 NodeList=papr-res-compute01 #CPUs=1 Partition=prod
[2017-05-22T16:51:58.252] _pick_step_nodes: Configuration for job 723118 is complete
[2017-05-22T17:32:09.641] Time limit exhausted for JobId=723118
[2017-05-22T17:32:09.749] job_complete: JobID=723118 State=0x8006 NodeCnt=1 WTERMSIG 15

This is from the relevant node's slurmd.log:

[2017-05-22T16:51:54.289] _run_prolog: prolog with lock for job 723118 ran for 0 seconds
[2017-05-22T16:51:54.309] Launching batch job 723118 for UID 1514
[2017-05-22T16:51:58.259] launch task 723118.0 request from 1514.1514@10.126.19.15 (port 11938)
[2017-05-22T17:32:09.644] [723118] error: *** JOB 723118 ON papr-res-compute01 CANCELLED AT 2017-05-22T17:32:09 DUE TO TIME LIMIT ***
[2017-05-22T17:32:09.644] [723118.0] error: *** STEP 723118.0 ON papr-res-compute01 CANCELLED AT 2017-05-22T17:32:09 DUE TO TIME LIMIT ***
[2017-05-22T17:32:09.747] [723118] sending REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status 15
[2017-05-22T17:32:09.759] [723118] done with job
[2017-05-22T17:32:09.793] [723118.0] done with job

The user has also run this sbatch with #SBATCH --time=0-04:00:00, with the same result.

Any ideas where to look? (The time on the cluster is managed, and was resynced early last week.)

cheers
L.
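
As a sanity check on the "roughly 40 minutes", here is a small Python sketch that takes the two slurmctld timestamps quoted above (the "sched: Allocate" line and the "Time limit exhausted" line) and works out how long the job actually ran. The variable names are only illustrative, and it assumes both timestamps are in the same local time zone.

```python
from datetime import datetime

# Timestamp layout used in the slurmctld log lines quoted above
FMT = "%Y-%m-%dT%H:%M:%S.%f"

# "sched: Allocate JobID=723118 ..." line
allocated = datetime.strptime("2017-05-22T16:51:54.271", FMT)
# "Time limit exhausted for JobId=723118" line
killed = datetime.strptime("2017-05-22T17:32:09.641", FMT)

elapsed = killed - allocated
elapsed_minutes = elapsed.total_seconds() / 60

# What the user asked for with --time=04:00:00
requested_minutes = 4 * 60

print(f"ran for {elapsed_minutes:.1f} of {requested_minutes} requested minutes")
```

That comes out at a little over 40 minutes, so the job appears to have been held to the partition default and not to the four hours requested.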