Hi,

One of our users had a job fail with the following error:

slurm-outfile:

  srun: error: Unable to create job step: Requested node configuration is not available

Grepping the job ID out of the slurmctld log file gives:

  [2015-08-31T14:00:27.654] _slurm_rpc_submit_batch_job JobId=12345 usec=685
  [2015-08-31T14:00:42.812] backfill: Started JobId=12345 on node[009,021,041]
  [2015-08-31T14:01:02.361] _pick_step_nodes: requested nodes node[009,017,041] not part of job 12345
  [2015-08-31T14:37:44.625] _job_signal: enter job JobID=12345 State=0x1 NodeCnt=3
  [2015-08-31T14:37:44.626] deallocate_nodes: job JobID=12345 State=0x8004 NodeCnt=3

The job was started on node[009,021,041], but the step requested node[009,017,041], i.e. node017 instead of node021. Can anyone explain how the requested nodes can be different from the ones the job was started with?

This occurs with Slurm 14.11.8.

Cheers,

Loris

-- 
This signature is currently under construction.
