So what are the default values for these two options? We recently
updated to 14.11, and jobs that previously would have just been requeued
after a node failure are now going into a held state.
*RequeueExit*
Enables automatic job requeue for jobs which exit with the specified
values. Separate multiple exit codes by a comma. Jobs will be put
back into pending state and later scheduled again. Restarted jobs
will have the environment variable *SLURM_RESTART_COUNT* set to the
number of times the job has been restarted.
*RequeueExitHold*
Enables automatic requeue of jobs into pending state in hold,
meaning their priority is zero. Separate multiple exit codes by a
comma. These jobs are put in the *JOB_SPECIAL_EXIT* exit state.
Restarted jobs will have the environment variable
*SLURM_RESTART_COUNT* set to the number of times the job has been
restarted.
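
For reference, here is a rough sketch of how a job could check whether it
has been restarted, based only on the *SLURM_RESTART_COUNT* description
above. The helper name restart_count is made up for illustration, and
treating an unset or non-numeric variable as "first run" is an assumption,
not something the text above states.

```python
import os


def restart_count(env=None):
    """Return how many times this job has been restarted (0 if unknown)."""
    # Allow a plain dict to be passed in so the logic can be checked
    # outside of a real Slurm job.
    env = os.environ if env is None else env
    raw = env.get("SLURM_RESTART_COUNT", "")
    try:
        return int(raw)
    except ValueError:
        # Assumption: unset or non-numeric means the job was never restarted.
        return 0


if __name__ == "__main__":
    n = restart_count()
    if n:
        print(f"restart number {n}: resuming previous work")
    else:
        print("first run: starting from scratch")
```

A job script could use the returned count to decide whether to resume from
a checkpoint or give up after too many restarts.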
-Paul Edmon-