So what are the default values for these two options? We recently updated to 14.11, and jobs that previously would simply have been requeued after a node failure are now going into a held state.

*RequeueExit*
   Enables automatic job requeue for jobs which exit with the specified
   values. Separate multiple exit codes by a comma. Jobs will be put
   back into pending state and later scheduled again. Restarted jobs
   will have the environment variable *SLURM_RESTART_COUNT* set to the
   number of times the job has been restarted.

*RequeueExitHold*
   Enables automatic requeue of jobs into pending state in hold,
   meaning their priority is zero. Separate multiple exit codes by a
   comma. These jobs are put in the *JOB_SPECIAL_EXIT* exit state.
   Restarted jobs will have the environment variable
   *SLURM_RESTART_COUNT* set to the number of times the job has been
   restarted.
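
For illustration, here is a minimal sketch of how these two options might appear in slurm.conf. The exit codes are arbitrary examples chosen for this sketch and are not the defaults being asked about:

```
# slurm.conf fragment (illustrative values only, not defaults)
# Jobs exiting with 1 or 2 go back to pending and are rescheduled.
RequeueExit=1,2
# Jobs exiting with 3 are requeued in hold (priority zero).
RequeueExitHold=3
```

The values currently in effect on a cluster should be visible in the output of `scontrol show config`, and a job sitting in hold can be released with `scontrol release <jobid>`.
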
-Paul Edmon-
