I've looked at the code and it is somewhat different from what I
thought. If the PrologSlurmctld program fails, batch jobs get requeued.
Interactive jobs (salloc and srun) are killed.
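The rule above can be restated as a small decision function. This is only a model of the behaviour described in this message, not Slurm source code. The names JobKind, Outcome and prolog_slurmctld_outcome are hypothetical, and the model ignores batch jobs that cannot be requeued (the man page's "where possible").

```python
from enum import Enum


class JobKind(Enum):
    BATCH = "batch"              # submitted with sbatch
    INTERACTIVE = "interactive"  # salloc or srun


class Outcome(Enum):
    RUN = "run"
    REQUEUE = "requeue"
    KILL = "kill"


def prolog_slurmctld_outcome(kind: JobKind, exit_code: int) -> Outcome:
    """Hypothetical model of a job's fate after PrologSlurmctld exits."""
    if exit_code == 0:
        return Outcome.RUN
    # Only batch jobs can be requeued; salloc/srun jobs are killed.
    return Outcome.REQUEUE if kind is JobKind.BATCH else Outcome.KILL
```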
diff --git a/doc/man/man5/slurm.conf.5 b/doc/man/man5/slurm.conf.5
index f45e483..96016bd 100644
--- a/doc/man/man5/slurm.conf.5
+++ b/doc/man/man5/slurm.conf.5
@@ -1229,7 +1229,10 @@ also be used to specify more than one program to run (e.g.
  the first job step.  The prolog script or scripts may be used to purge files,
  enable user login, etc.  By default there is no prolog. Any configured script
  is expected to complete execution quickly (in less time than
-\fBMessageTimeout\fR).  See \fBProlog and Epilog Scripts\fR for more information.
+\fBMessageTimeout\fR).
+If the prolog fails (returns a non\-zero exit code), this will result in the
+node being set to a DOWN state and the job requeued to execute on another node.
+See \fBProlog and Epilog Scripts\fR for more information.

  .TP
  \fBPrologSlurmctld\fR
@@ -1250,7 +1253,7 @@ If some node can not be made available for use, the program should drain
  the node (typically using the scontrol command) and terminate with a non\-zero
  exit code.
  A non\-zero exit code will result in the job being requeued (where possible)
-or killed.
+or killed. Note that only batch jobs can be requeued.
  See \fBProlog and Epilog Scripts\fR for more information.

  .TP
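The PrologSlurmctld text above asks the program to drain any node it cannot make usable and then exit non-zero. A minimal Python sketch of that pattern follows, under stated assumptions: the job's nodes arrive in the SLURM_JOB_NODELIST environment variable, scontrol is on the PATH, and node_is_usable is a hypothetical placeholder for a site-specific health check. The command runner is injectable so the logic can be exercised without a cluster.

```python
import os
import subprocess
from typing import Callable, List, Mapping, Sequence

# A runner takes an argv list and returns the command's stdout.
Runner = Callable[[Sequence[str]], str]


def run_cmd(argv: Sequence[str]) -> str:
    """Run a command without a shell; raise CalledProcessError on failure."""
    result = subprocess.run(list(argv), check=True, capture_output=True, text=True)
    return result.stdout


def expand_hostlist(hostlist: str, run: Runner = run_cmd) -> List[str]:
    # "scontrol show hostnames" expands e.g. "node[01-03]" to one name per line.
    return run(["scontrol", "show", "hostnames", hostlist]).split()


def node_is_usable(node: str) -> bool:
    # Hypothetical placeholder: replace with a real site-specific check.
    return True


def prolog_slurmctld(
    env: Mapping[str, str],
    run: Runner = run_cmd,
    is_usable: Callable[[str], bool] = node_is_usable,
) -> int:
    """Drain unusable nodes and return the exit code to hand to slurmctld."""
    # Assumption: the allocated nodes are passed in SLURM_JOB_NODELIST.
    hostlist = env.get("SLURM_JOB_NODELIST", "")
    if not hostlist:
        return 0  # nothing to check
    bad = [n for n in expand_hostlist(hostlist, run) if not is_usable(n)]
    for node in bad:
        # Drain the node so Slurm schedules no new jobs on it.
        run(["scontrol", "update", f"NodeName={node}",
             "State=DRAIN", "Reason=PrologSlurmctld check failed"])
    # Non-zero: the job is requeued (batch) or killed (interactive).
    return 1 if bad else 0


def main() -> int:
    return prolog_slurmctld(os.environ)
```

Installed as the PrologSlurmctld program, the script would end with `sys.exit(main())` (after importing sys). Because the bad nodes are drained before the non-zero exit, a requeued batch job should not land on them again.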



Quoting Alessandro Italiano:

>
> Hi
>
> we are going to evaluate Slurm as the batch system for our computing
> farm (14k computing slots).
>
> I've done some tests using the prolog script and I've noticed that
>
> 1. when the "Prolog" script fails, the host where it failed is flagged
>    as DOWN and the job stays in PENDING status.
> 2. when the "PrologSlurmctld" script fails, the job is CANCELLED.
>
>
> First of all, can someone confirm that this is the expected behavior?
>
> Is there a way to configure Slurm to automatically dispatch a job to
> a new host when the "Prolog" script fails?
>
> Unfortunately I didn't find any answer to my questions in the "Prolog
> and Epilog Scripts" section of the slurm.conf man page.
>
> thanks in advance
>
> Alessandro
>
