Configure SlurmdTimeout sufficiently large and you should be fine,
_except_ that when the node running a batch script reboots, that job
will be killed.
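
As a rough sketch, the slurm.conf entry might look like the following.
The value of 1800 seconds is only an assumed placeholder; pick something
comfortably longer than the slowest reboot you expect from your nodes.

```
# slurm.conf (fragment)
# Seconds the controller waits for slurmd to respond before marking
# the node DOWN. 1800 is an illustrative value, not a recommendation.
SlurmdTimeout=1800
```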

Quoting "Jeff Squyres (jsquyres)" <[email protected]>:

>
> Is there a mode in SLURM where I can make it ok to reboot nodes during a job?
>
> Specifically, we want to use SLURM to manage a QA cluster here in  
> Cisco.  Some of the things that we need QA jobs to do is actually  
> reboot nodes -- but we don't want the SLURM job to end because the  
> job rebooted; the reboot was part of the job.  We want the node to  
> reboot and have SLURM say "oh, ok, you're back -- you can re-join  
> the job now."
>
> Is that possible?
>
> --
> Jeff Squyres
> [email protected]
> For corporate legal information go to:  
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
