Cool -- we'll try this.

Thank you!


On Feb 6, 2013, at 1:36 PM, Moe Jette <[email protected]> wrote:

> 
> One more thing. Configure FastSchedule=2 to avoid having the node  
> marked DOWN due to an unexpected reboot.
> 
> Quoting Moe Jette <[email protected]>:
> 
>> 
>> Configure SlurmdTimeout sufficiently large and you should be fine,
>> _except_ that if the node running a batch script reboots, that job
>> will be killed.
>> 
>> Quoting "Jeff Squyres (jsquyres)" <[email protected]>:
>> 
>>> 
>>> Is there a mode in SLURM where I can make it ok to reboot nodes  
>>> during a job?
>>> 
>>> Specifically, we want to use SLURM to manage a QA cluster here at
>>> Cisco.  Some of the things that we need QA jobs to do actually
>>> involve rebooting nodes -- but we don't want the SLURM job to end
>>> because a node rebooted; the reboot was part of the job.  We want
>>> the node to reboot and have SLURM say "oh, ok, you're back -- you
>>> can re-join the job now."
>>> 
>>> Is that possible?
>>> 
>>> --
>>> Jeff Squyres
>>> [email protected]
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>> 
>> 
> 


-- 
Jeff Squyres
[email protected]
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
