I'm curious what other sites do to keep jobs running in a reservation when
one of its nodes has an error. Obviously, if it's an easy fix, you simply
fix the node and the reservation can continue to run jobs. If spare nodes
are available, you can also add one to the reservation to take up the
slack left by the bad one. You can also pad the reservation with a few
extra nodes to account for bad luck.

What I'm really wondering is whether there are any better or automated
options.

What do others do?

Thanks,
Bill.

-- 
Bill Barth, Ph.D., Director, HPC
[email protected]        |   Phone: (512) 232-7069
Office: ROC 1.435             |   Fax:   (512) 475-9445


