We have an HPC cluster where the average job length is measured in days, not hours. Users are careful to add checkpoints to their jobs, but even so, preempting a job that is close to its walltime limit (max: 14 days) can be very disruptive. I checked what options preemption offers, but none seem to protect jobs that are near the finish line. PreemptExemptTime guarantees a minimum runtime before a job can be preempted, and GraceTime gives a job a grace period after it has been selected for preemption. Neither takes into account how close a job is to completing. Is there anything I am missing that would achieve what I want?
Thank you!
