Re: [slurm-users] Reservation vs. Draining for Maintenance?

Paul Edmon Thu, 06 Aug 2020 10:54:20 -0700

Because we want to maximize usage we actually have opted to just cancel all running jobs the day of. We send out notification to all the users that this will happen. We haven't really seen any complaints and we've been doing this for years. At the start of the outage we set all partitions to down, then run a cancel over all the running jobs. Pending jobs are left in place, and users are allowed to submit work during the outage, and when we reopen everything gets going again.

So there is a third option, though you have to accept that jobs will be cancelled to pull it off.
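
For anyone wanting to try this, the sequence described above (partitions down, cancel running jobs, reopen afterwards) might look roughly like the sketch below. It is not the exact script used at any site: the `run` helper, the `DRY_RUN` switch, and the function names are made up for illustration, and partition names have to be supplied by the caller. By default it only prints the commands.

```shell
# Sketch only. Prints commands unless DRY_RUN=0 is set in the environment.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Set each named partition DOWN. Jobs can still be submitted and queued,
# but nothing new is started from a DOWN partition.
close_partitions() {
    for p in "$@"; do
        run scontrol update PartitionName="$p" State=DOWN
    done
}

# Cancel only jobs in the RUNNING state; pending jobs stay in the queue.
cancel_running() {
    run scancel --state=RUNNING
}

# After maintenance, set the partitions back to UP so pending jobs can start.
reopen_partitions() {
    for p in "$@"; do
        run scontrol update PartitionName="$p" State=UP
    done
}
```

With hypothetical partitions named `general` and `gpu`, the outage would start with `close_partitions general gpu` followed by `cancel_running`, and end with `reopen_partitions general gpu`. Check the dry-run output before setting `DRY_RUN=0`.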


-Paul Edmon-

On 8/6/2020 1:13 PM, Jason Simms wrote:

Hello all,
Later this month, I will have to bring down, patch, and reboot all nodes in our cluster for maintenance. The two options available to set nodes into a maintenance mode seem to be either: 1) creating a system-wide reservation, or 2) setting all nodes into a DRAIN state.
I'm not sure it really matters either way, but is there any preference one way or the other? Any gotchas I should be aware of?
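
For reference, the two options in the question could be sketched as below. This is only an illustration under assumptions: the reservation name `maintenance`, the `run` helper, the `DRY_RUN` switch, and the function names are hypothetical, and the start time, duration, node list, and reason are supplied by the caller. By default it only prints the commands.

```shell
# Sketch only. Prints commands unless DRY_RUN=0 is set in the environment.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Option 1: a maintenance reservation covering every node.
# $1 = start time (e.g. YYYY-MM-DDTHH:MM:SS), $2 = duration in minutes.
# IGNORE_JOBS lets the reservation be created while jobs are still running.
reserve_all_nodes() {
    run scontrol create reservation ReservationName=maintenance \
        StartTime="$1" Duration="$2" \
        Users=root Flags=MAINT,IGNORE_JOBS Nodes=ALL
}

# Option 2: drain a set of nodes. Running jobs finish, no new ones start.
# $1 = node list expression, $2 = reason shown by sinfo.
drain_nodes() {
    run scontrol update NodeName="$1" State=DRAIN Reason="$2"
}
```

One practical difference: with a reservation set up in advance, the scheduler can keep starting jobs whose time limit ends before the reservation begins, whereas draining stops new work on those nodes as soon as it is applied.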
Warmest regards,
Jason

--
*Jason L. Simms, Ph.D., M.P.H.*
Manager of Research and High-Performance Computing
XSEDE Campus Champion
Lafayette College
Information Technology Services
710 Sullivan Rd | Easton, PA 18042
Office: 112 Skillman Library
p: (610) 330-5632
