Hey folks,

This is likely a dumb question, so I appreciate your patience in advance. I need to schedule a job that takes a node down, flashes its firmware, and reboots it. I can obviously ask SLURM to allocate two nodes for me and run my job script on the node I don’t intend to service. However, once the script starts executing, SLURM is going to see the node being serviced “fail”.
Is there some option I can use to tell SLURM “ignore node failures when executing this job”?

Ralph