Re: [slurm-users] How to run one maintenance job on each node in the cluster

2023-12-23 Thread Gerhard Strangar
Jeffrey Tunison wrote:
> Is there a straightforward way to create a batch job that runs once on every 
> node in the cluster?

A wrapper around reboot configured as RebootProgram in slurm.conf?
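A minimal Python sketch of the RebootProgram idea, under stated assumptions. `RebootProgram` in slurm.conf would point at a wrapper script (hypothetical) that runs site maintenance steps and then calls `/sbin/reboot`. The step `/usr/local/sbin/apply-updates` is a hypothetical placeholder. Slurm then handles the "once per node" bookkeeping: `scontrol reboot ASAP nextstate=RESUME` drains each listed node, runs RebootProgram when the node becomes idle, and resumes it after the reboot.

```python
"""Sketch of the RebootProgram-wrapper idea (all paths are hypothetical)."""
import subprocess

# Hypothetical site maintenance commands run before the real reboot.
MAINTENANCE_STEPS: list[list[str]] = [
    ["/usr/local/sbin/apply-updates"],
]
REBOOT_CMD: list[str] = ["/sbin/reboot"]


def wrapper_plan() -> list[list[str]]:
    """Commands the wrapper runs on the node, in order; reboot comes last."""
    return [list(step) for step in MAINTENANCE_STEPS] + [list(REBOOT_CMD)]


def run_wrapper(dry_run: bool = True) -> None:
    """Body of the wrapper named by RebootProgram in slurm.conf.

    check=True stops at the first failing step, so a failed update is
    not followed by a reboot.
    """
    for cmd in wrapper_plan():
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


def reboot_request(nodes: list[str], reason: str = "maintenance") -> list[str]:
    """Admin-side command: drain the nodes and reboot each once it is idle."""
    return [
        "scontrol", "reboot", "ASAP", "nextstate=RESUME",
        f"reason={reason}", ",".join(nodes),
    ]
```

On the admin host you would run `subprocess.run(reboot_request(["node001", "node002"]), check=True)`. The installed wrapper file would call `run_wrapper(dry_run=False)` from its own `__main__` block.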



Re: [slurm-users] How to run one maintenance job on each node in the cluster

2023-12-23 Thread Ole Holm Nielsen

On 23-12-2023 05:09, Jeffrey Tunison wrote:
> Is there a straightforward way to create a batch job that runs once on
> every node in the cluster?
>
> A technique simpler than generating a list from sinfo output and
> dispatching the job in a for loop for the N nodes.
>
> That’s not very hard, but I thought there might be an elegant solution
> which would make dispatching maintenance jobs easier.

One solution is the method in this script:
https://github.com/OleHolmNielsen/Slurm_tools/blob/master/nodes/update.sh

This works very reliably for us when we need to apply OS or firmware
updates.

> SLURM 22.05.09

Note: You should apply the recent Slurm security updates ASAP!

/Ole



[slurm-users] How to run one maintenance job on each node in the cluster

2023-12-22 Thread Jeffrey Tunison
Is there a straightforward way to create a batch job that runs once on every 
node in the cluster?

A technique simpler than generating a list from sinfo output and dispatching 
the job in a for loop for the N nodes.
That’s not very hard, but I thought there might be an elegant solution which 
would make dispatching maintenance jobs easier.
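The sinfo-plus-loop approach described above can be sketched in Python. This is a minimal sketch, not anyone's production tool. The script name `maintenance.sh`, the `maint-<node>` job names, and the use of `--exclusive` are assumptions. It also assumes the default partition contains every node; otherwise each job would need a `--partition` option.

```python
"""Sketch: submit one copy of a maintenance script to every Slurm node."""
import subprocess


def unique_nodes(sinfo_output: str) -> list[str]:
    """Return node names from `sinfo -h -N -o %N` output, deduplicated.

    With -N, a node that belongs to several partitions is printed once per
    partition, so keep only the first occurrence while preserving order.
    """
    seen: dict[str, None] = {}
    for line in sinfo_output.splitlines():
        name = line.strip()
        if name:
            seen.setdefault(name, None)
    return list(seen)


def sbatch_command(node: str, script: str) -> list[str]:
    """Build the sbatch argument list that pins one job to one node."""
    return [
        "sbatch",
        f"--nodelist={node}",
        "--exclusive",             # wait until no other job uses the node
        f"--job-name=maint-{node}",
        script,
    ]


def submit_all(script: str, dry_run: bool = True) -> list[list[str]]:
    """Query sinfo, then submit (or just print) one job per node."""
    listing = subprocess.run(
        ["sinfo", "-h", "-N", "-o", "%N"],
        check=True, capture_output=True, text=True,
    ).stdout
    commands = [sbatch_command(n, script) for n in unique_nodes(listing)]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands
```

Calling `submit_all("maintenance.sh")` only prints the sbatch commands. Pass `dry_run=False` to actually submit them.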

SLURM 22.05.09

Thanks,
Jeffrey