On 06/20/2017 04:32 PM, Loris Bennett wrote:
We do our upgrades while full production is up and running.  We just stop
the Slurm daemons, dump the database and copy the statesave directory
just in case.  We then do the update, and finally restart the Slurm
daemons.  We only lost jobs once during an upgrade back around 2.2.6 or
so, but that was due a rather brittle configuration provided by our
vendor (the statesave path contained the Slurm version), rather than
Slurm itself and was before we had acquired any Slurm expertise
ourselves.
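The procedure described above might be sketched roughly as follows. The service names, backup paths, and the mysqldump invocation are assumptions for illustration, not the poster's actual commands; adapt them to your site. The sketch defaults to a dry run that only prints the commands:

```shell
# Hypothetical sketch of the upgrade procedure described above; service
# names, paths, and the mysqldump invocation are assumptions.  Defaults
# to a dry run that only prints the commands; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"           # dry run: show the command
    else
        "$@"                  # real run: execute it
    fi
}

STATESAVE=/var/spool/slurmctld        # StateSaveLocation from slurm.conf
BACKUP=/root/slurm-upgrade-backup     # assumed backup location

# Stop all Slurm daemons (slurmd on the nodes, slurmctld, slurmdbd)
run systemctl stop slurmd
run systemctl stop slurmctld
run systemctl stop slurmdbd

# Dump the accounting database and copy the statesave directory, just in case
run sh -c "mysqldump slurm_acct_db > $BACKUP/slurm_acct_db.sql"
run cp -a "$STATESAVE" "$BACKUP/statesave"

# ...perform the actual package update here (site-specific)...

# Restart in the order recommended by the Slurm upgrade documentation:
# slurmdbd first, then slurmctld, then slurmd on the nodes
run systemctl start slurmdbd
run systemctl start slurmctld
run systemctl start slurmd
```

The dry-run wrapper makes it safe to review the sequence before running it on a production cluster.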

1. When you refer to "daemons", do you mean slurmctld and slurmdbd, as well as slurmd on all compute nodes? AFAIK, the recommended procedure is to upgrade and restart in this order: 1) slurmdbd, 2) slurmctld, 3) slurmd on the nodes.

2. When you mention statesave, I suppose this is what you refer to:
# scontrol show config | grep -i statesave
StateSaveLocation       = /var/spool/slurmctld

Thanks,
Ole
