On 06/20/2017 04:32 PM, Loris Bennett wrote:
We do our upgrades while full production is up and running. We just stop the Slurm daemons, dump the database, and copy the statesave directory, just in case. We then do the update and finally restart the Slurm daemons. We only lost jobs once during an upgrade, back around 2.2.6 or so, but that was due to a rather brittle configuration provided by our vendor (the statesave path contained the Slurm version) rather than to Slurm itself, and it was before we had acquired any Slurm expertise ourselves.
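The procedure described above might look roughly like the sketch below. The service names, database name (slurm_acct_db), and paths are assumptions to adjust per site; DRYRUN defaults to 1 here, so the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the upgrade flow: stop daemons, back up the
# database and statesave directory, upgrade, restart. Names and paths
# are site-specific assumptions. DRYRUN=1 (the default) prints commands
# instead of executing them; set DRYRUN=0 to run for real.
set -euo pipefail

DRYRUN=${DRYRUN:-1}
run() { if [ "$DRYRUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

upgrade() {
  local statesave=/var/spool/slurmctld    # StateSaveLocation (check scontrol show config)
  local backup=/tmp/slurm-upgrade-backup  # choose a real backup location

  # 1. Stop the daemons on the controller (slurmd on the compute nodes
  #    would be stopped separately, e.g. via pdsh).
  run systemctl stop slurmctld
  run systemctl stop slurmdbd

  # 2. Dump the accounting database and copy the statesave directory.
  run mkdir -p "$backup"
  run mysqldump -r "$backup/slurm_acct_db.sql" slurm_acct_db
  run cp -a "$statesave" "$backup/statesave"

  # 3. Install the new Slurm packages here (site-specific), then restart
  #    in the recommended order: slurmdbd, slurmctld, then slurmd on nodes.
  run systemctl start slurmdbd
  run systemctl start slurmctld
}

upgrade
```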
1. When you refer to "daemons", do you mean slurmctld and slurmdbd, as well as slurmd on all compute nodes? AFAIK, the recommended procedure is to upgrade and restart in this order: 1) slurmdbd, 2) slurmctld, 3) slurmd on the nodes.
2. When you mention statesave, I suppose this is what you refer to:

   # scontrol show config | grep -i statesave
   StateSaveLocation       = /var/spool/slurmctld

Thanks,
Ole
