OK, I'm feeling a bit silly about having sent this: after re-re-reading
the man page for slurm.conf, I discovered the
AccountingStorageBackupHost setting.
Sorry for wasting the time of anyone who read that :)
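In case it helps anyone who finds this thread later, here is a minimal sketch of the idea. It assumes two control nodes with the placeholder hostnames ctl1 and ctl2. slurm.conf points AccountingStorageHost at slurmdbd on the primary, and AccountingStorageBackupHost names the second node. On the slurmdbd side, slurmdbd.conf has the matching DbdHost/DbdBackupHost pair. The Python helper is purely hypothetical and not part of Slurm. It just reads the two settings back, primary first, as a quick sanity check that both control nodes carry the same config. It only handles one Key=Value per line, which simplifies the real slurm.conf syntax.

```python
# Hypothetical helper, not part of Slurm: list the slurmdbd hosts that a
# slurm.conf names for accounting, primary first. ctl1/ctl2 are placeholders.
SAMPLE_SLURM_CONF = """
# slurm.conf excerpt (placeholder host names)
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=ctl1
AccountingStorageBackupHost=ctl2
"""


def parse_conf(text: str) -> dict:
    """Parse simple Key=Value lines, skipping blank lines and # comments.

    Simplification: assumes at most one Key=Value pair per line.
    slurm.conf parameter names are case-insensitive, so keys are lowercased.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def accounting_hosts(text: str) -> list:
    """Return [primary, backup] slurmdbd hosts, omitting any that are unset."""
    settings = parse_conf(text)
    hosts = []
    for key in ("accountingstoragehost", "accountingstoragebackuphost"):
        if key in settings:
            hosts.append(settings[key])
    return hosts


if __name__ == "__main__":
    print(accounting_hosts(SAMPLE_SLURM_CONF))
```

Running it against the sample excerpt lists ctl1 and then ctl2. An empty or one-element list would suggest the backup host is missing from that node's copy of slurm.conf.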
Xand
On 15/02/2022 15:46, Xand Meaden wrote:
Hello,
I'm wondering what others are doing to make their slurmdbd service
resilient. We have the following setup right now:
- two VMs running slurmctld (and also slurmdbd)
- shared storage for StateSaveLocation using CephFS
- three-way mysql cluster using Percona XtraDB
However, I can see no "Slurm native" way to make slurmdbd resilient:
there is no option for a backup server in slurm.conf. I naively tried
setting AccountingStorageHost to "localhost", but this only worked
on the primary control node.
Can we use something like Keepalived to present slurmdbd running on
both control nodes via a floating IP, or will this cause complications
with Slurm's use of it?
Thanks for any advice,
Xand
--
Xand Meaden
Senior Research Infrastructure Engineer
e-Research
King's College London