Re: [slurm-users] HA for slurmdbd

Brian Andrus Tue, 15 Feb 2022 10:19:22 -0800

There hasn't been as much effort to make slurmdbd as resilient as you are hinting at because there has been no need.

The database itself can be made resilient to keep the data safe. Data that cannot go into the database is cached until the database becomes available, even if that means failing over to the AccountingStorageBackupHost. So the only potential 'loss' is access to recent data that may still be sitting in a cache until a slurmdbd server is reachable again.
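To make that caching behavior concrete, here is a toy Python model of it. It is only an illustration: the `Sink` and `AccountingClient` classes are hypothetical and are not how Slurm implements this internally. They show records being held in a local queue while neither the primary nor the backup accounting host is reachable, then flushed to whichever one comes back first.

```python
from collections import deque


class Sink:
    """Hypothetical stand-in for a slurmdbd host (not Slurm code)."""

    def __init__(self, name):
        self.name = name
        self.available = True
        self.rows = []

    def store(self, rec):
        self.rows.append(rec)


class AccountingClient:
    """Toy model: cache records locally until the primary or backup accepts them."""

    def __init__(self, primary, backup=None):
        # Try the primary first, then the backup (if one is configured).
        self.sinks = [s for s in (primary, backup) if s is not None]
        self.pending = deque()

    def record(self, rec):
        self.pending.append(rec)
        self.flush()

    def flush(self):
        while self.pending:
            for sink in self.sinks:
                if sink.available:
                    sink.store(self.pending[0])
                    self.pending.popleft()
                    break
            else:
                # Nothing reachable: keep the records cached and retry later.
                return
```

With both hosts down, `client.record("job 1")` leaves the record in `client.pending`; once either host becomes available, a later `client.flush()` delivers it, so nothing is lost, only delayed.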

You can have multiple slurmdbd servers running and point any system to whichever you like. In that respect, a simple way to do it would be to have round-robin DNS or a load balancer in front of the slurmdbd servers and let that be where clients access it.
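For illustration, the selection logic a health-checking load balancer applies can be sketched in a few lines of Python. This is a sketch under assumptions, not something Slurm itself runs: the host names in the usage note are hypothetical, and port 6819 is slurmdbd's default DbdPort, so adjust it if your slurmdbd.conf sets a different one. It rotates through the candidate servers and skips any that don't accept a TCP connection.

```python
import itertools
import socket

DBD_PORT = 6819  # slurmdbd's default DbdPort; change if slurmdbd.conf overrides it


def tcp_alive(host, port=DBD_PORT, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, DNS failures and timeouts
        return False


class RoundRobinPicker:
    """Rotate through candidate slurmdbd hosts, skipping ones that fail the check."""

    def __init__(self, hosts, check=tcp_alive):
        self.hosts = list(hosts)
        self.check = check
        self._cycle = itertools.cycle(self.hosts)

    def pick(self):
        # Try each host at most once per call, continuing round-robin order.
        for _ in range(len(self.hosts)):
            host = next(self._cycle)
            if self.check(host):
                return host
        raise RuntimeError("no slurmdbd host reachable")
```

For example, `RoundRobinPicker(["dbd1.example", "dbd2.example"]).pick()` returns the next reachable host in turn. A real load balancer or keepalived-style floating IP does the same job outside Slurm, so slurmctld only ever sees one stable address.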


Brian Andrus

On 2/15/2022 7:46 AM, Xand Meaden wrote:

Hello,
I'm wondering what others are doing to make their slurmdbd service resilient? We have the following setup right now:
- two VMs running slurmctld (and also slurmdbd)
- shared storage for StateSaveLocation using CephFS
- three-way mysql cluster using Percona XtraDB
However, I can see no "Slurm native" way to make slurmdbd resilient; there is no option for a backup server in slurm.conf. I naively tried setting the AccountingStorageHost to "localhost", but this only worked on the primary control node.
Can we use something like Keepalived to present slurmdbd running on both control nodes via a floating IP, or will this cause complications with Slurm's use of it?
Thanks for any advice,
Xand

