Re: [slurm-users] Slurmdbd High Availability
Thanks, Ole, for your input. I'm looking for the best-fit solution, so I have a quick question about the slurmctld backup as well.

I tested the read/write speed on our NAS storage and on a local HDD, and the local HDD turned out to be much faster: about 250 MB/s on the NAS versus about 800-900 MB/s on the local HDD.

1. I have a NAS flashbox with a read/write speed of around 300-400 MB/s. Will this be sufficient for setting up the slurmctld backup? Will there be any issues or impact?
2. Is it fine to implement it on NAS storage?
3. What are the prerequisites for setting up the slurmctld backup?

Looking forward to hearing from you.

Thanks,
Shaghuf Rahman
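The figures above are sequential throughput numbers. The SchedMD slides quoted elsewhere in this thread only say that StateSaveLocation performance should be kept high, without naming a metric. One assumption that may be worth testing is that the latency of small synchronous writes matters as well. A minimal Python sketch of such a probe follows; the function names, write count and write size are illustrative choices, not anything Slurm prescribes.

```python
import os
import statistics
import tempfile
import time


def fsync_write_latencies(directory, writes=50, size=4096):
    """Time `writes` small write+fsync cycles in `directory`.

    Returns a list of per-cycle durations in seconds.
    """
    payload = os.urandom(size)
    latencies = []
    # Create the scratch file inside the directory under test so the
    # measurement hits that filesystem (NAS mount or local disk).
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # wait until the data has reached stable storage
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


def summarize(latencies):
    """Reduce raw timings to a median and a worst case, in milliseconds."""
    return {
        "median_ms": statistics.median(latencies) * 1000.0,
        "max_ms": max(latencies) * 1000.0,
    }
```

Running `summarize(fsync_write_latencies(path))` once against a directory on the NAS mount and once against a local directory gives two sets of numbers that can be compared directly. The result is only a rough indicator and does not reproduce slurmctld's actual I/O pattern.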
Re: [slurm-users] Slurmdbd High Availability
Hi,

Thanks to everyone who shared information with me. I really appreciate it.

Thanks,
Shaghuf Rahman

On Sun, 16 Apr 2023 at 02:21, Daniel Letai wrote:
> My go-to solution is setting up a Galera cluster using 2 slurmdbd servers
> (each pointing to its local db) and a 3rd quorum server. It's fairly easy
> to set up and doesn't rely on block-level duplication, HA semantics or
> shared storage.
>
> Just my 2 cents
Re: [slurm-users] Slurmdbd High Availability
My go-to solution is setting up a Galera cluster using 2 slurmdbd servers (each pointing to its local db) and a 3rd quorum server. It's fairly easy to set up and doesn't rely on block-level duplication, HA semantics or shared storage.

Just my 2 cents

On 14/04/2023 14:18, Tina Friedrich wrote:
> Or run your database server on something like VMware ESXi (which is what
> we do). Instant HA and I don't even need multiple servers for it :)
>
> I don't mean to be flippant, and I realise it's not addressing the MySQL
> HA question (but that got answered). However, a lot of us will have some
> sort of failure-and-load-balancing VM estate anyway, or not? Using that
> does - at least in my mind - solve the same problem (just via a slightly
> different route).
>
> Other than that I'd agree that HA solutions - of the pacemaker & mirrored
> block devices sort - tend to make things less reliable instead of more.
>
> Tina
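As a rough illustration of the Galera layout suggested in this message, the Python sketch below renders a per-node MariaDB Galera config fragment. It assumes MariaDB with Galera on server3 and server4 from the original post, which the thread does not confirm. The `wsrep_*` settings are standard Galera options, but the helper name `galera_cnf`, the cluster name and the provider library path are illustrative assumptions. The thread also does not say what runs on the third quorum server; Galera Arbitrator (`garbd`) is one common choice.

```python
def galera_cnf(node_name, node_address, cluster_nodes,
               cluster_name="slurm_acct",
               provider="/usr/lib64/galera-4/libgalera_smm.so"):
    """Render a [galera] config fragment for one database node.

    cluster_name and provider are placeholders: the provider library
    path in particular differs between distributions and versions.
    """
    lines = [
        "[galera]",
        "wsrep_on=ON",
        f"wsrep_provider={provider}",
        f"wsrep_cluster_name={cluster_name}",
        # Every data node lists the addresses it can use to find the cluster.
        f"wsrep_cluster_address=gcomm://{','.join(cluster_nodes)}",
        f"wsrep_node_name={node_name}",
        f"wsrep_node_address={node_address}",
        # Settings that Galera replication requires.
        "binlog_format=ROW",
        "default_storage_engine=InnoDB",
        "innodb_autoinc_lock_mode=2",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Addresses are the ones from the original post; using server3 and
    # server4 as the two database nodes is an assumption.
    nodes = ["192.168.123.14", "192.168.123.15"]
    print(galera_cnf("server3", "192.168.123.14", nodes))
    print(galera_cnf("server4", "192.168.123.15", nodes))
```

With each slurmdbd "pointing to its local db", the `StorageHost` in each server's slurmdbd.conf would then presumably be `localhost`.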
Re: [slurm-users] Slurmdbd High Availability
Or run your database server on something like VMware ESXi (which is what we do). Instant HA and I don't even need multiple servers for it :)

I don't mean to be flippant, and I realise it's not addressing the MySQL HA question (but that got answered). However, a lot of us will have some sort of failure-and-load-balancing VM estate anyway, or not? Using that does - at least in my mind - solve the same problem (just via a slightly different route).

Other than that I'd agree that HA solutions - of the pacemaker & mirrored block devices sort - tend to make things less reliable instead of more.

Tina

On 13/04/2023 16:03, Brian Andrus wrote:
> I think you mean both slurmctld servers are pointing at the one slurmdbd
> server.
>
> Ole is right about the usefulness of HA, especially on slurmdbd, as Slurm
> will cache the writes to the database if it is down.
>
> To do what you want, you need to look at configuring your database to be
> HA. That is a different topic and would be dictated by what database setup
> you are using. Understand that the backend database is a tool used by Slurm
> and not part of Slurm. So any HA in that area needs to be done by the
> database.
>
> Once that is done, merely have 2 separate slurmdbd servers, each pointing
> at the HA database. One would be primary and the other a failover
> (AccountingStorageBackupHost). Although, technically, they would both be
> able to be active at the same time.
>
> Brian Andrus
Re: [slurm-users] Slurmdbd High Availability
I think you mean both slurmctld servers are pointing at the one slurmdbd server.

Ole is right about the usefulness of HA, especially on slurmdbd, as Slurm will cache the writes to the database if it is down.

To do what you want, you need to look at configuring your database to be HA. That is a different topic and would be dictated by what database setup you are using. Understand that the backend database is a tool used by Slurm and not part of Slurm. So any HA in that area needs to be done by the database.

Once that is done, merely have 2 separate slurmdbd servers, each pointing at the HA database. One would be primary and the other a failover (AccountingStorageBackupHost). Although, technically, they would both be able to be active at the same time.

Brian Andrus

On 4/13/2023 2:49 AM, Shaghuf Rahman wrote:
> Hi,
>
> I am setting up slurmdbd on my system and I need some input.
>
> My current setup is:
>
> server1: 192.168.123.12 (slurmctld)
> server2: 192.168.123.13 (slurmctld)
> server3: 192.168.123.14 (slurmdbd), which is pointing to both server1 and
> server2
> database: MySQL
>
> I have one more server, server4 (192.168.123.15), which I need to make a
> secondary database server. I want to configure server4 so that it syncs
> the database, making slurmdbd either Active-Active or Active-Passive.
>
> Could anyone please help me with the *steps* to configure this, and with
> how to *sync* my *database* on both servers simultaneously?
>
> Thanks & Regards,
> Shaghuf Rahman
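The primary/failover slurmdbd arrangement described in this message could be sketched as follows. The Python helpers below render the relevant lines of slurm.conf and slurmdbd.conf. The parameter names are real Slurm options, but the helper names, the choice of server3 and server4 as the two slurmdbd hosts, and the database endpoint `db-ha.example.org` are assumptions made for illustration.

```python
def slurm_conf_accounting(primary_dbd, backup_dbd):
    """Lines for slurm.conf telling slurmctld where the slurmdbd daemons are."""
    return "\n".join([
        "AccountingStorageType=accounting_storage/slurmdbd",
        f"AccountingStorageHost={primary_dbd}",
        # Contacted when the primary slurmdbd cannot be reached.
        f"AccountingStorageBackupHost={backup_dbd}",
    ]) + "\n"


def slurmdbd_conf(primary_dbd, backup_dbd, db_host,
                  db_user="slurm", db_name="slurm_acct_db"):
    """Lines for slurmdbd.conf, shared by both slurmdbd servers.

    db_host is whatever address the HA database exposes (hypothetical
    here). StoragePass is deliberately left out of this sketch.
    """
    return "\n".join([
        f"DbdHost={primary_dbd}",
        f"DbdBackupHost={backup_dbd}",
        "StorageType=accounting_storage/mysql",
        f"StorageHost={db_host}",
        f"StorageUser={db_user}",
        f"StorageLoc={db_name}",
    ]) + "\n"


if __name__ == "__main__":
    # Host roles are assumed; the database endpoint is a made-up name.
    print(slurm_conf_accounting("server3", "server4"))
    print(slurmdbd_conf("server3", "server4", "db-ha.example.org"))
```

How the database itself is made highly available (replication, clustering, or a virtualised host) is outside what these two files control, which is the point made above.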
Re: [slurm-users] Slurmdbd High Availability
On 4/13/23 11:49, Shaghuf Rahman wrote:
> I am setting up slurmdbd on my system and I need some input.
>
> My current setup is:
>
> server1: 192.168.123.12 (slurmctld)
> server2: 192.168.123.13 (slurmctld)
> server3: 192.168.123.14 (slurmdbd), which is pointing to both server1 and
> server2
> database: MySQL
>
> I have one more server, server4 (192.168.123.15), which I need to make a
> secondary database server. I want to configure server4 so that it syncs
> the database, making slurmdbd either Active-Active or Active-Passive.
>
> Could anyone please help me with the *steps* to configure this, and with
> how to *sync* my *database* on both servers simultaneously?

Slurm administrators have different opinions about the usefulness versus complexity of HA setups. You could read SchedMD's presentation from page 38 onwards:

https://slurm.schedmd.com/SLUG19/Field_Notes_3.pdf

Some noteworthy slides state:

Separating slurmctld and slurmdbd in normal production use is recommended.

Master/backup slurmctld is common, and - as long as the performance for StateSaveLocation is kept high - not that difficult to implement.

For slurmdbd, the critical element in the failure domain is MySQL, not slurmdbd. slurmdbd itself is stateless.

IMNSHO, the additional complexity of a redundant MySQL deployment is more likely to cause an outage than it is to prevent one. So don’t bother setting up a redundant slurmdbd, keep slurmdbd + MySQL local to a single server.

I hope this helps.

/Ole
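For the master/backup slurmctld case mentioned in the slides, slurm.conf lists one `SlurmctldHost` line per controller (the first is the primary) and a `StateSaveLocation` that every controller can read and write. The Python sketch below checks a config text for those two things. The function name and the example path `/shared/slurm/state` are hypothetical, and the check covers only what is visible in the file, not whether the directory is really shared or fast enough.

```python
def check_slurmctld_ha(conf_text):
    """Return (hosts, state_dir, problems) for a slurm.conf-style text."""
    hosts = []
    state_dir = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        # slurm.conf parameter names are case insensitive.
        if key.lower() == "slurmctldhost":
            hosts.append(value)
        elif key.lower() == "statesavelocation":
            state_dir = value

    problems = []
    if len(hosts) < 2:
        problems.append("fewer than two SlurmctldHost entries: no backup controller")
    if state_dir is None:
        problems.append("StateSaveLocation is not set explicitly")
    return hosts, state_dir, problems


# Controller names and addresses are from the original post; the state
# directory is a made-up path standing in for a shared filesystem.
EXAMPLE_CONF = """\
SlurmctldHost=server1(192.168.123.12)
SlurmctldHost=server2(192.168.123.13)
StateSaveLocation=/shared/slurm/state  # hypothetical shared path
"""

if __name__ == "__main__":
    print(check_slurmctld_ha(EXAMPLE_CONF))
```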
[slurm-users] Slurmdbd High Availability
Hi,

I am setting up slurmdbd on my system and I need some input.

My current setup is:

server1: 192.168.123.12 (slurmctld)
server2: 192.168.123.13 (slurmctld)
server3: 192.168.123.14 (slurmdbd), which is pointing to both server1 and server2
database: MySQL

I have one more server, server4 (192.168.123.15), which I need to make a secondary database server. I want to configure server4 so that it syncs the database, making slurmdbd either Active-Active or Active-Passive.

Could anyone please help me with the *steps* to configure this, and with how to *sync* my *database* on both servers simultaneously?

Thanks & Regards,
Shaghuf Rahman