Re: [slurm-users] Slurmdbd High Availability

2023-05-17 Thread Shaghuf Rahman
Thanks, Ole, for your input.

I'm looking for the best-fit solution, so I have a quick question about the
slurmctld backup as well.

I tested the read/write speed on our NAS storage and on a local HDD. The
local HDD turned out to be much faster: about 800-900 MB/s, against
250 MB/s on the NAS.

1. I have a NAS flashbox with a read/write speed of around 300-400 MB/s.
Will that meet the requirement for setting up the slurmctld backup, or
should I expect any issues or impact?
2. Is it fine to implement it on NAS storage?
3. What are the prerequisites for setting up the slurmctld backup?

Looking forward to hearing from you,

Thanks,
Shaghuf Rahman


Re: [slurm-users] Slurmdbd High Availability

2023-04-17 Thread Shaghuf Rahman
Hi,

Thanks to everyone who shared information with me.
I really appreciate it.

Thanks,
Shaghuf Rahman



Re: [slurm-users] Slurmdbd High Availability

2023-04-15 Thread Daniel Letai

  
  
My go-to solution is setting up a Galera cluster using 2 slurmdbd servers
(each pointing to its local db) and a 3rd quorum server. It's fairly easy
to set up and doesn't rely on block-level duplication, HA semantics or
shared storage.


Just my 2 cents
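
For readers who want to see roughly what this looks like, here is a minimal sketch of the Galera settings on each of the two database nodes, assuming MariaDB with Galera. The provider path and cluster name are assumptions, not taken from this thread, and the node addresses simply reuse server3 and server4 from the original post. The third quorum member could be a Galera Arbitrator (garbd) instead of a full database node, which is also an assumption.

```ini
# Galera settings in the MariaDB option file on each database node (sketch only)
[mysqld]
wsrep_on=ON
# Provider library path varies by distribution and Galera version (assumed here)
wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so
# Hypothetical cluster name
wsrep_cluster_name=slurm_acct
# The two database nodes, using the server3/server4 addresses from the original post
wsrep_cluster_address=gcomm://192.168.123.14,192.168.123.15
# Settings Galera requires for row-based, InnoDB-only replication
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
```

With a layout like this, each slurmdbd.conf would keep StorageHost=localhost, which matches "each pointing to its local db" above.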






Re: [slurm-users] Slurmdbd High Availability

2023-04-14 Thread Tina Friedrich
Or run your database server on something like VMware ESXi (which is what
we do). Instant HA and I don't even need multiple servers for it :)


I don't mean to be flippant, and I realise it's not addressing the MySQL
HA question (but that got answered). However, a lot of us will have some
sort of failure-and-load-balancing VM estate anyway, or not? Using that
does - at least in my mind - solve the same problem (just via a slightly
different route).


Other than that I'd agree that HA solutions - of the Pacemaker &
mirrored block devices sort - tend to make things less reliable instead
of more.


Tina






Re: [slurm-users] Slurmdbd High Availability

2023-04-13 Thread Brian Andrus
I think you mean both slurmctld servers are pointing at the one slurmdbd
server.


Ole is right about the usefulness of HA, especially on slurmdbd, as
Slurm will cache the writes to the database while it is down.


To do what you want, you need to look at configuring your database to be
HA. That is a different topic and would be dictated by what database
setup you are using. Understand that the backend database is a tool used
by Slurm and not part of Slurm. So any HA in that area needs to be done
by the database.


Once that is done, merely have 2 separate slurmdbd servers, each
pointing at the HA database. One would be primary and the other a
failover (AccountingStorageBackupHost). Although, technically, they
would both be able to be active at the same time.
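
As a rough illustration of the primary/failover arrangement described above, the relevant settings might look like the sketch below. It reuses the server3 and server4 names from the original post. The database address, user and database name are placeholders (assumptions), and StoragePass and other site-specific options are left out.

```conf
# slurm.conf fragment on the slurmctld hosts (sketch only)
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=server3          # primary slurmdbd
AccountingStorageBackupHost=server4    # failover slurmdbd

# slurmdbd.conf fragment, identical on server3 and server4 (sketch only)
DbdHost=server3
DbdBackupHost=server4
StorageType=accounting_storage/mysql
StorageHost=db-ha.example.internal     # hypothetical address of the HA database
StorageUser=slurm                      # assumed database user
StorageLoc=slurm_acct_db               # assumed database name
```

Both slurmdbd instances talk to the same HA database, so making that database highly available remains a separate job, as noted above.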


Brian Andrus



Re: [slurm-users] Slurmdbd High Availability

2023-04-13 Thread Ole Holm Nielsen


Slurm administrators have different opinions about the usefulness versus 
complexity of HA setups.  You could read SchedMD's presentation from page 
38 and onwards: https://slurm.schedmd.com/SLUG19/Field_Notes_3.pdf


Some noteworthy slides state:


> Separating slurmctld and slurmdbd in normal production use
> is recommended.
> Master/backup slurmctld is common, and - as long as the
> performance for StateSaveLocation is kept high - not that
> difficult to implement.
>
> For slurmdbd, the critical element in the failure domain is
> MySQL, not slurmdbd. slurmdbd itself is stateless.
>
> IMNSHO, the additional complexity of a redundant MySQL
> deployment is more likely to cause an outage than it is to
> prevent one.
> So don’t bother setting up a redundant slurmdbd, keep
> slurmdbd + MySQL local to a single server.


I hope this helps.

/Ole



[slurm-users] Slurmdbd High Availability

2023-04-13 Thread Shaghuf Rahman
Hi,

I am setting up slurmdbd on my system and I need some input.

My current setup is:
server1: 192.168.123.12 (slurmctld)
server2: 192.168.123.13 (slurmctld)
server3: 192.168.123.14 (slurmdbd), which is pointing to both server1 and
server2.
database: MySQL

I have one more server, server4 (192.168.123.15), which I need to set up as
a secondary database server. I want to configure server4 to sync the
database and run slurmdbd as either Active-Active or Active-Passive.

Could anyone please help me with the *steps* to configure this, and with how
to *sync* my *database* across both servers simultaneously?

Thanks & Regards,
Shaghuf Rahman