The [email protected] list will be retired March 1, 2012.  The 
schedmd.com domain now hosts its replacement.

http://www.schedmd.com/slurmdocs/mail.html

The new list is operational.  Please resubmit this message to 
[email protected]

The archive of the slurm-dev list will remain here:  
http://groups.google.com/group/slurm-devel.  Postings to the new list will be 
archived to the same place.

Hello everyone,

The subject says it all. Below I describe how far
I have got and which piece of information I am still
missing.

###################
# hostnames & IPs #
###################

Pretty standard cluster:

+------+                     +-------+
| cyan +------+          +---+ node1 |
+------+      |          |   +-------+
              |          |
          +---+---+      |   +-------+
          | switch+------+---+ node2 |
          +---+---+      |   +-------+
              |          ~
+------+      |          |   +-------+
| blue +------+          +---+ nodeN |
+------+                     +-------+

hostname server A:        cyan
IP server A:              192.168.0.1

hostname server B:        blue
IP server B:              192.168.0.2

hostname floating server: pink
floating IP:              192.168.0.3

#########
# Mysql #
#########

I've replicated '/etc/mysql/debian.cnf' on both cyan and
blue:

### debian.cnf ###
[client]
host     = localhost
user     = debian-sys-maint
password = SysMaintPass
socket   = /var/run/mysqld/mysqld.sock
[mysql_upgrade]
host     = localhost
user     = debian-sys-maint
password = SysMaintPass
socket   = /var/run/mysqld/mysqld.sock
basedir  = /usr
##################

so that the debian-sys-maint user has the same
password (SysMaintPass) on both servers.

Moreover I've issued the following mysql command:

grant all on slurm_acct_db.* TO 'slurm'@'localhost'
  identified by 'SlurmDBDPass' with grant option;

########
# DRBD #
########

drbd (active/passive) manages this folder (NFS):

/var/lib/mysql

(slurm database is in /var/lib/mysql/slurm_acct_db)

#############
# pacemaker #
#############

pacemaker keeps all the following services/servers
always running on the "active" server only:

floating IP
mysql file system
mysql server
slurmdbd
slurmctld

#################
# slurmdbd.conf #
#################

I've replicated '/etc/slurm/slurmdbd.conf' on both
cyan and blue servers:

### slurmdbd.conf ###
AuthType=auth/munge
DbdHost=localhost    <<<<<<--------- is this correct?
SlurmUser=slurm
StorageHost=localhost   <<<<<<------ is this correct?
StoragePass=SlurmDBDPass
StorageType=accounting_storage/mysql
StorageUser=slurm
StorageLoc=slurm_acct_db
...
#####################

I intentionally DID NOT define DbdBackupHost and
StorageBackupHost.

I believe that using 'localhost' for both DbdHost and
StorageHost is correct because both slurmdbd and the
mysql servers will always be running side by side either
on cyan or blue.

What I am missing now is how to configure slurm.conf
properly (see the QUESTIONS section below).

##############
# slurm.conf #
##############

I've replicated '/etc/slurm/slurm.conf' on cyan and blue
servers + ALL NODES.

### slurm.conf ###
ControlMachine=cyan
ControlAddr=cyan
AccountingStorageHost=cyan
AccountingStorageType=accounting_storage/slurmdbd
AuthType=auth/munge
CryptoType=crypto/munge
SlurmUser=slurm
SlurmdUser=root
StateSaveLocation=/var/run/slurm/slurmctld
...
##################
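
Since the file is meant to be the same everywhere, here is a minimal
sketch (Python, standard library only) of one way to confirm that two
copies really match. It fingerprints the text while ignoring comments
and blank lines. The function names and the two sample strings are
hypothetical; in practice the strings would be the file contents read
from two different hosts. It assumes '#' never appears inside a value.

```python
import hashlib


def normalize(text):
    """Return the meaningful lines: comments and blank lines dropped."""
    lines = []
    for raw in text.splitlines():
        # Everything after '#' is treated as a comment (assumption).
        line = raw.split("#", 1)[0].strip()
        if line:
            lines.append(line)
    return lines


def config_fingerprint(text):
    """SHA-256 hex digest of the normalized configuration text."""
    joined = "\n".join(normalize(text))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical stand-ins for the file as read on two different hosts.
copy_on_cyan = """ControlMachine=cyan
ControlAddr=cyan
AccountingStorageHost=cyan
"""

copy_on_node1 = """# copy on node1
ControlMachine=cyan

ControlAddr=cyan
AccountingStorageHost=cyan
"""

same = config_fingerprint(copy_on_cyan) == config_fingerprint(copy_on_node1)
print("copies match" if same else "copies differ")
```

Comparing digests rather than whole files keeps the per-host output to
a single line, which is convenient when checking many nodes.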

#############
# QUESTIONS #
#############

1)

  What we want is an identical/replicated slurm.conf
  file on all hosts, right?

  Or should I have a different slurm.conf file on
  the nodes?

2)

  I believe I do not have to define the following
  keywords in slurm.conf:

  BackupController=
  BackupAddr=
  AccountingStorageBackupHost=

  because the HA I want is obtained by 'moving' all
  the services I need to the failover server (that is,
  without using the HA embedded in slurm).

  Is that correct?

3)

  Assuming point (2) is correct, should I apply the
  following changes to slurm.conf?

  ControlMachine=pink
  ControlAddr=192.168.0.3
  AccountingStorageHost=pink

  If not, how should they be set?

4)

  Shall I also add to my DRBD/heartbeat/pacemaker
  configuration the management of the following
  folder:

  /var/run/slurm

  which is specified for 'StateSaveLocation'?
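
To make the layout I have in mind for (2) and (3) concrete, here is a
minimal Python sketch that checks a slurm.conf text against it: the
three backup keywords left unset and the three host keywords pointing
at the floating name/IP. This only encodes my own assumption, not a
confirmed recommendation. The function names are hypothetical, the
parser only looks at the first Key=Value on each line, and keys are
compared case-insensitively.

```python
# Keywords that would stay unset if the built-in HA is not used.
BACKUP_KEYS = ("BackupController", "BackupAddr", "AccountingStorageBackupHost")

# Values proposed in question (3); an assumption, not a confirmed answer.
FLOATING_KEYS = {
    "ControlMachine": "pink",
    "ControlAddr": "192.168.0.3",
    "AccountingStorageHost": "pink",
}


def parse_conf(text):
    """Map lower-cased keys to values, skipping comments and blank lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def check_layout(text):
    """Return a list of deviations from the assumed floating-IP layout."""
    settings = parse_conf(text)
    problems = []
    for key in BACKUP_KEYS:
        if settings.get(key.lower()):
            problems.append(f"{key} is set")
    for key, expected in FLOATING_KEYS.items():
        actual = settings.get(key.lower())
        if actual != expected:
            problems.append(f"{key}={actual!r}, expected {expected!r}")
    return problems


current = "ControlMachine=cyan\nControlAddr=cyan\nAccountingStorageHost=cyan\n"
proposed = "ControlMachine=pink\nControlAddr=192.168.0.3\nAccountingStorageHost=pink\n"

for name, text in (("current", current), ("proposed", proposed)):
    print(name, check_layout(text))
```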


Thanks for your input.

--matt
