The slurm-dev@lists.llnl.gov list will be retired March 1, 2012. The
schedmd.com domain now hosts its replacement.
http://www.schedmd.com/slurmdocs/mail.html
The new list is operational. Please resubmit this message to
slurm-...@schedmd.com
The archive of the slurm-dev list will remain here:
http://groups.google.com/group/slurm-devel. Postings to the new list will be
archived to the same place.
Answers are inline below.
Quoting Matteo Guglielmi <matteo.guglie...@epfl.ch>:
Hello everyone,
The subject says it all. Below is how far I have got
and which pieces of information I am still missing.
###################
# hostnames & IPs #
###################
Pretty standard cluster:
+------+                     +-------+
| cyan +------+          +---+ node1 |
+------+      |          |   +-------+
              |          |
          +---+---+      |   +-------+
          | switch+------+---+ node2 |
          +---+---+      |   +-------+
              |          ~
+------+      |          |   +-------+
| blue +------+          +---+ nodeN |
+------+                     +-------+
hostname server A: cyan
IP server A: 192.168.0.1
hostname server B: blue
IP server B: 192.168.0.2
hostname floating server: pink
floating IP: 192.168.0.3
#########
# Mysql #
#########
I've replicated '/etc/mysql/debian.cnf' on both cyan and
blue:
### debian.cnf ###
[client]
host = localhost
user = debian-sys-maint
password = SysMaintPass
socket = /var/run/mysqld/mysqld.sock
[mysql_upgrade]
host = localhost
user = debian-sys-maint
password = SysMaintPass
socket = /var/run/mysqld/mysqld.sock
basedir = /usr
##################
so that the debian-sys-maint user has the same
password (SysMaintPass) on both servers.
Moreover I've issued the following mysql command:
grant all on slurm_acct_db.* TO 'slurm'@'localhost' \
identified by 'SlurmDBDPass' with grant option;
########
# DRBD #
########
drbd (active/passive) manages this folder (NFS):
/var/lib/mysql
(slurm database is in /var/lib/mysql/slurm_acct_db)
#############
# pacemaker #
#############
pacemaker keeps all the following services/servers
always running on the "active" server only:
floating IP
mysql file system
mysql server
slurmdbd
slurmctld
#################
# slurmdbd.conf #
#################
I've replicated '/etc/slurm/slurmdbd.conf' on both
cyan and blue servers:
### slurmdbd.conf ###
AuthType=auth/munge
DbdHost=localhost <<<<<<--------- is this correct?
SlurmUser=slurm
StorageHost=localhost <<<<<<------ is this correct?
Either localhost or pink should work, if I understand your
configuration correctly.
StoragePass=SlurmDBDPass
StorageType=accounting_storage/mysql
StorageUser=slurm
StorageLoc=slurm_acct_db
...
#####################
I DID NOT define DbdBackupHost and StorageBackupHost
intentionally.
I believe that using 'localhost' for both DbdHost and
StorageHost is correct because both slurmdbd and the
mysql servers will always be running side by side either
on cyan or blue.
What I am missing now is how to configure slurm.conf
properly (see the QUESTIONS section below).
##############
# slurm.conf #
##############
I've replicated '/etc/slurm/slurm.conf' on cyan and blue
servers + ALL NODES.
### slurm.conf ###
ControlMachine=cyan
ControlAddr=cyan
AccountingStorageHost=cyan
AccountingStorageType=accounting_storage/slurmdbd
AuthType=auth/munge
CryptoType=crypto/munge
SlurmUser=slurm
SlurmdUser=root
StateSaveLocation=/var/run/slurm/slurmctld
...
##################
#############
# QUESTIONS #
#############
1)
What we want is an identical (replicated) slurm.conf
file on all hosts, right?
Or should the nodes have a different slurm.conf file?
Normally slurm.conf would be identical on all nodes.
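One way to confirm that the copies really are identical is to compare
checksums across hosts. The following is only a sketch: it assumes
passwordless ssh to every host and that sha256sum is installed, and the
function names checksums_agree and collect_checksums are made up for this
example. The path is the one given in the message.

```shell
# Path of the replicated file, as stated in the message.
CONF=/etc/slurm/slurm.conf

# Read "checksum  label" lines on stdin.
# Succeeds only if exactly one distinct checksum is seen.
checksums_agree() {
    awk '!seen[$1]++ { n++ } END { exit !(n == 1) }'
}

# Print one "checksum  host" line for each host given as an argument.
# Assumption: ssh to each host works without a password prompt.
collect_checksums() {
    for host in "$@"; do
        sum=$(ssh "$host" sha256sum "$CONF" | awk '{ print $1 }')
        printf '%s  %s\n' "$sum" "$host"
    done
}
```

Usage, with a host list that is only an example:

    collect_checksums cyan blue node1 node2 | checksums_agree \
        && echo "slurm.conf identical" || echo "slurm.conf differs"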
2)
I believe I do not have to define the following
keywords in slurm.conf:
BackupController=
BackupAddr=
AccountingStorageBackupHost=
because the HA I want is obtained by 'moving' all
the services I need to the failover server (without
using the HA built into SLURM).
Is that correct?
You should remove or comment out those lines.
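A quick check that none of those three keywords is still active in a given
copy of slurm.conf could look like the sketch below. The function name
no_backup_keys is invented for this example, and the default path is the one
used in the message. Lines starting with '#' are treated as commented out.

```shell
# Succeeds (exit 0) when the file contains no uncommented BackupController,
# BackupAddr or AccountingStorageBackupHost line. Any active lines found
# are printed. Exit 2 means the file could not be read.
no_backup_keys() {
    conf=${1:-/etc/slurm/slurm.conf}
    [ -r "$conf" ] || return 2
    if grep -Ei '^[[:space:]]*(BackupController|BackupAddr|AccountingStorageBackupHost)[[:space:]]*=' "$conf"; then
        return 1
    fi
    return 0
}
```

Usage:

    no_backup_keys /etc/slurm/slurm.conf && echo "no backup hosts defined"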
3)
Assuming point (2) is correct, should I apply the
following changes to slurm.conf?
ControlMachine=pink
ControlAddr=192.168.0.3
AccountingStorageHost=pink
Or, if not, how should they be set?
What you have is perfect.
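To see which host each of these three settings names in a particular copy
of slurm.conf, a small helper along the following lines could be used. This
is a sketch only: conf_value is a hypothetical name, the default path is the
one from the message, and the parsing handles plain KEY=VALUE lines with
optional trailing comments, nothing more elaborate.

```shell
# Print the value of the first uncommented KEY=VALUE line for the given key.
# $1 = key (matched case-insensitively), $2 = file (optional).
conf_value() {
    awk -F= -v key="$1" '
        /^[ \t]*#/ { next }                      # skip commented lines
        {
            k = $1
            gsub(/[ \t]/, "", k)                 # key without whitespace
        }
        tolower(k) == tolower(key) {
            v = $0
            sub(/^[^=]*=/, "", v)                # drop "KEY="
            sub(/[ \t]*#.*$/, "", v)             # drop trailing comment
            gsub(/^[ \t]+|[ \t]+$/, "", v)       # trim whitespace
            print v
            exit
        }' "${2:-/etc/slurm/slurm.conf}"
}
```

Usage:

    for k in ControlMachine ControlAddr AccountingStorageHost; do
        printf '%s=%s\n' "$k" "$(conf_value "$k")"
    done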
4)
Shall I also add to my DRBD/heartbeat/pacemaker
configuration the management of the following
folder:
/var/run/slurm
which is specified for 'StateSaveLocation'?
StateSaveLocation is where the slurmctld saves state.
SLURM configuration files should be located in the directory defined
by --sysconfdir when SLURM is built (PREFIX/etc by default where
"PREFIX" is SLURM's install directory).
Thanks for your input.
--matt