The slurm-dev@lists.llnl.gov list will be retired March 1, 2012.  The 
schedmd.com domain now hosts its replacement.

http://www.schedmd.com/slurmdocs/mail.html

The new list is operational.  Please resubmit this message to 
slurm-...@schedmd.com

The archive of the slurm-dev list will remain here:  
http://groups.google.com/group/slurm-devel.  Postings to the new list will be 
archived to the same place.

Answers are inline below.

Quoting Matteo Guglielmi <matteo.guglie...@epfl.ch>:

Hello everyone,

The subject says it all. Below I describe
how far I have gotten and what information
I am still missing.

###################
# hostnames & IPs #
###################

Pretty standard cluster:

+------+                     +-------+
| cyan +------+          +---+ node1 |
+------+      |          |   +-------+
              |          |
          +---+---+      |   +-------+
          | switch+------+---+ node2 |
          +---+---+      |   +-------+
              |          ~
+------+      |          |   +-------+
| blue +------+          +---+ nodeN |
+------+                     +-------+

hostname server A:        cyan
IP server A:              192.168.0.1

hostname server B:        blue
IP server B:              192.168.0.2

hostname floating server: pink
floating IP:              192.168.0.3

#########
# Mysql #
#########

I've replicated '/etc/mysql/debian.cnf' on both cyan and
blue:

### debian.cnf ###
[client]
host     = localhost
user     = debian-sys-maint
password = SysMaintPass
socket   = /var/run/mysqld/mysqld.sock
[mysql_upgrade]
host     = localhost
user     = debian-sys-maint
password = SysMaintPass
socket   = /var/run/mysqld/mysqld.sock
basedir  = /usr
##################

so that the debian-sys-maint user has the same
password (SysMaintPass) on both servers.

Moreover I've issued the following mysql command:

grant all on slurm_acct_db.* TO 'slurm'@'localhost' \
  identified by 'SlurmDBDPass' with grant option;

########
# DRBD #
########

drbd (active/passive) manages this folder (NFS):

/var/lib/mysql

(slurm database is in /var/lib/mysql/slurm_acct_db)

#############
# pacemaker #
#############

pacemaker keeps all the following services/servers
always running on the "active" server only:

floating IP
mysql file system
mysql server
slurmdbd
slurmctld

#################
# slurmdbd.conf #
#################

I've replicated '/etc/slurm/slurmdbd.conf' on both
cyan and blue servers:

### slurmdbd.conf ###
AuthType=auth/munge
DbdHost=localhost    <<<<<<--------- is this correct?
SlurmUser=slurm
StorageHost=localhost   <<<<<<------ is this correct?

Either localhost or pink should work, if I understand your configuration correctly.

StoragePass=SlurmDBDPass
StorageType=accounting_storage/mysql
StorageUser=slurm
StorageLoc=slurm_acct_db
...
#####################

I DID NOT define DbdBackupHost and StorageBackupHost
intentionally.

I believe that using 'localhost' for both DbdHost and
StorageHost is correct because both slurmdbd and the
mysql servers will always be running side by side either
on cyan or blue.

What I am missing now is how to configure slurm.conf
properly (see the QUESTIONS section below).

##############
# slurm.conf #
##############

I've replicated '/etc/slurm/slurm.conf' on cyan and blue
servers + ALL NODES.

### slurm.conf ###
ControlMachine=cyan
ControlAddr=cyan
AccountingStorageHost=cyan
AccountingStorageType=accounting_storage/slurmdbd
AuthType=auth/munge
CryptoType=crypto/munge
SlurmUser=slurm
SlurmdUser=root
StateSaveLocation=/var/run/slurm/slurmctld
...
##################

#############
# QUESTIONS #
#############

1)

  What we want is to have an identical/replicated
  slurm.conf file on all hosts right?

  Or should I have a different slurm.conf file on
  the nodes?

Normally slurm.conf would be identical on all nodes.
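
One way to confirm that the copies really are identical is to compare a digest of each file. The following is only a minimal Python sketch and not part of SLURM: the function names and the idea of collecting each host's file contents into a dict keyed by hostname are assumptions made for illustration, and fetching the files (for example with scp) is left out.

```python
import hashlib


def config_digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of a config file's raw bytes."""
    return hashlib.sha256(content).hexdigest()


def find_mismatches(configs: dict[str, bytes], reference_host: str) -> list[str]:
    """Return the hosts whose slurm.conf differs from the reference host's copy."""
    reference = config_digest(configs[reference_host])
    return sorted(
        host for host, content in configs.items()
        if config_digest(content) != reference
    )


if __name__ == "__main__":
    # Hypothetical contents; in practice, read each host's /etc/slurm/slurm.conf.
    sample = {
        "cyan": b"ControlMachine=cyan\n",
        "blue": b"ControlMachine=cyan\n",
        "node1": b"ControlMachine=blue\n",
    }
    print(find_mismatches(sample, "cyan"))
```

An empty list means every copy matches the reference host's file byte for byte.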


2)

  I believe I do not have to define the following
  keywords in slurm.conf:

  BackupController=
  BackupAddr=
  AccountingStorageBackupHost=

  because the HA I want is obtained by 'moving' all
  the services I need to the failover server (do not
  make use of the embedded HA in slurm).

  Is that correct?

You should remove or comment out those lines.
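
A quick check that none of those three keywords is still active in slurm.conf could look like the sketch below. It is an illustration only: the helper name is hypothetical, and it assumes one setting per line, with '#' starting a comment and keyword names treated as case insensitive.

```python
BACKUP_KEYS = ("BackupController", "BackupAddr", "AccountingStorageBackupHost")


def active_backup_keys(conf_text: str) -> list[str]:
    """Return the backup-related keys that are still set (not commented out)."""
    found = []
    for line in conf_text.splitlines():
        # Drop anything after '#', which starts a comment in slurm.conf.
        setting = line.split("#", 1)[0].strip()
        if "=" not in setting:
            continue
        key = setting.split("=", 1)[0].strip()
        for name in BACKUP_KEYS:
            # Compare case-insensitively and report each keyword only once.
            if key.lower() == name.lower() and name not in found:
                found.append(name)
    return found


if __name__ == "__main__":
    with open("/etc/slurm/slurm.conf") as handle:
        print(active_backup_keys(handle.read()))
```

An empty list means none of the three keywords is set, which is what this pacemaker-managed setup wants.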


3)

  Assuming point (2) is correct, should I apply the
  following changes to slurm.conf?

  ControlMachine=pink
  ControlAddr=192.168.0.3
  AccountingStorageHost=pink

  Or, how would I have to get them set?

What you have is perfect.


4)

  Shall I also add to my DRBD/heartbeat/pacemaker
  configuration the management of the following
  folder:

  /var/run/slurm

  which is specified for 'StateSaveLocation'?

StateSaveLocation is where the slurmctld saves state.
SLURM configuration files should be located in the directory defined by --sysconfdir when SLURM is built (PREFIX/etc by default where "PREFIX" is SLURM's install directory).

Thanks for your input.

--matt




