The [email protected] list will be retired March 1, 2012. The schedmd.com domain now hosts its replacement.
http://www.schedmd.com/slurmdocs/mail.html

The new list is operational. Please resubmit this message to [email protected]

The archive of the slurm-dev list will remain here: http://groups.google.com/group/slurm-devel. Postings to the new list will be archived in the same place.

Hello everyone,

the subject says it all. Below I describe how far I have got and which pieces of information I am still missing.

###################
# hostnames & IPs #
###################

Pretty standard cluster:

+------+                     +-------+
| cyan +------+          +---+ node1 |
+------+      |          |   +-------+
              |          |
          +---+---+      |   +-------+
          | switch+------+---+ node2 |
          +---+---+      |   +-------+
              |          ~
+------+      |          |   +-------+
| blue +------+          +---+ nodeN |
+------+                     +-------+

hostname server A: cyan
IP server A: 192.168.0.1
hostname server B: blue
IP server B: 192.168.0.2
hostname floating server: pink
floating IP: 192.168.0.3

#########
# Mysql #
#########

I've replicated '/etc/mysql/debian.cnf' on both cyan and blue:

### debian.cnf ###
[client]
host = localhost
user = debian-sys-maint
password = SysMaintPass
socket = /var/run/mysqld/mysqld.sock
[mysql_upgrade]
host = localhost
user = debian-sys-maint
password = SysMaintPass
socket = /var/run/mysqld/mysqld.sock
basedir = /usr
##################

so that the debian-sys-maint user has the same SysMaintPass on both servers.
Moreover, I've issued the following mysql command:

grant all on slurm_acct_db.* TO 'slurm'@'localhost' \
  identified by 'SlurmDBDPass' with grant option;

########
# DRBD #
########

drbd (active/passive) manages this folder (NFS):

/var/lib/mysql

(the slurm database is in /var/lib/mysql/slurm_acct_db)

#############
# pacemaker #
#############

pacemaker keeps all of the following services/servers running on the "active" server only:

floating IP
mysql file system
mysql server
slurmdbd
slurmctld

#################
# slurmdbd.conf #
#################

I've replicated '/etc/slurm/slurmdbd.conf' on both the cyan and blue servers:

### slurmdbd.conf ###
AuthType=auth/munge
DbdHost=localhost <<<<<<--------- is this correct?
SlurmUser=slurm
StorageHost=localhost <<<<<<------ is this correct?
StoragePass=SlurmDBDPass
StorageType=accounting_storage/mysql
StorageUser=slurm
StorageLoc=slurm_acct_db
...
#####################

I intentionally DID NOT define DbdBackupHost and StorageBackupHost. I believe that using 'localhost' for both DbdHost and StorageHost is correct because slurmdbd and the mysql server will always be running side by side, either on cyan or on blue.

What I am missing now is how to configure slurm.conf properly (see the QUESTIONS section below).

##############
# slurm.conf #
##############

I've replicated '/etc/slurm/slurm.conf' on the cyan and blue servers + ALL NODES.

### slurm.conf ###
ControlMachine=cyan
ControlAddr=cyan
AccountingStorageHost=cyan
AccountingStorageType=accounting_storage/slurmdbd
AuthType=auth/munge
CryptoType=crypto/munge
SlurmUser=slurm
SlurmdUser=root
StateSaveLocation=/var/run/slurm/slurmctld
...
##################

#############
# QUESTIONS #
#############

1) What we want is an identical/replicated slurm.conf file on all hosts, right? Or should I have a different slurm.conf file on the nodes?

2) I believe I do not have to define the following keywords in slurm.conf:

BackupController=
BackupAddr=
AccountingStorageBackupHost=

because the HA I want is obtained by 'moving' all the services I need to the failover server (that is, without using the HA embedded in slurm). Is that correct?

3) Assuming point (2) is correct, should I apply the following changes to slurm.conf?

ControlMachine=pink
ControlAddr=192.168.0.3
AccountingStorageHost=pink

If not, how should they be set?

4) Should I also add to my DRBD/heartbeat/pacemaker configuration the management of the following folder:

/var/run/slurm

which contains the directory specified for 'StateSaveLocation'?

Thanks for your input.

--matt
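
The slurm.conf variant proposed in question (3) could be checked mechanically before it is copied to every host. What follows is only a minimal Python sketch, assuming the answers to (2) and (3) turn out to be "yes". It is not part of Slurm, and the helper names parse_slurm_conf and check_floating_setup are made up for illustration. The host name pink and the address 192.168.0.3 are the floating values listed in the hostnames & IPs section.

```python
# Hypothetical helpers, not part of Slurm. Floating values come from the post.
FLOATING_HOST = "pink"
FLOATING_IP = "192.168.0.3"

# Keys that should point at the floating host (assumes the answer to (3) is yes).
FLOATING_KEYS = ("ControlMachine", "ControlAddr", "AccountingStorageHost")
# Keys for Slurm's built-in failover, expected to stay unset (assumes (2) is yes).
BACKUP_KEYS = ("BackupController", "BackupAddr", "AccountingStorageBackupHost")


def parse_slurm_conf(text):
    """Collect Key=Value pairs, dropping blank lines and '#' comments.

    Lines carrying several pairs (for example NodeName=... entries) are stored
    under their first key only, which is enough for this check.
    """
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def check_floating_setup(conf, host=FLOATING_HOST, ip=FLOATING_IP):
    """Return a list of problems; an empty list means the file matches the plan."""
    problems = []
    for key in FLOATING_KEYS:
        value = conf.get(key)
        if value not in (host, ip):
            problems.append(f"{key}={value} does not point at {host}/{ip}")
    for key in BACKUP_KEYS:
        if conf.get(key):
            problems.append(f"{key} is set, but built-in failover is not wanted")
    return problems


# The relevant lines of the slurm.conf currently in use (from the post).
CURRENT = """
ControlMachine=cyan
ControlAddr=cyan
AccountingStorageHost=cyan
"""

# The changes proposed in question (3).
PROPOSED = """
ControlMachine=pink
ControlAddr=192.168.0.3
AccountingStorageHost=pink
"""

for name, text in (("current", CURRENT), ("proposed", PROPOSED)):
    issues = check_floating_setup(parse_slurm_conf(text))
    print(name, "OK" if not issues else issues)
```

To run the same check against a real file, the text of /etc/slurm/slurm.conf can be read and passed to parse_slurm_conf in place of the inline strings.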
