The slurm-dev@lists.llnl.gov list will be retired March 1, 2012. The schedmd.com domain now hosts its replacement.

http://www.schedmd.com/slurmdocs/mail.html

The new list is operational. Please resubmit this message to
slurm-...@schedmd.com

The archive of the slurm-dev list will remain here:
http://groups.google.com/group/slurm-devel. Postings to the new list
will be archived to the same place.

All right, let me further clarify what still bothers me.

Given the following definitions from "man slurm.conf":

...
ControlMachine
    The short hostname of the machine where SLURM control functions are
    executed (i.e. the name returned by the command "hostname -s", use
    "tux001" rather than "tux001.my.com"). This value must be specified.
    In order to support some high availability architectures, multiple
    hostnames may be listed with comma separators and one ControlAddr
    must be specified. The high availability system must insure that the
    slurmctld daemon is running on only one of these hosts at a time.
    See the RELOCATING CONTROLLERS section if you change this.

ControlAddr
    Name that ControlMachine should be referred to in establishing a
    communications path. This name will be used as an argument to the
    gethostbyname() function for identification. For example, "elx0000"
    might be used to designate the Ethernet address for node "lx0000".
    By default the ControlAddr will be identical in value to
    ControlMachine.

AccountingStorageHost
    The name of the machine hosting the accounting storage database.
    Only used for database type storage plugins, ignored otherwise.
    Also see DefaultStorageHost.
...

I'm a bit worried about setting ControlMachine=pink because 'pink' is
NOT a "real name" of any host but "only" the name associated to the
floating IP in /etc/hosts on both cyan and blue servers.

The command "hostname -s" DOES NOT return 'pink' at all even when the
floating IP gets "attached" (better "added") to the ethernet interface
of either cyan or blue.

So, is this configuration still OK?

ControlMachine=pink
ControlAddr=192.168.0.3
AccountingStorageHost=pink

Or should I do this instead?

ControlMachine=cyan,blue
ControlAddr=pink
AccountingStorageHost=pink

What's the difference between the two setups?

Thanks,

--matt

On 02/22/12 17:31, Moe Jette wrote:
> Answers are inline below.
>
> Quoting Matteo Guglielmi <matteo.guglie...@epfl.ch>:
>
>> Hello everyone,
>>
>> the subject says it all, now I tell you
>> how far I went and what piece of info I
>> miss.
>>
>> ###################
>> # hostnames & IPs #
>> ###################
>>
>> Pretty standard cluster:
>>
>> +------+                     +-------+
>> | cyan +------+          +---+ node1 |
>> +------+      |          |   +-------+
>>               |          |
>>           +---+---+      |   +-------+
>>           | switch+------+---+ node2 |
>>           +---+---+      |   +-------+
>>               |          ~
>> +------+      |          |   +-------+
>> | blue +------+          +---| nodeN |
>> +------+                     +-------+
>>
>> hostname server A: cyan
>> IP server A: 192.168.0.1
>>
>> hostname server B: blue
>> IP server B: 192.168.0.2
>>
>> hostname floating server: pink
>> floating IP: 192.168.0.3
>>
>> #########
>> # Mysql #
>> #########
>>
>> I've replicated '/etc/mysql/debian.cnf' on both cyan and
>> blue:
>>
>> ### debian.cnf ###
>> [client]
>> host = localhost
>> user = debian-sys-maint
>> password = SysMaintPass
>> socket = /var/run/mysqld/mysqld.sock
>> [mysql_upgrade]
>> host = localhost
>> user = debian-sys-maint
>> password = SysMaintPass
>> socket = /var/run/mysqld/mysqld.sock
>> basedir = /usr
>> ##################
>>
>> so to have the same SysMaintPass for
>> the debian-sys-maint user on both servers.
>>
>> Moreover I've issued the following mysql command:
>>
>> grant all on slurm_acct_db.* TO 'slurm'@'localhost' \
>> identified by 'SlurmDBDPass' with grant option;
>>
>> ########
>> # DRBD #
>> ########
>>
>> drbd (active/passive) manages this folder (NFS):
>>
>> /var/lib/mysql
>>
>> (slurm database is in /var/lib/mysql/slurm_acct_db)
>>
>> #############
>> # pacemaker #
>> #############
>>
>> pacemaker keeps all the following services/servers
>> always running on the "active" server only:
>>
>> floating IP
>> mysql file system
>> mysql server
>> slurmdbd
>> slurmctld
>>
>> #################
>> # slurmdbd.conf #
>> #################
>>
>> I've replicated '/etc/slurm/slurmdbd.conf' on both
>> cyan and blue servers:
>>
>> ### slurmdbd.conf ###
>> AuthType=auth/munge
>> DbdHost=localhost <<<<<<--------- is this correct?
>> SlurmUser=slurm
>> StorageHost=localhost <<<<<<------ is this correct?
>
> either localhost or pink should work if I understand your configuration
> correctly
>
>> StoragePass=SlurmDBDPass
>> StorageType=accounting_storage/mysql
>> StorageUser=slurm
>> StorageLoc=slurm_acct_db
>> ...
>> #####################
>>
>> I DID NOT define DbdBackupHost and StorageBackupHost
>> intentionally.
>>
>> I believe that using 'localhost' for both DbdHost and
>> StorageHost is correct because both slurmdbd and the
>> mysql servers will always be running side by side either
>> on cyan or blue.
>>
>> What I miss now is how to configure slurm.conf properly
>> (see QUESTIONS section below)
>>
>> ##############
>> # slurm.conf #
>> ##############
>>
>> I've replicated '/etc/slurm/slurm.conf' on cyan and blue
>> servers + ALL NODES.
>>
>> ### slurm.conf ###
>> ControlMachine=cyan
>> ControlAddr=cyan
>> AccountingStorageHost=cyan
>> AccountingStorageType=accounting_storage/slurmdbd
>> AuthType=auth/munge
>> CryptoType=crypto/munge
>> SlurmUser=slurm
>> SlurmdUser=root
>> StateSaveLocation=/var/run/slurm/slurmctld
>> ...
>> ##################
>>
>> #############
>> # QUESTIONS #
>> #############
>>
>> 1)
>>
>> What we want is to have an identical/replicated
>> slurm.conf file on all hosts right?
>>
>> Or should I have a different slurm.conf file on
>> the nodes?
>
> Normally slurm.conf would be identical on all nodes.
>
>
>> 2)
>>
>> I believe I do not have to define the following
>> keywords in slurm.conf:
>>
>> BackupController=
>> BackupAddr=
>> AccountingStorageBackupHost=
>>
>> because the HA I want is obtained by 'moving' all
>> the services I need to the failover server (do not
>> make use of the embedded HA in slurm).
>>
>> Is that correct?
>
> You should remove or comment out those lines.
>
>
>> 3)
>>
>> Assuming point (2) is correct, should I apply the
>> following changes to slurm.conf?
>>
>> ControlMachine=pink
>> ControlAddr=192.168.0.3
>> AccountingStorageHost=pink
>>
>> Or, how would I have to get them set?
>
> What you have is perfect.
>
>
>> 4)
>>
>> Shall I also add to my DBD/heartbeat/pacemaker
>> configuration the management of the following
>> folder:
>>
>> /var/run/slurm
>>
>> which is specified for 'StateSaveLocation'?
>
> StateSaveLocation is where the slurmctld saves state.
> SLURM configuration files should be located in the directory defined by
> --sysconfdir when SLURM is built (PREFIX/etc by default where "PREFIX"
> is SLURM's install directory).
>
>> Thanks for your input.
>>
>> --matt
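
The worry raised earlier in the thread, that "hostname -s" never returns
'pink', can be made concrete with a small check. What follows is only a
sketch in Python, not part of SLURM and not a description of how slurmctld
validates its configuration. It parses one-key-per-line slurm.conf text and
reports whether a given short hostname appears in ControlMachine. The helper
names (parse_slurm_conf, control_machines, host_is_listed,
local_short_hostname) are hypothetical, key matching is assumed to be
case-sensitive for simplicity, and the two sample configurations are the two
alternatives from the message above.

```python
import socket


def parse_slurm_conf(text):
    """Parse simple 'Key=Value' lines; blank lines and '#' comments are skipped.

    Assumption: one key per line. Multi-key lines (e.g. NodeName=... CPUs=...)
    are not split further by this sketch.
    """
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def control_machines(conf):
    """Return the comma-separated ControlMachine entries as a list."""
    entries = conf.get("ControlMachine", "").split(",")
    return [h.strip() for h in entries if h.strip()]


def host_is_listed(conf, short_name):
    """True if short_name is one of the ControlMachine entries."""
    return short_name in control_machines(conf)


def local_short_hostname():
    """Approximate 'hostname -s': local hostname up to the first dot."""
    return socket.gethostname().split(".", 1)[0]


# The two alternatives asked about in the message.
OPTION_A = """
ControlMachine=pink
ControlAddr=192.168.0.3
AccountingStorageHost=pink
"""

OPTION_B = """
ControlMachine=cyan,blue
ControlAddr=pink
AccountingStorageHost=pink
"""

if __name__ == "__main__":
    for label, text in (("A", OPTION_A), ("B", OPTION_B)):
        conf = parse_slurm_conf(text)
        for host in ("cyan", "blue"):
            listed = host_is_listed(conf, host)
            print(f"option {label}: {host} listed in ControlMachine: {listed}")
```

On a real server one would pass local_short_hostname() and the contents of
the local slurm.conf (in this thread, /etc/slurm/slurm.conf) instead of the
hard-coded samples. The sketch only shows which of the two alternatives
names the physical hosts; whether SLURM accepts either form is a question
for the man page and the list.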