The slurm-dev@lists.llnl.gov list will be retired March 1, 2012.  The 
schedmd.com domain now hosts its replacement.

http://www.schedmd.com/slurmdocs/mail.html

The new list is operational.  Please resubmit this message to 
slurm-...@schedmd.com

The archive of the slurm-dev list will remain here:  
http://groups.google.com/group/slurm-devel.  Postings to the new list will be 
archived to the same place.

All right, let me further clarify what still bothers me.

Given the following definitions from "man slurm.conf":

...
ControlMachine
  The  short hostname of the machine where SLURM control
  functions are executed (i.e. the name returned by the
  command "hostname -s", use "tux001" rather than
  "tux001.my.com"). This value must be specified. In
  order to support some high availability architectures,
  multiple hostnames may be listed with comma separators
  and one ControlAddr must be specified. The high
  availability system must insure that the slurmctld
  daemon is running on only one of these hosts at a time.
  See the RELOCATING CONTROLLERS section if you change this.

ControlAddr
  Name that ControlMachine should be referred to in
  establishing a communications path. This name will be
  used as an argument to the gethostbyname() function for
  identification. For example, "elx0000" might be used to
  designate the Ethernet address for node "lx0000". By
  default the ControlAddr will be identical in value to
  ControlMachine.

AccountingStorageHost
  The name of the machine hosting the accounting storage
  database. Only used for database type storage plugins,
  ignored otherwise. Also see DefaultStorageHost.
...

I'm a bit worried about setting ControlMachine=pink
because 'pink' is NOT the "real name" of any host but "only"
the name associated with the floating IP in /etc/hosts on
both the cyan and blue servers.

The command "hostname -s" DOES NOT return 'pink' at all
even when the floating IP gets "attached" (better "added")
to the ethernet interface of either cyan or blue. 
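
To make the mismatch concrete, here is a minimal Python sketch of the check. It is not part of SLURM: the helper names are hypothetical, and it only compares strings, assuming the short hostname is whatever precedes the first dot (as with "hostname -s").

```python
import socket


def short_hostname() -> str:
    """Rough equivalent of `hostname -s`: the host name up to the first dot."""
    return socket.gethostname().split(".")[0]


def listed_in_control_machine(control_machine: str, host: str) -> bool:
    """True if `host` is one of the comma-separated ControlMachine names."""
    names = [name.strip() for name in control_machine.split(",")]
    return host in names


if __name__ == "__main__":
    host = short_hostname()
    # Compare the local short hostname against both candidate settings.
    for value in ("pink", "cyan,blue"):
        print(value, "->", listed_in_control_machine(value, host))
```

Run on cyan or blue, this would report no match for "pink" and a match for "cyan,blue". It says nothing about how slurmctld itself treats the value.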

So, is this configuration still ok

ControlMachine=pink
ControlAddr=192.168.0.3
AccountingStorageHost=pink

or should I do this instead:

ControlMachine=cyan,blue
ControlAddr=pink
AccountingStorageHost=pink

?

What's the difference between the two setups?
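
To show exactly which keys differ, here is a small Python sketch that parses both fragments and lists the differences. The parser is a simplification of my own (one key=value per line, "#" starts a comment), not SLURM's real configuration reader, and parse_conf/diff_conf are hypothetical names.

```python
def parse_conf(text: str) -> dict:
    """Parse simple key=value lines; ignores comments and blank lines."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def diff_conf(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


SETUP_1 = """
ControlMachine=pink
ControlAddr=192.168.0.3
AccountingStorageHost=pink
"""

SETUP_2 = """
ControlMachine=cyan,blue
ControlAddr=pink
AccountingStorageHost=pink
"""

if __name__ == "__main__":
    changes = diff_conf(parse_conf(SETUP_1), parse_conf(SETUP_2))
    for key, (first, second) in changes.items():
        print(f"{key}: {first!r} vs {second!r}")
```

Only ControlMachine and ControlAddr differ; AccountingStorageHost is 'pink' in both. What I cannot tell from this is how slurmctld and the nodes behave differently in the two cases.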

Thanks,

--matt


On 02/22/12 17:31, Moe Jette wrote:
> Answers are inline below.
> 
> Quoting Matteo Guglielmi <matteo.guglie...@epfl.ch>:
> 
>> Hello everyone,
>>
>> The subject says it all. Here is how far
>> I got and what piece of information I am
>> still missing.
>>
>> ###################
>> # hostnames & IPs #
>> ###################
>>
>> Pretty standard cluster:
>>
>> +------+                     +-------+
>> | cyan +------+          +---+ node1 |
>> +------+      |          |   +-------+
>>               |          |
>>           +---+---+      |   +-------+
>>           | switch+------+---+ node2 |
>>           +---+---+      |   +-------+
>>               |          ~
>> +------+      |          |   +-------+
>> | blue +------+          +---+ nodeN |
>> +------+                     +-------+
>>
>> hostname server A: cyan
>> IP server A: 192.168.0.1
>>
>> hostname server B: blue
>> IP server B: 192.168.0.2
>>
>> hostname floating server: pink
>> floating IP: 192.168.0.3
>>
>> #########
>> # Mysql #
>> #########
>>
>> I've replicated '/etc/mysql/debian.cnf' on both cyan and
>> blue:
>>
>> ### debian.cnf ###
>> [client]
>> host = localhost
>> user = debian-sys-maint
>> password = SysMaintPass
>> socket = /var/run/mysqld/mysqld.sock
>> [mysql_upgrade]
>> host = localhost
>> user = debian-sys-maint
>> password = SysMaintPass
>> socket = /var/run/mysqld/mysqld.sock
>> basedir = /usr
>> ##################
>>
>> so as to have the same SysMaintPass for
>> the debian-sys-maint user on both servers.
>>
>> Moreover I've issued the following mysql command:
>>
>> grant all on slurm_acct_db.* TO 'slurm'@'localhost' \
>> identified by 'SlurmDBDPass' with grant option;
>>
>> ########
>> # DRBD #
>> ########
>>
>> drbd (active/passive) manages this folder (NFS):
>>
>> /var/lib/mysql
>>
>> (slurm database is in /var/lib/mysql/slurm_acct_db)
>>
>> #############
>> # pacemaker #
>> #############
>>
>> pacemaker keeps all the following services/servers
>> always running on the "active" server only:
>>
>> floating IP
>> mysql file system
>> mysql server
>> slurmdbd
>> slurmctld
>>
>> #################
>> # slurmdbd.conf #
>> #################
>>
>> I've replicated '/etc/slurm/slurmdbd.conf' on both
>> cyan and blue servers:
>>
>> ### slurmdbd.conf ###
>> AuthType=auth/munge
>> DbdHost=localhost <<<<<<--------- is this correct?
>> SlurmUser=slurm
>> StorageHost=localhost <<<<<<------ is this correct?
> 
> either localhost or pink should work if I understand your configuration 
> correctly
> 
>> StoragePass=SlurmDBDPass
>> StorageType=accounting_storage/mysql
>> StorageUser=slurm
>> StorageLoc=slurm_acct_db
>> ...
>> #####################
>>
>> I DID NOT define DbdBackupHost and StorageBackupHost
>> intentionally.
>>
>> I believe that using 'localhost' for both DbdHost and
>> StorageHost is correct because both slurmdbd and the
>> mysql servers will always be running side by side either
>> on cyan or blue.
>>
>> What I'm missing now is how to configure slurm.conf properly
>> (see QUESTIONS section below)
>>
>> ##############
>> # slurm.conf #
>> ##############
>>
>> I've replicated '/etc/slurm/slurm.conf' on cyan and blue
>> servers + ALL NODES.
>>
>> ### slurm.conf ###
>> ControlMachine=cyan
>> ControlAddr=cyan
>> AccountingStorageHost=cyan
>> AccountingStorageType=accounting_storage/slurmdbd
>> AuthType=auth/munge
>> CryptoType=crypto/munge
>> SlurmUser=slurm
>> SlurmdUser=root
>> StateSaveLocation=/var/run/slurm/slurmctld
>> ...
>> ##################
>>
>> #############
>> # QUESTIONS #
>> #############
>>
>> 1)
>>
>> What we want is to have an identical/replicated
>> slurm.conf file on all hosts right?
>>
>> Or should I have a different slurm.conf file on
>> the nodes?
> 
> Normally slurm.conf would be identical on all nodes.
> 
> 
>> 2)
>>
>> I believe I do not have to define the following
>> keywords in slurm.conf:
>>
>> BackupController=
>> BackupAddr=
>> AccountingStorageBackupHost=
>>
>> because the HA I want is obtained by 'moving' all
>> the services I need to the failover server (do not
>> make use of the embedded HA in slurm).
>>
>> Is that correct?
> 
> You should remove or comment out those lines.
> 
> 
>> 3)
>>
>> Assuming point (2) is correct, should I apply the
>> following changes to slurm.conf?
>>
>> ControlMachine=pink
>> ControlAddr=192.168.0.3
>> AccountingStorageHost=pink
>>
>> Or, how would I have to get them set?
> 
> What you have is perfect.
> 
> 
>> 4)
>>
>> Shall I also add to my DRBD/heartbeat/pacemaker
>> configuration the management of the following
>> folder:
>>
>> /var/run/slurm
>>
>> which is specified for 'StateSaveLocation'?
> 
> StateSaveLocation is where the slurmctld saves state.
> SLURM configuration files should be located in the directory defined by 
> --sysconfdir when SLURM is built (PREFIX/etc by default where "PREFIX" 
> is SLURM's install directory).
> 
>> Thanks for your input.
>>
>> --matt
>>
>>