Thank you, Tina. It will really help.

Regards,
Navin
On Thu, Oct 28, 2021, 22:01 Tina Friedrich <tina.friedr...@it.ox.ac.uk> wrote:
> Hello,
>
> I have the database on a separate server (it runs the database and the
> database only). The login nodes run nothing SLURM related; they simply
> have the binaries installed & a SLURM config.
>
> I've never looked into having multiple databases & using
> AccountingStorageExternalHost (in fact I'd forgotten you could do that),
> so I can't comment on that (maybe someone else can); I think that works,
> yes, but as I said I never tested it (didn't see much point in running
> multiple databases if one would do the job).
>
> I actually have specific login nodes for both of my clusters, to make it
> easier for users (especially those without much experience using the
> HPC environment); so I have one login node connecting to cluster 1 and
> one connecting to cluster 2.
>
> I think the relevant config entries (if I'm not mistaken) on the login
> nodes are probably these. The differences in the slurm config files
> (that haven't got to do with topology & nodes & scheduler tuning) are:
>
> ClusterName=cluster1
> ControlMachine=cluster1-slurm
> ControlAddr=/IP_OF_SLURM_CONTROLLER/
>
> ClusterName=cluster2
> ControlMachine=cluster2-slurm
> ControlAddr=/IP_OF_SLURM_CONTROLLER/
>
> (where IP_OF_SLURM_CONTROLLER is the IP address of host cluster1-slurm,
> same for cluster2)
>
> And then they have common entries for the accounting storage:
>
> AccountingStorageHost=slurm-db-prod
> AccountingStorageBackupHost=slurm-db-prod
> AccountingStoragePort=7030
> AccountingStorageType=accounting_storage/slurmdbd
>
> (slurm-db-prod is simply the hostname of the SLURM database server)
>
> Does that help?
>
> Tina
>
> On 28/10/2021 14:59, navin srivastava wrote:
> > Thank you Tina.
> >
> > So if I understood correctly, the database is global to both clusters
> > and running on the login node? Or is the database running on one of
> > the master nodes and shared with the other master node?
> >
> > But as far as I have read, the SLURM database can also be separate on
> > both masters, just using the parameter AccountingStorageExternalHost
> > so that both databases are aware of each other.
> >
> > Also, which slurmctld does the slurm.conf file on the login node point
> > to? Is it possible to share a sample slurm.conf file from a login node?
> >
> > Regards
> > Navin.
> >
> > On Thu, Oct 28, 2021 at 7:06 PM Tina Friedrich
> > <tina.friedr...@it.ox.ac.uk> wrote:
> > >
> > > Hi Navin,
> > >
> > > well, I have two clusters & login nodes that allow access to both.
> > > Will that do? I don't think a third would make any difference in
> > > setup.
> > >
> > > They need to share a database. As long as they share a database, the
> > > clusters have 'knowledge' of each other.
> > >
> > > So if you set up one database server (running slurmdbd), and then a
> > > SLURM controller for each cluster (running slurmctld) using that one
> > > central database, the '-M' option should work.
> > >
> > > Tina
> > >
> > > On 28/10/2021 10:54, navin srivastava wrote:
> > > > Hi,
> > > >
> > > > I am looking for a stepwise guide to set up a multi-cluster
> > > > implementation. We want to set up 3 clusters and one login node to
> > > > run jobs using the -M cluster option.
> > > > Does anybody have such a setup who can share some insight into how
> > > > it works, and whether it is really a stable solution?
> > > >
> > > > Regards
> > > > Navin.
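(As a quick illustration of the '-M' usage discussed above: once both
clusters register in the shared slurmdbd, the commands on the login node
might look something like the following. The cluster names follow Tina's
example config, and job.sh is a stand-in for any batch script; this is a
sketch, not a tested recipe.

    # List the clusters registered in the shared database
    sacctmgr show clusters format=Cluster,ControlHost,ControlPort

    # Submit a job to a specific cluster, or query several at once
    sbatch -M cluster1 job.sh
    squeue -M cluster1,cluster2

The -M commands look up each cluster's controller address in the shared
database, which is why the clusters must share one slurmdbd.)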
> > > --
> > > Tina Friedrich, Advanced Research Computing Snr HPC Systems
> > > Administrator
> > >
> > > Research Computing and Support Services
> > > IT Services, University of Oxford
> > > http://www.arc.ox.ac.uk
> > > http://www.it.ox.ac.uk
>
> --
> Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator
>
> Research Computing and Support Services
> IT Services, University of Oxford
> http://www.arc.ox.ac.uk
> http://www.it.ox.ac.uk
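(For completeness, a minimal sketch of what the central database server's
slurmdbd.conf might contain, matching the hostname and port from Tina's
mail. The storage credentials and database name are placeholder
assumptions for illustration, not values from the thread.

    # Host and port the slurmctld daemons connect to
    DbdHost=slurm-db-prod
    DbdPort=7030
    SlurmUser=slurm
    # Backing MySQL/MariaDB database; credentials are placeholders
    StorageType=accounting_storage/mysql
    StorageHost=localhost
    StorageUser=slurm
    StoragePass=CHANGE_ME
    StorageLoc=slurm_acct_db

Each cluster's slurmctld registers itself in this database when it starts,
which is what makes 'sacctmgr show clusters' and the -M option work.)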