Re: [slurm-users] Slurm Multi-cluster implementation

2021-11-03 Thread Tina Friedrich
Thank you for that - I'm restricting things via limits.conf on the login nodes at the moment, but have been considering using cgroups instead for a while. So this is very useful :) If we're sharing details, our setup currently is: 2x2 cluster not-quite-federations - prod and dev, each with a

Re: [slurm-users] Slurm Multi-cluster implementation

2021-11-01 Thread Yair Yarom
The CPU limit using ulimit is pretty straightforward with pam_limits and /etc/security/limits.conf. On some of the login nodes we have a CPU limit of 10 minutes, so heavy processes will fail. The memory limit was a bit more complicated (i.e. not pretty). We wanted to make sure that a user can't use more than
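As a rough illustration of the pam_limits approach mentioned above, here is a minimal Python sketch that formats one entry for /etc/security/limits.conf. The helper name limits_conf_line and the "*" domain are assumptions made for this example, not details from the post; the only value taken from the message is the 10-minute CPU limit. In limits.conf the cpu item is expressed in minutes.

```python
VALID_TYPES = ("soft", "hard", "-")


def limits_conf_line(domain: str, limit_type: str, item: str, value: int) -> str:
    """Format one limits.conf entry as '<domain> <type> <item> <value>'.

    Hypothetical helper for illustration only; it does not write any file.
    """
    if limit_type not in VALID_TYPES:
        raise ValueError(f"type must be one of {VALID_TYPES}, got {limit_type!r}")
    if value < 0:
        raise ValueError("value must be non-negative")
    return f"{domain} {limit_type} {item} {value}"


# Assumption: apply the limit to all regular users ("*" does not cover root).
# The 'cpu' item is in minutes, so 10 matches the 10-minute limit in the post.
cpu_entry = limits_conf_line("*", "hard", "cpu", 10)
print(cpu_entry)
```

The resulting line would go into /etc/security/limits.conf (or a file under /etc/security/limits.d/) on the login node, and it only takes effect for sessions that pass through pam_limits.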

Re: [slurm-users] Slurm Multi-cluster implementation

2021-10-31 Thread Brian Andrus
That is interesting to me. How do you use ulimit and systemd to limit user usage on the login nodes? This sounds like something very useful. Brian Andrus On 10/31/2021 1:08 AM, Yair Yarom wrote: Hi, If it helps, this is our setup: 6 clusters (actually a bit more) 1 mysql + slurmdbd on the

Re: [slurm-users] Slurm Multi-cluster implementation

2021-10-31 Thread Yair Yarom
Hi,
If it helps, this is our setup:
6 clusters (actually a bit more)
1 mysql + slurmdbd on the same host
6 primary slurmctld on 3 hosts (need to make sure each has a distinct SlurmctldPort)
6 secondary slurmctld on an arbitrary node on the clusters themselves.
1 login node per cluster (this is a
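To make the "distinct SlurmctldPort" point concrete, here is a small Python sketch that assigns a different port to every slurmctld sharing a host and prints the matching slurm.conf lines. All cluster names, host names, and the base port 7000 are hypothetical; the parameter names (ClusterName, SlurmctldHost, SlurmctldPort, AccountingStorageType, AccountingStorageHost) are standard slurm.conf options, but the actual configuration used in the post is not shown in the thread.

```python
from collections import defaultdict

# Assumption: a base port chosen away from Slurm's default ports (6817-6819).
BASE_PORT = 7000


def assign_ports(cluster_hosts: dict, base_port: int = BASE_PORT) -> dict:
    """Give each cluster a SlurmctldPort that is unique on its controller host."""
    used_on_host = defaultdict(int)  # host -> number of ports already handed out
    ports = {}
    for cluster, host in cluster_hosts.items():
        ports[cluster] = base_port + used_on_host[host]
        used_on_host[host] += 1
    return ports


def slurm_conf_fragment(cluster: str, host: str, port: int, dbd_host: str) -> str:
    """Return the controller/accounting lines of one cluster's slurm.conf."""
    return "\n".join([
        f"ClusterName={cluster}",
        f"SlurmctldHost={host}",
        f"SlurmctldPort={port}",
        "AccountingStorageType=accounting_storage/slurmdbd",
        f"AccountingStorageHost={dbd_host}",
    ])


# Hypothetical layout: 6 clusters whose primary slurmctld run on 3 hosts.
cluster_hosts = {
    "c1": "ctl1", "c2": "ctl1",
    "c3": "ctl2", "c4": "ctl2",
    "c5": "ctl3", "c6": "ctl3",
}
ports = assign_ports(cluster_hosts)

for name, host in cluster_hosts.items():
    print(slurm_conf_fragment(name, host, ports[name], dbd_host="dbhost"))
    print()
```

Every cluster points at the same slurmdbd host, which is what lets the clusters know about each other; only the controller port has to differ where two slurmctld daemons share a machine.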

Re: [slurm-users] Slurm Multi-cluster implementation

2021-10-28 Thread navin srivastava
Thank you Tina. It will really help Regards Navin On Thu, Oct 28, 2021, 22:01 Tina Friedrich wrote: > Hello, > > I have the database on a separate server (it runs the database and the > database only). The login nodes run nothing SLURM related, they simply > have the binaries installed & a

Re: [slurm-users] Slurm Multi-cluster implementation

2021-10-28 Thread Tina Friedrich
Hello, I have the database on a separate server (it runs the database and the database only). The login nodes run nothing SLURM related, they simply have the binaries installed & a SLURM config. I've never looked into having multiple databases & using AccountingStorageExternalHost (in fact

Re: [slurm-users] Slurm Multi-cluster implementation

2021-10-28 Thread navin srivastava
Thank you Tina. So if I understood correctly, the database is global to both clusters and running on the login node? Or is the database running on one of the master nodes and shared with the other master node? But as far as I have read, the Slurm database can also be separate on both the master and

Re: [slurm-users] Slurm Multi-cluster implementation

2021-10-28 Thread Tina Friedrich
Hi Navin, well, I have two clusters & login nodes that allow access to both. That do? I don't think a third would make any difference in setup. They need to share a database. As long as they share a database, the clusters have 'knowledge' of each other. So if you set up one database server
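Once the clusters share a database, a login node can target any of them with the -M option asked about in this thread. The Python sketch below only builds the command lines and does not run them; the cluster name "cluster2" and the script name "job.sh" are hypothetical. -M / --clusters is the standard option on sbatch and squeue.

```python
def sbatch_command(cluster: str, script: str) -> list:
    """Build an sbatch command that submits 'script' to the named cluster."""
    return ["sbatch", f"--clusters={cluster}", script]


def squeue_command(cluster: str) -> list:
    """Build an squeue command that lists jobs on the named cluster."""
    return ["squeue", f"--clusters={cluster}"]


# Hypothetical names; on a real login node these lists could be passed to
# subprocess.run(...) once Slurm binaries and a slurm.conf are installed.
submit = sbatch_command("cluster2", "job.sh")
queue = squeue_command("cluster2")
print(" ".join(submit))
print(" ".join(queue))
```

On a working setup, running sacctmgr show cluster on the login node should list every cluster registered in the shared database.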

[slurm-users] Slurm Multi-cluster implementation

2021-10-28 Thread navin srivastava
Hi, I am looking for a step-by-step guide to setting up a multi-cluster implementation. We want to set up 3 clusters and one login node to run jobs using the -M cluster option. Does anybody have such a setup, and can you share some insight into how it works and whether it is really a stable solution? Regards, Navin.