Re: [slurm-users] Maintaining slurm config files for test and production clusters

2023-01-18, Greg Wickham
Generating the *.conf files from parseable/testable sources is an interesting idea. You mention nodes.conf and partitions.conf. I can't find any documentation…
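One possible reading of the generate-from-source idea: keep node and partition definitions as structured data under version control and render them into fragment files that slurm.conf pulls in with its `Include` directive (for example `Include /etc/slurm/nodes.conf`). The file names nodes.conf and partitions.conf are site conventions, not Slurm defaults. The data layout, node names, and the `render_line`/`render_fragment` helpers below are hypothetical, a minimal sketch rather than anyone's actual tooling.

```python
"""Render Slurm node/partition fragments from structured data (hypothetical schema)."""

from typing import Dict, List

# Hypothetical source of truth; in practice this could be YAML/JSON kept in git.
NODES: List[Dict[str, str]] = [
    {"NodeName": "node[001-004]", "CPUs": "64", "RealMemory": "256000", "State": "UNKNOWN"},
]
PARTITIONS: List[Dict[str, str]] = [
    {"PartitionName": "batch", "Nodes": "node[001-004]", "Default": "YES",
     "MaxTime": "INFINITE", "State": "UP"},
]


def render_line(entry: Dict[str, str]) -> str:
    """Turn one dict into a single Key=Value line, preserving key order."""
    return " ".join(f"{key}={value}" for key, value in entry.items())


def render_fragment(entries: List[Dict[str, str]], name_key: str) -> str:
    """Render a fragment file: one line per entry, each led by name_key."""
    lines = []
    for entry in entries:
        # Keep NodeName=/PartitionName= first, as these lines conventionally begin with it.
        if next(iter(entry)) != name_key:
            raise ValueError(f"entry must start with {name_key}: {entry}")
        lines.append(render_line(entry))
    return "".join(line + "\n" for line in lines)


nodes_conf = render_fragment(NODES, "NodeName")
partitions_conf = render_fragment(PARTITIONS, "PartitionName")
print(nodes_conf, end="")
print(partitions_conf, end="")
```

Because the source is plain data, it can be unit-tested (or linted) in CI before any rendered file reaches a controller.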

2023-01-18, Groner, Rob
In reply to Greg Wickham, Wednesday, January 18, 2023, 1:38 AM.

2023-01-18, Groner, Rob
Run a secondary controller. Do 'scontrol takeover' before any changes, make your changes, and restart slurmctld on the primary. If it fails, no harm, no foul, because the secondary is still running…
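The take-over-then-restart procedure can be scripted so that it stops at the first failing step. This is a minimal dry-run sketch assuming a backup slurmctld is configured, the script runs on the primary controller host, and slurmctld is managed by systemd. `deploy-new-slurm-conf` is a hypothetical placeholder for however your site installs the edited config, not a Slurm command; `scontrol takeover` and `scontrol ping` are real scontrol subcommands.

```python
"""Dry-run sketch of a 'take over, change, restart the primary' procedure."""

import shlex
import subprocess
from typing import Callable, List

# Assumptions: backup slurmctld configured; run on the primary; systemd-managed.
STEPS: List[str] = [
    "scontrol takeover",            # ask the backup controller to take over
    "deploy-new-slurm-conf",        # hypothetical placeholder: install edited config
    "systemctl restart slurmctld",  # restart slurmctld on the primary
    "scontrol ping",                # report primary/backup controller status
]


def run_procedure(steps: List[str], runner: Callable[[str], int]) -> List[str]:
    """Run steps in order, stopping at the first non-zero exit status.

    Returns the steps that completed successfully.
    """
    completed: List[str] = []
    for cmd in steps:
        if runner(cmd) != 0:
            # Stop at the first failure; if the takeover succeeded,
            # the backup keeps serving while you investigate.
            break
        completed.append(cmd)
    return completed


def real_runner(cmd: str) -> int:
    """Execute the command and return its exit status."""
    return subprocess.run(shlex.split(cmd)).returncode


def dry_runner(cmd: str) -> int:
    """Print instead of executing; always report success."""
    print(f"would run: {cmd}")
    return 0


if __name__ == "__main__":
    run_procedure(STEPS, dry_runner)
```

Injecting the runner keeps the sequencing logic testable without touching a live controller; swap in `real_runner` only once the steps match your site.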

2023-01-17, Greg Wickham
Hi Rob, Slurm doesn't have a "validate" parameter, hence one must know ahead of time whether the configuration will work. In answer to your question: yes, at our site the Slurm configuration is altered outside of a maintenance window. Depending upon the potential impact of the change…
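Without a built-in validate option, one partial safeguard is a small pre-flight lint in the config pipeline. The sketch below is hypothetical and checks only one class of error: partitions whose `Nodes=` list names hosts that no `NodeName=` line defines. Its hostlist expansion is simplified (one bracket group per name, canonical key casing, no quoted values with spaces), so it complements, rather than replaces, trying the change on a test cluster or behind a secondary controller.

```python
"""Hypothetical pre-flight lint: partitions referencing undefined nodes."""

import re
from typing import Dict, List, Set


def split_top_level(expr: str) -> List[str]:
    """Split on commas that are not inside [...]."""
    parts, depth, current = [], 0, ""
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    if current:
        parts.append(current)
    return parts


def expand(expr: str) -> Set[str]:
    """Expand a simplified hostlist such as node[001-004],gpu01."""
    names: Set[str] = set()
    for item in split_top_level(expr):
        m = re.fullmatch(r"([^\[]*)\[([^\]]+)\](.*)", item)
        if not m:
            names.add(item)
            continue
        prefix, body, suffix = m.groups()
        for piece in body.split(","):
            if "-" in piece:
                lo, hi = piece.split("-")
                width = len(lo)  # keep zero padding, e.g. 001
                for n in range(int(lo), int(hi) + 1):
                    names.add(f"{prefix}{n:0{width}d}{suffix}")
            else:
                names.add(f"{prefix}{piece}{suffix}")
    return names


def parse_lines(text: str) -> List[Dict[str, str]]:
    """Parse Key=Value lines, ignoring comments and blank lines."""
    records = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            records.append(dict(tok.split("=", 1) for tok in line.split() if "=" in tok))
    return records


def undefined_partition_nodes(text: str) -> Dict[str, Set[str]]:
    """Map partition name -> nodes it lists that no NodeName line defines."""
    records = parse_lines(text)
    defined: Set[str] = set()
    for r in records:
        if r.get("NodeName", "DEFAULT") != "DEFAULT":
            defined |= expand(r["NodeName"])
    problems: Dict[str, Set[str]] = {}
    for r in records:
        name = r.get("PartitionName")
        nodes = r.get("Nodes", "")
        if name is None or name == "DEFAULT" or nodes in ("", "ALL"):
            continue
        missing = expand(nodes) - defined
        if missing:
            problems[name] = missing
    return problems
```

A non-empty result would fail the pipeline before the file reaches any controller.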

2023-01-17, Brian Andrus
…and crossing my fingers. -Rob (replying to Samuel Fulcomer, Wednesday, January 4, 2023, 1:54 PM)

2023-01-17, Groner, Rob
In reply to Samuel Fulcomer, Wednesday, January 4, 2023, 1:54 PM.

2023-01-04, Fulcomer, Samuel
...and... using the same cluster name is important in our scenario for a seamless slurmdbd upgrade transition. Thinking about it a bit more, I'm not sure I'd want to fold production and test/dev configs together in the same revision control tree. We keep them separate. There's no reason to…

2023-01-04, Paul Edmon
The symlink method for slurm.conf is what we do as well. We NFS-mount a share from the Slurm master that hosts slurm.conf, and then symlink slurm.conf to the copy on that NFS share. -Paul Edmon- (replying to Brian Andrus, 1/4/2023 1:53 PM)

2023-01-04, Fulcomer, Samuel
Just make the cluster names the same, with different NodeName and PartitionName lines. The rest of slurm.conf can be the same. Having two cluster names is only necessary if you're running production in a multi-cluster configuration. Our model has been to have a production cluster and a test cluster…
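The shared-body approach maps naturally onto the single repo Rob asked about: one common section carrying the same ClusterName, plus a per-environment fragment holding only the NodeName and PartitionName lines. This is a minimal sketch; the cluster name, node names, and the `build_conf` helper are hypothetical, and any other setting your site needs to differ between test and production would also belong in the per-environment fragment.

```python
"""Assemble slurm.conf from a shared body plus a per-environment fragment."""

from typing import Dict

# Shared settings: identical for test and production (same ClusterName).
COMMON = """\
ClusterName=mycluster
SchedulerType=sched/backfill
SelectType=select/cons_tres
"""

# Hypothetical per-environment hardware definitions.
ENV_FRAGMENTS: Dict[str, str] = {
    "test": (
        "NodeName=tnode[01-02] CPUs=8\n"
        "PartitionName=debug Nodes=tnode[01-02] Default=YES State=UP\n"
    ),
    "prod": (
        "NodeName=node[001-128] CPUs=64\n"
        "PartitionName=batch Nodes=node[001-128] Default=YES State=UP\n"
    ),
}


def build_conf(env: str) -> str:
    """Return the full slurm.conf text for one environment."""
    if env not in ENV_FRAGMENTS:
        raise ValueError(f"unknown environment: {env}")
    header = f"# Generated for '{env}'; edit the sources, not this file.\n"
    return header + COMMON + ENV_FRAGMENTS[env]


if __name__ == "__main__":
    print(build_conf("test"), end="")
```

Promoting a tested change then means editing `COMMON` once, or copying a fragment change from `test` to `prod`, within one repository history.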

2023-01-04, Brian Andrus
One simple way I have dealt with different configs is to symlink /etc/slurm/slurm.conf to the appropriate file (e.g., slurm-dev.conf or slurm-prod.conf). In fact, I use the symlink for dev and nothing (configless) for prod. Then I can switch a running node between dev and prod by merely…
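The symlink switch can be made atomic by creating the new link under a temporary name and renaming it over the old one, so slurmd never sees a missing file mid-switch. Removing the link entirely corresponds to the configless case: with no local slurm.conf, slurmd can fall back to fetching its config from the controller, assuming configless mode is set up (for example through a DNS SRV record). The helper names are hypothetical, and the daemons still have to reread the config afterwards, typically by restarting slurmd on that node; that step is not shown.

```python
"""Atomically repoint (or drop) a slurm.conf symlink."""

import os
import tempfile


def point_conf_at(link_path: str, target: str) -> None:
    """Make link_path a symlink to target, replacing any existing link atomically."""
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    # rename(2) semantics: atomic when both paths are in the same directory.
    os.replace(tmp_link, link_path)


def drop_conf_link(link_path: str) -> None:
    """Remove the symlink (configless case); never delete a regular file."""
    if os.path.islink(link_path):
        os.remove(link_path)


if __name__ == "__main__":
    # Demo in a scratch directory instead of /etc/slurm.
    with tempfile.TemporaryDirectory() as d:
        for name in ("slurm-dev.conf", "slurm-prod.conf"):
            open(os.path.join(d, name), "w").close()
        link = os.path.join(d, "slurm.conf")
        point_conf_at(link, "slurm-dev.conf")
        print(os.readlink(link))
        point_conf_at(link, "slurm-prod.conf")
        print(os.readlink(link))
```

Using a relative target keeps the link valid if the directory is mounted elsewhere, which fits the NFS-hosted variant as long as the link and target share a directory.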

[slurm-users] Maintaining slurm config files for test and production clusters

2023-01-04, Groner, Rob
We currently have a test cluster and a production cluster, both on the same network. We try things on the test cluster, then gather those changes and apply them to the production cluster. We're doing that through two different repos, but we'd like to have a single repo to make the…