Hi,

> On 02 Sep 2016, at 10:54, Paddy Doyle <pa...@tchpc.tcd.ie> wrote:
> 
> 
> On Thu, Sep 01, 2016 at 01:34:38PM -0700, Chad Cropper wrote:
> 
>> My cluster manager vendor's automated Slurm setup is causing the headache, I
>> believe. They want to overwrite certain things in slurm.conf and
>> slurmdbd.conf. That is fine for anything from a basic cluster to a large one,
>> but we are working outside that controlled box.
>> 
>> As for the external server, if I build Slurm from the latest source (16.x),
>> will connecting a 14.x or 15.x cluster to it be OK? Our goal is to have one
>> database shared by our PRD and DEV clusters, so that we can test cluster
>> upgrades, including major Slurm version updates, on DEV before touching PRD.
>> Everything I read points to this, but there is no definitive answer anywhere.
> 
> That's basically what we do: have a single slurmdbd and point all of our
> clusters at it.
> 
> Upgrades mean upgrading the dbd first, and then we can upgrade each individual
> cluster as required.

We also do this, run a number of Slurm clusters attached to a single slurmdbd. 
One issue with this setup is that once you decommission a cluster, it needs
to be removed from the DB somehow; otherwise your DB grows beyond a reasonable
size...
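
For reference, a minimal sketch of how each cluster's slurm.conf can point at
the shared slurmdbd. It assumes the standard slurm.conf accounting parameters
(ClusterName, AccountingStorageType, AccountingStorageHost, and
AccountingStoragePort with the default slurmdbd port 6819). The host name
dbhost.example.org, the cluster names prd/dev and the helper function are
hypothetical placeholders, just a way to show the two fragments side by side.

```python
# Hypothetical sketch: render the accounting-related slurm.conf lines for two
# clusters (PRD and DEV) that share one external slurmdbd/MySQL server.
# The host name and cluster names are placeholders, not real systems.

DBD_HOST = "dbhost.example.org"  # assumed host running slurmdbd
DBD_PORT = 6819                  # default slurmdbd port


def accounting_stanza(cluster_name: str, dbd_host: str = DBD_HOST,
                      dbd_port: int = DBD_PORT) -> str:
    """Return the slurm.conf lines that point one cluster at the shared slurmdbd."""
    lines = [
        f"ClusterName={cluster_name}",  # must be unique per cluster in the shared DB
        "AccountingStorageType=accounting_storage/slurmdbd",
        f"AccountingStorageHost={dbd_host}",
        f"AccountingStoragePort={dbd_port}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    for name in ("prd", "dev"):
        print(f"# --- slurm.conf fragment for cluster '{name}' ---")
        print(accounting_stanza(name))
        print()
```

Only ClusterName differs between the two fragments; both clusters report to the
same slurmdbd, which is what lets DEV and PRD share one database.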

> 
> We currently have the dbd on 16.05.4 and have some 15.x clusters still
> pointing to it fine. I can't recall exactly, but in the past we may have even
> had clusters 2 major releases behind pointing to an up-to-date dbd... I'm not
> sure how far back you can go, but I suspect 14.x talking to a 16.x dbd would
> be fine.

With SlurmDBD 16.05.4 it is possible to run 15.08 and 14.11 clusters without
problems (we do it). However, earlier releases won't work; the log will
complain with something like this:

[2016-09-21T15:23:42.001] error: Incompatible RPC version received (6912 not between 7168 and 7680)
[2016-09-21T15:23:42.001] error: Processing last message from connection 15(XXXXXXXX) uid(-2)
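
The numbers in that error are Slurm protocol versions. Assuming the encoding
used in slurm_protocol_common.h around this time (each release encoded as
N << 8, so 14.03 = 27 << 8 = 6912, 14.11 = 7168, 15.08 = 7424, 16.05 = 7680),
the message says a 14.03 client was rejected by a 16.05 dbd that accepts its
own version and the two major releases before it. A small sketch of that
window check; the table and the helper names release_name/dbd_accepts are
illustrative assumptions, not Slurm's own code.

```python
# Sketch: decode "Incompatible RPC version received (6912 not between 7168
# and 7680)". Assumption: protocol version = (N << 8) per major release, with
# the mapping below taken to match slurm_protocol_common.h of that era.

PROTOCOL_VERSIONS = {
    27 << 8: "14.03",  # 6912
    28 << 8: "14.11",  # 7168
    29 << 8: "15.08",  # 7424
    30 << 8: "16.05",  # 7680
}


def release_name(rpc_version: int) -> str:
    """Map a protocol version number to a release name, if known."""
    return PROTOCOL_VERSIONS.get(rpc_version, f"unknown ({rpc_version})")


def dbd_accepts(client: int, dbd: int, releases_back: int = 2) -> bool:
    """True if the client's version lies between `releases_back` major
    releases older than the dbd and the dbd's own version (inclusive)."""
    oldest = dbd - (releases_back << 8)
    return oldest <= client <= dbd


if __name__ == "__main__":
    dbd = 30 << 8  # 16.05, as in the log above
    for client in (27 << 8, 28 << 8, 29 << 8):
        print(release_name(client), dbd_accepts(client, dbd))
```

Under this model a client newer than the dbd also falls outside the window,
which matches the advice to upgrade the dbd first and the clusters afterwards.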


> 
> Paddy

Miguel

> 
>> -Chad
>> 
>> 
>> 
>>> On Sep 1, 2016, at 3:22 PM, Ryan Novosielski <novos...@rutgers.edu> wrote:
>>> 
>>> No, I don't see why that would be easier. It wasn't a big deal,
>>> really. We have two different VMs running slurmctld and a third one that
>>> has the DB. I don't believe we did any intermediate steps. It's not
>>> that complicated.
>>> 
>>>> On Sep 1, 2016, at 4:17 PM, Chad Cropper <chad.crop...@genusplc.com> wrote:
>>>> 
>>>> 
>>>> Is it best to do a local setup then change the config once it is running 
>>>> locally?
>>>> 
>>>> -Chad
>>>> 
>>>>> On Sep 1, 2016, at 3:07 PM, Ryan Novosielski <novos...@rutgers.edu> wrote:
>>>>> 
>>>>> Simply put: yes. That is our setup.
>>>>> 
>>>>>> On Sep 1, 2016, at 3:50 PM, Chad Cropper <chad.crop...@genusplc.com> 
>>>>>> wrote:
>>>>>> 
>>>>>> Is it possible to have 2 clusters running Slurm but have them both
>>>>>> point to an external server running a single slurmdbd/MySQL?
>>>>>> 
>>>>> 
>>>>> --
>>>>> ____
>>>>> || \\UTGERS,           
>>>>> |---------------------------*O*---------------------------
>>>>> ||_// the State        |         Ryan Novosielski - novos...@rutgers.edu
>>>>> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
>>>>> ||  \\    of NJ        | Office of Advanced Research Computing - MSB C630, Newark
>>>>>  `'
>>>>> 
>>>> 
>>>> 
>>>> ________________________________
>>>> *** The information contained in this communication may be confidential, 
>>>> is intended only for the use of the recipient(s) named above, and may be 
>>>> legally privileged. If the reader of this message is not the intended 
>>>> recipient, you are hereby notified that any dissemination, distribution, 
>>>> or copying of this communication, or any of its contents, is strictly 
>>>> prohibited. If you have received this communication in error, please 
>>>> return it to the sender immediately and delete the original message and 
>>>> any copies of it. If you have any questions concerning this message, 
>>>> please contact the sender. ***
>>> 
>> 
>> 
> 
> -- 
> Paddy Doyle
> Trinity Centre for High Performance Computing,
> Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
> Phone: +353-1-896-3725
> http://www.tchpc.tcd.ie/

-- 
Miguel Gila
CSCS Swiss National Supercomputing Centre
HPC Operations
Via Trevano 131 | CH-6900 Lugano | Switzerland
mg [at] cscs.ch 
