RE: [slurm-dev] RFD: simple getting rid of config syncronization.

Jette, Moe Thu, 12 May 2011 16:16:06 -0700

I would also add that you can include other files within a slurm.conf
file, which could make management of SLURM's configuration file easier
for you.
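
As a rough illustration, a slurm.conf split this way might look like the
fragment below. The paths, cluster name and host name are made up for the
example; only the Include directive itself is a SLURM feature.

```
# /etc/slurm/slurm.conf (hypothetical layout)
ClusterName=example
ControlMachine=head01

# Site-specific pieces kept in separate files (hypothetical paths)
Include /etc/slurm/nodes.conf
Include /etc/slurm/partitions.conf
```

The included files still have to be present on every node, so this eases
editing rather than distribution.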
________________________________________
From: [email protected] [[email protected]] On Behalf 
Of Danny Auble [[email protected]]
Sent: Thursday, May 12, 2011 4:00 PM
To: [email protected]
Subject: Re: [slurm-dev] RFD: simple getting rid of config syncronization.

Hey Andrej,

On Thursday, May 12, 2011 03:10:14 PM Andrej N. Gritsenko wrote:
>     Hello there!
>
>     There is a little problem with SLURM: each node must have its own copy
> of slurm.conf, and if you change it on the controller you must update
> it on all other nodes as well. There is a simple solution: keep it on
> one node (slurmdbd, for example) and have each slurmctld or slurmd load
> it by means of the accounting_storage plugin. See the attachment for a
> simple API for that. It extends the config file name (option -f of
> slurmd or slurmctld, or the environment variable SLURM_CONF) so that it
> can now contain a non-local name of the form "plugin:host:port", for
> example "slurmdbd:sqlnode:7031". When used with
> accounting_storage/slurmdbd, slurmdbd.conf has to contain the slurm.conf
> variables in addition to the slurmdbd variables; otherwise the
> DBD_GET_CONFIG RPC (along with acct_storage_g_get_config() and the
> corresponding acct_storage_p_get_config()) should be extended in a
> future version of SLURM to request a specific config type.
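
To make the quoted "plugin:host:port" naming concrete, here is a minimal
sketch in Python of how such a config name could be told apart from an
ordinary local path. It is not the code from the attached patch (which is
not reproduced in this thread); the names RemoteConf and parse_conf_name
are hypothetical, and the rule "anything containing a slash is a local
path" is an assumption made for the example.

```python
from typing import NamedTuple, Optional


class RemoteConf(NamedTuple):
    """Hypothetical holder for a non-local config name."""
    plugin: str
    host: str
    port: int


def parse_conf_name(name: str) -> Optional[RemoteConf]:
    """Return a RemoteConf for 'plugin:host:port', or None for a local path.

    Assumption for this sketch: a name containing '/' is always treated
    as a local file, and the port must be a plain decimal number.
    """
    if "/" in name:
        return None
    parts = name.split(":")
    if len(parts) != 3:
        return None
    plugin, host, port = parts
    if not plugin or not host:
        return None
    if not (port.isascii() and port.isdigit()):
        return None
    return RemoteConf(plugin, host, int(port))


if __name__ == "__main__":
    # The example name from the proposal, then an ordinary local path.
    print(parse_conf_name("slurmdbd:sqlnode:7031"))
    print(parse_conf_name("/etc/slurm/slurm.conf"))
```

Returning None for anything that does not match keeps the existing
behaviour (a plain file name passed via -f or SLURM_CONF) as the default.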

I am not sure how scalable this would be.

What if you had 10k nodes all asking for it at the same time?

What if your slurmdbd isn't on the network your compute nodes have access to?

What happens if your DBD is down or unreachable?  It seems like it could 
potentially bring down your entire enterprise (as most people have 1 DBD to 
service multiple clusters) with a potential single point of failure.

Another concern is that all the other user commands, such as srun and squeue,
would also have to fetch this information on startup.  It seems like it could
overload the DBD quite fast if it were continuously bombarded with requests
for config information.  It also seems like you would need a wrapper around
all the commands to tell them where the DBD is.  How do you handle this in
the cluster where you are running this?

I am not against the idea of having the database house this information, but 
currently you don't have to have the database to run.  I am guessing very few 
people use the -f option (as you probably noted when you found the bug earlier 
;)).

On most installations the slurm.conf doesn't change very often, and there are 
tools out there that will distribute it to all your nodes when it does.

>     This solution already works very well in our cluster, but we use our
> own accounting_storage plugin. To get accounting_storage/slurmdbd to work
> that way, you have to do something about the problem described above, as
> otherwise either slurmdbd or scontrol will complain about unknown
> variables in the config. So I've marked the proposal as RFD - if you want
> it, then resolve the problem (in 2.3.0 probably?) and include this patch. :)

I am not sure I understand your statement about unknown variables.  I am 
guessing that is specific to your install?

Thanks for your ideas,
Danny

>
>     Andriy.
