So while slurmctld is down, the master would stop managing everything and
we wouldn't be able to submit jobs?  Are those the only risks of shutting
down the master?

We tend to be in an environment where things are in production but also 
in flux.

-Paul Edmon-

On 01/30/2013 03:58 PM, Moe Jette wrote:
> You need to restart slurmctld to add nodes. Adding nodes causes a
> multitude of bitmaps to be rebuilt, which does not happen when
> "scontrol reconfig" is executed. Also you probably want to maintain a
> single slurm.conf file on all nodes.
>
> I would recommend
> 1. Stop slurmctld
> 2. Update slurm.conf on all nodes
> 3. Restart slurmctld
> 4. Start slurmd on the new nodes
>
>
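
A minimal sketch of the four steps above as a script, in case it is useful. The
host names, the conf path, and the use of systemctl/scp/ssh are all assumptions
for illustration (none of them come from this thread); substitute whatever your
site uses to manage services and push config. It defaults to a dry run that only
prints the commands.

```shell
#!/bin/sh
# Hypothetical defaults: adjust for your site. None of these values come
# from the thread.
CONF=${CONF:-/etc/slurm/slurm.conf}
ALL_NODES=${ALL_NODES:-"node01 node02 node03"}
NEW_NODES=${NEW_NODES:-"node03"}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

add_nodes() {
    # 1. Stop slurmctld (run this on the master).
    run systemctl stop slurmctld || return 1
    # 2. Push the same, already edited slurm.conf to every node.
    for n in $ALL_NODES; do
        run scp "$CONF" "$n:$CONF" || return 1
    done
    # 3. Restart slurmctld so it rebuilds its bitmaps with the new nodes.
    run systemctl start slurmctld || return 1
    # 4. Start slurmd on the new nodes only.
    for n in $NEW_NODES; do
        run ssh "$n" systemctl start slurmd || return 1
    done
}
```

Calling add_nodes with DRY_RUN=1 lists the commands in order; set DRY_RUN=0 only
after checking that the list matches your environment.
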
> Quoting David Bigagli <[email protected]>:
>
>> Do you have the slurmctld log from when the master failed? It should be
>> enough to add the hostname to the NodeName and PartitionName entries in
>> slurm.conf, then run 'scontrol reconfigure'.
>>
>> /David
>>
>>
>> On Wed, Jan 30, 2013 at 3:58 PM, Paul Edmon <[email protected]> wrote:
>>
>>> Perhaps I missed the documentation on this but what is the proper order
>>> of operations to add new nodes to slurm.conf?  Currently if we start up
>>> slurmd on the new nodes but then don't have them in the conf it just
>>> fails on the nodes.  However, if we later add them to the conf and
>>> do a reconfigure on the master, the master process falls over and we
>>> have to restart it.  At that point they show up as unknown, waiting for
>>> the slurmd's on the respective new nodes to connect.  Ideally this
>>> wouldn't happen; the master shouldn't tip over just because new hosts
>>> are added to the conf.  Once those hosts are in, though, simply
>>> restarting slurmd on the hosts works fine.
>>>
>>> So what is the proper order?  Do you put the new hosts in the conf and
>>> start up their slurmd's before you reconfig the master?
>>>
>>> -Paul Edmon-
>>>
