You need to restart slurmctld to add nodes. Adding nodes requires a
multitude of bitmaps to be rebuilt, and that does not happen when
"scontrol reconfig" is executed. Also, you probably want to keep the
same slurm.conf file on all nodes.

I would recommend:
1. Stop slurmctld
2. Update slurm.conf on all nodes
3. Restart slurmctld
4. Start slurmd on the new nodes
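
As a rough illustration of that order, here is a minimal dry-run sketch in Python. It only builds and prints the commands; it does not run them. Everything site-specific is an assumption on my part and not something Slurm prescribes: the host names, the /etc/slurm/slurm.conf path, the use of ssh/scp to reach the hosts, and the daemons being managed as systemd units named slurmctld and slurmd. Adjust for your init system and your config distribution method.

```python
import shlex
from typing import List


def build_add_node_plan(
    controller: str,
    existing_nodes: List[str],
    new_nodes: List[str],
    conf_path: str = "/etc/slurm/slurm.conf",  # assumed location, site-specific
) -> List[List[str]]:
    """Return the commands (as argv lists) in the recommended order.

    Nothing is executed here; the caller decides what to do with the plan.
    """
    plan: List[List[str]] = []

    # 1. Stop slurmctld on the controller (assumes a systemd unit).
    plan.append(["ssh", controller, "systemctl", "stop", "slurmctld"])

    # 2. Copy the already-edited slurm.conf to every host so all copies match.
    for host in [controller] + existing_nodes + new_nodes:
        plan.append(["scp", conf_path, f"{host}:{conf_path}"])

    # 3. Restart slurmctld so it rebuilds its state with the new nodes.
    plan.append(["ssh", controller, "systemctl", "start", "slurmctld"])

    # 4. Start slurmd on the new nodes only.
    for host in new_nodes:
        plan.append(["ssh", host, "systemctl", "start", "slurmd"])

    return plan


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    for argv in build_add_node_plan("ctl01", ["node01"], ["node02", "node03"]):
        print(shlex.join(argv))
```

Printing the plan first makes it easy to check the order by eye before running anything by hand or through whatever tool you normally use.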


Quoting David Bigagli <[email protected]>:

> Do you have the slurmctld log when the master failed? It should be enough
> to add the hostname to slurm.conf, NodeName and PartitionName then
> 'scontrol reconfigure'.
>
> /David
>
>
> On Wed, Jan 30, 2013 at 3:58 PM, Paul Edmon <[email protected]> wrote:
>
>>
>> Perhaps I missed the documentation on this but what is the proper order
>> of operations to add new nodes to slurm.conf?  Currently if we start up
>> slurmd on the new nodes but then don't have them in the conf it just
>> fails on the nodes.  However, if we then later add them to the conf and
>> do a reconfigure on the master, the master process falls over and we have
>> to restart it.  At that point they show up as unknown and waiting for
>> the slurmd's on the respective new nodes to connect.  Ideally this
>> wouldn't happen, the master shouldn't tip over just because new hosts
>> are added to the conf.  Once those hosts are in though then simply
>> restarting slurmd on the hosts works fine.
>>
>> So what is the proper order?  Do you put the new hosts in the conf and
>> start up their slurmd's before you reconfig the master?
>>
>> -Paul Edmon-
>>
>
