You need to restart slurmctld to add nodes. Adding nodes causes a multitude of bitmaps to be rebuilt, which does not happen when "scontrol reconfig" is executed. Also, you probably want to maintain a single, identical slurm.conf file on all nodes.
I would recommend:
1. Stop slurmctld
2. Update slurm.conf on all nodes
3. Restart slurmctld
4. Start slurmd on the new nodes

Quoting David Bigagli <[email protected]>:

> Do you have the slurmctld log when the master failed? It should be enough
> to add the hostname to slurm.conf, NodeName and PartitionName then
> 'scontrol reconfigure'.
>
> /David
>
>
> On Wed, Jan 30, 2013 at 3:58 PM, Paul Edmon <[email protected]> wrote:
>
>>
>> Perhaps I missed the documentation on this but what is the proper order
>> of operations to add new nodes to slurm.conf? Currently if we start up
>> slurmd on the new nodes but then don't have them in the conf it just
>> fails on the nodes. However, if we then later add them to the conf and
>> to a reconfigure on the master the master process falls over and we have
>> to restart it. At that point they show up as unknown and waiting for
>> the slurmd's on the respective new nodes to connect. Ideally this
>> wouldn't happen, the master shouldn't tip over just because new hosts
>> are added to the conf. Once those hosts are in though then simply
>> restarting slurmd on the hosts works fine.
>>
>> So what is the proper order? Do you put the new hosts in the conf and
>> start up their slurmd's before you reconfig the master?
>>
>> -Paul Edmon-
>>
>
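
For what it's worth, the four steps recommended at the top of this message could be scripted roughly as below. This is only a sketch under several assumptions that are not from the thread: the daemons are managed as systemd units named slurmctld and slurmd (older installs use an init script instead), the controller can reach every node over passwordless ssh/scp, and the config lives at the same path everywhere. The helper names run and add_nodes and the host names in the usage line are hypothetical. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch only: unit names, paths, and host lists are assumptions, not from the thread.
DRY_RUN="${DRY_RUN:-1}"   # default: print each command instead of running it

# run CMD...: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# add_nodes CONF "ALL_NODES" "NEW_NODES"
#   CONF       updated slurm.conf on the controller (already lists the new nodes)
#   ALL_NODES  space-separated compute nodes, including the new ones
#   NEW_NODES  space-separated nodes being added
add_nodes() {
    conf=$1; all_nodes=$2; new_nodes=$3

    # 1. Stop slurmctld
    run systemctl stop slurmctld || return 1

    # 2. Update slurm.conf on all nodes
    for n in $all_nodes; do
        run scp "$conf" "$n:$conf" || return 1
    done

    # 3. Restart slurmctld
    run systemctl start slurmctld || return 1

    # 4. Start slurmd on the new nodes
    for n in $new_nodes; do
        run ssh "$n" systemctl start slurmd || return 1
    done
}
```

After sourcing the file, something like `add_nodes /etc/slurm/slurm.conf "node01 node02 node03" "node03"` prints the planned commands; setting DRY_RUN=0 first would actually run them, stopping at the first failure.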
