Do you have the slurmctld log from when the master failed? It should be enough to add the new hostnames to the NodeName and PartitionName lines in slurm.conf and then run 'scontrol reconfigure'.
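
A minimal sketch of the kind of entries meant here, assuming four new nodes named node[01-04] and a partition named batch. The names and the CPU and memory figures are hypothetical, chosen only for illustration; substitute the values for your own cluster:

    # slurm.conf additions (hypothetical node names, partition name and sizes)
    NodeName=node[01-04] CPUs=16 RealMemory=64000 State=UNKNOWN
    PartitionName=batch Nodes=node[01-04] Default=YES MaxTime=INFINITE State=UP

Once the updated slurm.conf is in place, 'scontrol reconfigure' on the master asks the running daemons to re-read it.
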
/David

On Wed, Jan 30, 2013 at 3:58 PM, Paul Edmon <[email protected]> wrote:
>
> Perhaps I missed the documentation on this, but what is the proper order
> of operations to add new nodes to slurm.conf? Currently, if we start up
> slurmd on the new nodes but don't have them in the conf, it just
> fails on the nodes. However, if we then later add them to the conf and
> do a reconfigure on the master, the master process falls over and we have
> to restart it. At that point they show up as unknown and waiting for
> the slurmd's on the respective new nodes to connect. Ideally this
> wouldn't happen; the master shouldn't tip over just because new hosts
> are added to the conf. Once those hosts are in, though, simply
> restarting slurmd on the hosts works fine.
>
> So what is the proper order? Do you put the new hosts in the conf and
> start up their slurmd's before you reconfig the master?
>
> -Paul Edmon-
>
