Do you have the slurmctld log from when the master failed? It should be
enough to add the hostname to the NodeName and PartitionName lines in
slurm.conf and then run 'scontrol reconfigure'.
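
As a rough sketch, the slurm.conf change might look like the fragment
below. The node names, CPU count, and partition name are made up for
illustration, so substitute the values that match your hardware and
your existing partition definitions.

```
# Hypothetical example: newnode[01-04] and oldnode[01-16] are placeholder names.
# Declare the new nodes (CPUs=8 is an assumed value).
NodeName=newnode[01-04] CPUs=8 State=UNKNOWN

# Add the new nodes to the Nodes= list of an existing partition
# ("general" is an assumed partition name).
PartitionName=general Nodes=oldnode[01-16],newnode[01-04] Default=YES MaxTime=INFINITE State=UP
```

With the same slurm.conf in place on the master and the new nodes, run
this on the master:

```
scontrol reconfigure
```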

/David


On Wed, Jan 30, 2013 at 3:58 PM, Paul Edmon <[email protected]> wrote:

>
> Perhaps I missed the documentation on this, but what is the proper order
> of operations to add new nodes to slurm.conf?  Currently if we start up
> slurmd on the new nodes but then don't have them in the conf it just
> fails on the nodes.  However, if we then later add them to the conf and
> do a reconfigure on the master, the master process falls over and we have
> to restart it.  At that point they show up as unknown and waiting for
> the slurmd's on the respective new nodes to connect.  Ideally this
> wouldn't happen; the master shouldn't tip over just because new hosts
> are added to the conf.  Once those hosts are in, though, simply
> restarting slurmd on the hosts works fine.
>
> So what is the proper order?  Do you put the new hosts in the conf and
> start up their slurmd's before you reconfig the master?
>
> -Paul Edmon-
>
