Here is what I have:

Jan 29 15:45:29 iliadserv2 slurmctld[18705]: sched: _slurm_rpc_step_complete StepId=753.0 usec=19
Jan 29 15:56:47 iliadserv2 slurmctld[18705]: Processing RPC: REQUEST_RECONFIGURE from uid=0
Jan 29 15:56:47 iliadserv2 slurmctld[18705]: error: Unable to create NodeAddr list from west[6-7][1-2][1-8]
Jan 29 15:56:47 iliadserv2 slurmctld[18705]: fatal: Unable to create NodeAddr list from west[6-7][1-2][1-8]
Jan 29 15:57:08 iliadserv2 slurmctld[7193]: error: Job accounting information gathered, but not stored
Jan 29 15:57:08 iliadserv2 slurmctld[7193]: slurmctld version 2.5.1 started on cluster cluster
Jan 29 15:57:08 iliadserv2 slurmctld[7193]: error: WARNING: Even though we are collecting accounting information you have asked for it not to be stored (accounting_storage/none) if this is not what you have in mind you will need to change it.
Jan 29 15:57:08 iliadserv2 slurmctld[7193]: error: Unable to create NodeAddr list from west[6-7][1-2][1-8]
Jan 29 15:57:08 iliadserv2 slurmctld[7193]: fatal: Unable to create NodeAddr list from west[6-7][1-2][1-8]
Jan 29 15:58:36 iliadserv2 slurmctld[7258]: error: Job accounting information gathered, but not stored
Jan 29 15:58:36 iliadserv2 slurmctld[7258]: slurmctld version 2.5.1 started on cluster cluster
Jan 29 15:58:36 iliadserv2 slurmctld[7258]: error: WARNING: Even though we are collecting accounting information you have asked for it not to be stored (accounting_storage/none) if this is not what you have in mind you will need to change it.
Jan 29 15:58:36 iliadserv2 slurmctld[7258]: Recovered state of 28 nodes

In between, I fixed slurm.conf to use just west61[1-8],west62[1-8],west71[1-8],west72[1-8]. However, I wouldn't expect the master to go down due to not being able to make a NodeAddr list. I would expect it to refuse the new conf and spit out an error message.
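
For reference, here is a rough Python sketch of which hostnames the corrected list names. This is not Slurm's hostlist parser; the helper expand_hostlist is hypothetical and only handles a prefix followed by a single [lo-hi] range. It deliberately refuses anything else, such as the multi-bracket form that slurmctld 2.5.1 rejected in the log above.

```python
import re

def expand_hostlist(expr):
    """Expand 'prefix[lo-hi]' entries separated by commas.

    Illustrative only: assumes at most one bracket pair per entry and
    no commas inside brackets. Not Slurm's actual parser.
    """
    hosts = []
    for part in expr.split(","):
        m = re.fullmatch(r"([^\[\]]+)\[(\d+)-(\d+)\]", part)
        if m is not None:
            prefix, lo, hi = m.group(1), int(m.group(2)), int(m.group(3))
            hosts.extend(f"{prefix}{i}" for i in range(lo, hi + 1))
        elif "[" in part or "]" in part:
            # e.g. west[6-7][1-2][1-8]: more than one bracket pair
            raise ValueError(f"unsupported expression: {part}")
        else:
            hosts.append(part)  # plain hostname, no range
    return hosts

hosts = expand_hostlist("west61[1-8],west62[1-8],west71[1-8],west72[1-8]")
print(len(hosts), hosts[0], hosts[-1])  # 32 west611 west728
```

So the four single-bracket entries cover the same 32 names I meant to describe with west[6-7][1-2][1-8].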

-Paul Edmon-

On 01/30/2013 01:03 PM, David Bigagli wrote:
Re: [slurm-dev] Adding new nodes to slurm.conf
Do you have the slurmctld log from when the master failed? It should be enough to add the hostname to slurm.conf (under NodeName and PartitionName) and then run 'scontrol reconfigure'.

/David


On Wed, Jan 30, 2013 at 3:58 PM, Paul Edmon <[email protected]> wrote:


    Perhaps I missed the documentation on this, but what is the proper
    order of operations to add new nodes to slurm.conf?  Currently, if we
    start up slurmd on the new nodes but don't have them in the conf, it
    just fails on the nodes.  However, if we then later add them to the
    conf and do a reconfigure on the master, the master process falls over
    and we have to restart it.  At that point they show up as unknown and
    waiting for the slurmd's on the respective new nodes to connect.
    Ideally this wouldn't happen; the master shouldn't tip over just
    because new hosts are added to the conf.  Once those hosts are in,
    though, simply restarting slurmd on the hosts works fine.

    So what is the proper order?  Do you put the new hosts in the conf and
    start up their slurmd's before you reconfig the master?

    -Paul Edmon-


