Hi Jin,
Your slurmctld.log says "Node compute004 appears to have a different
slurm.conf than the slurmctld" etc. This happens when slurm.conf was not
copied correctly to the nodes. Please check for and correct this potential error.
Also, please specify which version of Slurm you're running.
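One way to check for the slurm.conf mismatch mentioned above is to compare checksums of the controller's file against each node's copy. The following is only a minimal sketch: it assumes the node copies have already been fetched to local paths (for example with scp), and the function names and the name=path argument layout are hypothetical, not part of Slurm. A byte-for-byte comparison is probably stricter than strictly necessary, but it is simple.

```python
import hashlib
import sys
from pathlib import Path


def config_digest(path):
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_mismatched(reference, copies):
    """Return the sorted node names whose file differs from `reference`.

    `copies` maps a node name to the local path of that node's
    slurm.conf copy (hypothetical layout, fetched beforehand).
    """
    expected = config_digest(reference)
    return sorted(
        node for node, path in copies.items()
        if config_digest(path) != expected
    )


if __name__ == "__main__" and len(sys.argv) > 2:
    # Arguments: <controller slurm.conf> <node>=<path to copy> ...
    reference = sys.argv[1]
    copies = dict(arg.split("=", 1) for arg in sys.argv[2:])
    for node in find_mismatched(reference, copies):
        print(f"{node}: slurm.conf differs from {reference}")
```

Any node this reports should get a fresh copy of slurm.conf before the daemons are restarted.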
/Ole
On 10/22/2017 08:44 PM, JinSung Kang wrote:
I am having trouble adding new nodes to a Slurm cluster without
killing the jobs that are currently running.
Right now I:
1. Update slurm.conf and add the new node to it
2. Copy the new slurm.conf to all the nodes
3. Restart slurmd on all nodes
4. Restart slurmctld
But when I restart slurmctld, all the jobs that were running are
requeued, with (Begin Time) shown as the reason for not running. The
newly added node works perfectly fine.
I've included the slurm.conf. I've also included the slurmctld.log output
from when I tried to add the new node.