Hi Jin,

Your slurmctld.log says "Node compute004 appears to have a different
slurm.conf than the slurmctld" etc. This will happen if the slurm.conf was not copied correctly to the nodes, so that a node's copy differs from the one slurmctld is using. Please check for and correct this potential error.
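
One way to check for such a mismatch is to compare a checksum of the controller's slurm.conf against the copies on the nodes. The following is only a minimal sketch: the helper names (config_digest, mismatched_nodes) and the sample file contents are hypothetical, and it assumes you have already collected each node's slurm.conf as text by whatever means you prefer. It does a strict byte-for-byte comparison, which is not necessarily the same check Slurm itself performs.

```python
import hashlib


def config_digest(text: str) -> str:
    """Return the SHA-256 hex digest of a slurm.conf given as text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def mismatched_nodes(controller_conf: str, node_confs: dict) -> list:
    """Return the sorted names of nodes whose slurm.conf differs from the controller's.

    node_confs maps a node name to the text of that node's slurm.conf.
    """
    reference = config_digest(controller_conf)
    return sorted(
        name
        for name, text in node_confs.items()
        if config_digest(text) != reference
    )


if __name__ == "__main__":
    # Hypothetical contents, for illustration only.
    controller = "NodeName=compute[001-004] CPUs=16\n"
    nodes = {
        "compute003": "NodeName=compute[001-004] CPUs=16\n",
        "compute004": "NodeName=compute[001-003] CPUs=16\n",  # stale copy
    }
    print(mismatched_nodes(controller, nodes))
```

Any node reported by this comparison would need the current slurm.conf copied to it again.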

Also, please specify which version of Slurm you're running.

/Ole

On 10/22/2017 08:44 PM, JinSung Kang wrote:
I am having trouble adding new nodes to a Slurm cluster without killing the jobs that are currently running.

Right now I:

1. Update the slurm.conf and add a new node to it
2. Copy the new slurm.conf to all the nodes
3. Restart the slurmd on all nodes
4. Restart the slurmctld

But when I restart slurmctld, all the jobs that were running are requeued, with (Begin Time) shown as the reason for not running. The newly added node works perfectly fine.

I've included the slurm.conf. I've also included slurmctld.log output when I'm trying to add the new node.
