Hi all,

I am trying to run Slurm with multiple slurmd daemons on a single node. I
built Slurm with the "--enable-multiple-slurmd" configure option and
modified slurm.conf as follows:

# COMPUTE NODES
NodeName=node1 NodeHostname=claudioslurm Port=17001 CPUs=2 State=UNKNOWN
NodeName=node2 NodeHostname=claudioslurm Port=17002 CPUs=2 State=UNKNOWN
PartitionName=Cloud1 Nodes=node[1-2] Default=YES MaxTime=INFINITE State=UP

Then I started each slurmd:

[root@claudioslurm log]# slurmd -N node1
[root@claudioslurm log]# slurmd -N node2

However, I am still not able to run a parallel job. The log files show that
slurmctld does not recognize node1 or node2. In addition, right after I run
"slurmd -N node1" an already running slurmd is killed, and when I then run
"slurmd -N node2" the slurmd I had just started for node1 shuts down.
Could anyone please tell me what I am doing wrong?
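
For reference, the Slurm documentation on running several slurmd daemons on
one host describes giving each daemon its own PID file, spool directory, and
log file, which slurm.conf expresses with the %n (NodeName) substitution. A
minimal sketch of such settings follows; the paths are assumptions for
illustration and are not taken from the configuration above:

```
# Hypothetical per-node paths; %n expands to the NodeName (node1, node2)
SlurmdPidFile=/var/run/slurmd.%n.pid
SlurmdSpoolDir=/var/spool/slurmd.%n
SlurmdLogFile=/var/log/slurm/slurmd.%n.log
```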

Thanks, Claudio


slurmctld.log:
....
[2016-03-16T12:09:22.381] error: Registration message from unknown node node1
[2016-03-16T12:09:22.381] error: _slurm_rpc_node_registration node=node1: Invalid node name specified
[2016-03-16T12:10:45.830] error: Registration message from unknown node node2
[2016-03-16T12:10:45.830] error: _slurm_rpc_node_registration node=node2: Invalid node name specified
[2016-03-16T12:17:14.721] error: Nodes claudioslurm not responding, setting DOWN

slurmd.node1.log:

[2016-03-16T12:09:22.375] Message aggregation disabled
[2016-03-16T12:09:22.375] CPU frequency setting not configured for this node
[2016-03-16T12:09:22.375] Resource spec: Reserved system memory limit not configured for this node
[2016-03-16T12:09:22.377] slurmd version 15.08.7 started
[2016-03-16T12:09:22.377] killing old slurmd[8656]
[2016-03-16T12:09:22.379] slurmd started on Wed, 16 Mar 2016 12:09:22 +1100
[2016-03-16T12:09:22.380] CPUs=2 Boards=1 Sockets=2 Cores=1 Threads=1 Memory=5852 TmpDisk=30204 Uptime=4240700 CPUSpecList=(null)
[2016-03-16T12:10:45.827] Slurmd shutdown completing

slurmd.node2.log:

[2016-03-16T12:10:45.824] Message aggregation disabled
[2016-03-16T12:10:45.824] CPU frequency setting not configured for this node
[2016-03-16T12:10:45.824] Resource spec: Reserved system memory limit not configured for this node
[2016-03-16T12:10:45.826] slurmd version 15.08.7 started
[2016-03-16T12:10:45.826] killing old slurmd[9352]
[2016-03-16T12:10:45.828] slurmd started on Wed, 16 Mar 2016 12:10:45 +1100
[2016-03-16T12:10:45.829] CPUs=2 Boards=1 Sockets=2 Cores=1 Threads=1 Memory=5852 TmpDisk=30204 Uptime=4240783 CPUSpecList=(null)
