Hi all,

I am trying to run Slurm with multiple slurmd daemons on a single node. I built Slurm with the "--enable-multiple-slurmd" configure option and modified slurm.conf:

# COMPUTE NODES
NodeName=node1 NodeHostname=claudioslurm Port=17001 CPUs=2 State=UNKNOWN
NodeName=node2 NodeHostname=claudioslurm Port=17002 CPUs=2 State=UNKNOWN
PartitionName=Cloud1 Nodes=node[1-2] Default=YES MaxTime=INFINITE State=UP

and I started each slurmd:

[root@claudioslurm log]# slurmd -N node1
[root@claudioslurm log]# slurmd -N node2

However, I am still not able to run a parallel job on it. The log files show that slurmctld does not recognize node1 and node2. Also, right after I run "slurmd -N node1", an existing slurmd is killed, and after I run "slurmd -N node2", the slurmd started previously (node1) shuts down.

Could anyone please tell me what I am doing wrong?

Thanks,
Claudio

slurmcrtl.log:
....
[2016-03-16T12:09:22.381] error: Registration message from unknown node node1
[2016-03-16T12:09:22.381] error: _slurm_rpc_node_registration node=node1: Invalid node name specified
[2016-03-16T12:10:45.830] error: Registration message from unknown node node2
[2016-03-16T12:10:45.830] error: _slurm_rpc_node_registration node=node2: Invalid node name specified
[2016-03-16T12:17:14.721] error: Nodes claudioslurm not responding, setting DOWN

slurmd.node1.log:
[2016-03-16T12:09:22.375] Message aggregation disabled
[2016-03-16T12:09:22.375] CPU frequency setting not configured for this node
[2016-03-16T12:09:22.375] Resource spec: Reserved system memory limit not configured for this node
[2016-03-16T12:09:22.377] slurmd version 15.08.7 started
[2016-03-16T12:09:22.377] killing old slurmd[8656]
[2016-03-16T12:09:22.379] slurmd started on Wed, 16 Mar 2016 12:09:22 +1100
[2016-03-16T12:09:22.380] CPUs=2 Boards=1 Sockets=2 Cores=1 Threads=1 Memory=5852 TmpDisk=30204 Uptime=4240700 CPUSpecList=(null)
[2016-03-16T12:10:45.827] Slurmd shutdown completing

slurmd.node2.log:
[2016-03-16T12:10:45.824] Message aggregation disabled
[2016-03-16T12:10:45.824] CPU frequency setting not configured for this node
[2016-03-16T12:10:45.824] Resource spec: Reserved system memory limit not configured for this node
[2016-03-16T12:10:45.826] slurmd version 15.08.7 started
[2016-03-16T12:10:45.826] killing old slurmd[9352]
[2016-03-16T12:10:45.828] slurmd started on Wed, 16 Mar 2016 12:10:45 +1100
[2016-03-16T12:10:45.829] CPUs=2 Boards=1 Sockets=2 Cores=1 Threads=1 Memory=5852 TmpDisk=30204 Uptime=4240783 CPUSpecList=(null)
