How stupid of me, you're perfectly right! How on earth did I fail to see that before I upgraded? I really need a holiday. Sorry for the inconvenience.
Maybe the error message could be enhanced like this:

  This is the Slurm controller host; slurmd doesn't need to run on the controller host unless you also list it as a compute node (not recommended).

--
Olivier LAHAYE
CEA DRT/LIST/DIR

________________________________________
From: Jacek Budzowski [j.budzow...@cyfronet.pl]
Sent: Thursday, 10 August 2017 14:56
To: slurm-dev
Subject: [slurm-dev] Re: Slurmd v15 to v17 stopped working (slurmd: fatal: Unable to determine this slurmd's NodeName) on ControlMachine

Hi,

I think you shouldn't run slurmd on your ControlMachine node (run only slurmctld and slurmdbd there), since I don't see a NodeName line for slurm_master in your configuration.

So you should either add slurm_master to a NodeName line in your slurm.conf, or not start slurmd on slurm_master.

Cheers,
Jacek

On 10.08.2017 at 14:36, LAHAYE Olivier wrote:
> Hi,
>
> I've upgraded slurm 15.08.3 (built from rpmbuild -tb <tarball>) to 17.02.6 on
> centos-7-x86_64.
>
> Since I did that, slurmd refuses to start on the ControlMachine and on the
> BackupController (it starts fine on the compute nodes).
>
> The error is: slurmd: fatal: Unable to determine this slurmd's NodeName
>
> If I try to specify the node name, it fails with a different error message:
>
> [root@slurm_master] # slurmd -D -N $(hostname -s)
> slurmd: Node configuration differs from hardware: CPUs=0:32(hw)
> Boards=0:1(hw) SocketsPerBoard=0:2(hw) CoresPerSocket=0:8(hw)
> ThreadsPerCore=0:2(hw)
> slurmd: Message aggregation disabled
> slurmd: error: find_node_record: lookup failure for slurm_master
> slurmd: fatal: ROUTE -- slurm_master not found in node_record_table
> [root@slurm_master]# hostname -s
> slurm_master
>
> Debugging seems to show that the hostname is not in the node hash table.
>
> slurmdbd and slurmctld start fine.
> I've googled around, but I only find problems related to compute nodes, not
> to the controller or backup controller.
>
> Any ideas?

--
Jacek Budzowski
System administrator
ACC Cyfronet AGH
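
The rule discussed in this thread (start slurmd only on hosts covered by a NodeName line in slurm.conf) can be checked before starting the daemon. The following is only a rough sketch in Python, not part of Slurm: the helper names `expand_hostlist`, `configured_nodes` and `should_run_slurmd` are hypothetical, and the sample configuration with its `node[01-03]` names is invented for illustration. It assumes plain names and single numeric bracket ranges in NodeName, and it ignores NodeHostname, Include files and the other ways Slurm can map a host to a node.

```python
import re


def expand_hostlist(expr):
    """Expand a simple hostlist such as 'n[01-03,7],master' into names.

    Assumption: only one numeric bracket group per name is supported.
    """
    names = []
    # Split on commas that are not inside square brackets.
    for part in re.split(r",(?![^\[]*\])", expr):
        m = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", part)
        if not m:
            names.append(part)
            continue
        prefix, ranges, suffix = m.groups()
        for item in ranges.split(","):
            lo, _, hi = item.partition("-")
            hi = hi or lo
            width = len(lo)  # keep zero padding, e.g. 01..03
            for i in range(int(lo), int(hi) + 1):
                names.append(f"{prefix}{i:0{width}d}{suffix}")
    return names


def configured_nodes(conf_text):
    """Collect node names from the NodeName= lines of a slurm.conf text."""
    nodes = set()
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        m = re.match(r"NodeName=(\S+)", line, flags=re.IGNORECASE)
        if m and m.group(1).upper() != "DEFAULT":
            nodes.update(expand_hostlist(m.group(1)))
    return nodes


def should_run_slurmd(conf_text, short_hostname):
    """True if the host (as printed by `hostname -s`) has a NodeName entry."""
    return short_hostname in configured_nodes(conf_text)


# Hypothetical configuration: the controller is not listed as a compute node.
SAMPLE_CONF = """
ControlMachine=slurm_master
NodeName=node[01-03] CPUs=32
"""

if __name__ == "__main__":
    for host in ("slurm_master", "node02"):
        print(host, should_run_slurmd(SAMPLE_CONF, host))
```

With a configuration like the sample, the check fails for slurm_master, which matches the situation above: the controller has no NodeName entry, so slurmd has nothing to identify itself as. On a real cluster, `scontrol show hostnames` is probably a more reliable way to expand hostlist expressions than the simplified parser here.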