Re: [slurm-users] How should I configure a node with Autodetect=nvml?

2020-02-10 Thread Chris Samuel
On Monday, 10 February 2020 12:11:30 PM PST Dean Schulze wrote:
> With this configuration I get this message every second in my slurmctld.log
> file:
>
> error: _slurm_rpc_node_registration node=slurmnode1: Invalid argument

What other errors are in the logs? Could you check that you've got

[slurm-users] How should I configure a node with Autodetect=nvml?

2020-02-10 Thread Dean Schulze
In the gres.conf on one of my nodes I have just the line

Autodetect=nvml

as in the last example in https://slurm.schedmd.com/gres.conf.html. In the slurm.conf on all nodes I have this line for the node with Autodetect=nvml:

NodeName=slurmnode1 CPUs=16 Boards=1 SocketsPerBoard=1 CoresPerS
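A quick sanity check on a NodeName line like this one: the number of logical processors follows from the topology fields (boards x sockets per board x cores per socket x threads per core), so a declared CPUs= value can be compared against that product. The Python sketch below is a hypothetical helper (parse_node_line and cpu_product are made-up names, not Slurm tools). Because the original line is cut off after "CoresPerS", the CoresPerSocket and ThreadsPerCore values used here are invented for illustration only, and the check is not offered as a diagnosis of the "Invalid argument" error in this thread.

```python
def parse_node_line(line):
    """Split a slurm.conf-style 'Key=Value Key=Value ...' line into a dict.

    Tokens without '=' are ignored; values are kept as strings.
    """
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def cpu_product(fields):
    """Boards * SocketsPerBoard * CoresPerSocket * ThreadsPerCore.

    This sketch treats a missing field as 1; it does not model every
    way slurm.conf lets you describe node topology (e.g. Sockets=).
    """
    product = 1
    for key in ("Boards", "SocketsPerBoard", "CoresPerSocket", "ThreadsPerCore"):
        product *= int(fields.get(key, "1"))
    return product


# HYPOTHETICAL values: the real NodeName line in the post is truncated.
line = ("NodeName=slurmnode1 CPUs=16 Boards=1 SocketsPerBoard=1 "
        "CoresPerSocket=8 ThreadsPerCore=2")
fields = parse_node_line(line)
declared = int(fields["CPUs"])
computed = cpu_product(fields)
print(f"CPUs={declared}, topology product={computed}, consistent={declared == computed}")
# prints: CPUs=16, topology product=16, consistent=True
```

Comparing the same numbers against what `slurmd -C` reports on the node itself is the usual next step, since that is the hardware view slurmd registers with.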

Re: [slurm-users] Node appears to have a different slurm.conf than the slurmctld; update_node: node reason set to: Kill task failed

2020-02-10 Thread Brian Andrus
Usually means you updated the slurm.conf but have not done "scontrol reconfigure" yet.

Brian Andrus

On 2/10/2020 8:55 AM, Robert Kudyba wrote:
> We are using Bright Cluster 8.1 with and just upgraded to slurm-17.11.12. We're getting the below errors when I restart the slurmctld service. The f

[slurm-users] Node appears to have a different slurm.conf than the slurmctld; update_node: node reason set to: Kill task failed

2020-02-10 Thread Robert Kudyba
We are using Bright Cluster 8.1 with and just upgraded to slurm-17.11.12. We're getting the below errors when I restart the slurmctld service. The file appears to be the same on the head node and compute nodes:

[root@node001 ~]# ls -l /cm/shared/apps/slurm/var/etc/slurm.conf
-rw-r--r-- 1 root roo
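One way to double-check that the copies really are identical is to compare checksums of their contents instead of relying on ls -l, since matching size and permissions do not guarantee matching bytes. The Python sketch below compares SHA-256 digests of local files; collecting each node's copy onto one machine (scp, a shared mount, etc.) is left out, and this is a generic file comparison, not the mechanism Slurm itself uses for its "different slurm.conf" check. The script name compare_conf.py is hypothetical.

```python
import hashlib
import sys
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, reading it in 64 KiB chunks."""
    h = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def same_contents(paths):
    """True if every file in `paths` has the same SHA-256 digest (needs 1+ paths)."""
    digests = {file_digest(p) for p in paths}
    return len(digests) == 1


if __name__ == "__main__":
    # Usage: python compare_conf.py FILE FILE [FILE ...]
    files = sys.argv[1:]
    if len(files) < 2:
        print("usage: compare_conf.py FILE FILE [FILE ...]")
    else:
        for f in files:
            print(file_digest(f), f)
        print("identical" if same_contents(files) else "DIFFERENT")
```

As Brian's reply in this thread points out, identical files on disk are not the whole story: running daemons only pick up an edited slurm.conf after "scontrol reconfigure".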

Re: [slurm-users] Which ports does slurm use?

2020-02-10 Thread Ole Holm Nielsen
Hi Dean,

Blocking ports with the Linux firewall and/or your network firewall (wired/Wi-Fi) would have the same effect: Slurm won't work unless you open ports as specified in https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#configure-firewall-for-slurm-daemons

/Ole

On 2/8/20 1:26 AM,
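To see whether a firewall is in the way, you can test TCP reachability of the daemon ports from another node. The Python sketch below assumes Slurm's default ports (6817 for slurmctld, 6818 for slurmd, 6819 for slurmdbd); a site's slurm.conf can change these with SlurmctldPort and SlurmdPort, and srun additionally needs the ephemeral ports governed by SrunPortRange, which this check does not cover. port_open, DEFAULT_PORTS and the script name check_ports.py are hypothetical.

```python
import socket
import sys

# Slurm's default TCP ports; slurm.conf / slurmdbd.conf may override them.
DEFAULT_PORTS = {"slurmctld": 6817, "slurmd": 6818, "slurmdbd": 6819}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Any OSError (refused, timed out, unresolvable host) is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Usage: python check_ports.py HOST [HOST ...]
    for host in sys.argv[1:]:
        for daemon, port in DEFAULT_PORTS.items():
            state = "reachable" if port_open(host, port) else "blocked or not listening"
            print(f"{host}:{port} ({daemon}): {state}")
```

For example, running it from the controller against slurmnode1 checks all three default ports on that node; only the port of a daemon that actually runs there is expected to answer. A "blocked or not listening" result does not tell a firewall drop apart from a daemon that simply is not running.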