Thank you all for your suggestions. I turned off the firewall on both machines, but still no luck. I can confirm that no managed switch is preventing the nodes from communicating. If you check the log file, there is communication for about 4 minutes and then the node state goes down. Any other ideas?

________________________________
From: Ole Holm Nielsen <[email protected]>
Sent: Wednesday, July 5, 2017 7:07:15 PM
To: slurm-dev
Subject: [slurm-dev] Re: SLURM ERROR! NEED HELP
On 07/05/2017 11:40 AM, Felix Willenborg wrote:
> in my network I encountered that managed switches were preventing
> necessary network communication between the nodes, on which SLURM
> relies. You should check if you're using managed switches to connect
> nodes to the network and if so, if they're blocking communication on
> slurm ports.

Managed switches should permit IP layer 2 traffic just like unmanaged
switches! We only have managed Ethernet switches, and they work without
problems.

Perhaps you meant that Ethernet switches may perform some firewall
functions by themselves?

Firewalls must be off between Slurm compute nodes as well as the
controller host. See
https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#configure-firewall-for-slurm-daemons

/Ole
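One way to confirm that nothing between the hosts is blocking Slurm traffic, even with the firewalls off, is a plain TCP connection test. The sketch below is not from this thread; it assumes the default ports 6817 (SlurmctldPort) and 6818 (SlurmdPort), which your slurm.conf may override, and the helper names `port_is_reachable` and `check_slurm_paths` are hypothetical. Traffic flows in both directions (compute nodes contact slurmctld, and slurmctld contacts slurmd on each node), so it is worth running the check from the controller and from each compute node.

```python
import socket

# Assumed default Slurm daemon ports (SlurmctldPort=6817, SlurmdPort=6818).
# Check your slurm.conf: if it sets different values, change these.
SLURMCTLD_PORT = 6817
SLURMD_PORT = 6818


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass) and name-resolution failures (socket.gaierror is too).
        return False


def check_slurm_paths(controller: str, compute_nodes: list[str]) -> dict[str, bool]:
    """From the current host, test slurmctld on the controller and slurmd on each node."""
    results = {
        f"{controller}:{SLURMCTLD_PORT}": port_is_reachable(controller, SLURMCTLD_PORT)
    }
    for node in compute_nodes:
        results[f"{node}:{SLURMD_PORT}"] = port_is_reachable(node, SLURMD_PORT)
    return results
```

For example, `check_slurm_paths("slurm-ctl", ["node01", "node02"])` (hypothetical hostnames; use the names from your slurm.conf) returns a dict mapping each `host:port` to `True` or `False`. A `False` only says the TCP connection failed; it does not tell you whether a filter dropped it or the daemon simply was not listening, so check that slurmctld and slurmd are running before blaming the network.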
