Thank you all for the suggestions. I turned off the firewall on both machines,
but still no luck. I can confirm that no managed switch is preventing the nodes
from communicating. If you check the log file, there is communication for about
4 minutes, and then the node state goes down.
Any other ideas?
________________________________
From: Ole Holm Nielsen <[email protected]>
Sent: Wednesday, July 5, 2017 7:07:15 PM
To: slurm-dev
Subject: [slurm-dev] Re: SLURM ERROR! NEED HELP


On 07/05/2017 11:40 AM, Felix Willenborg wrote:
> in my network I encountered that managed switches were preventing
> necessary network communication between the nodes, on which SLURM
> relies. You should check if you're using managed switches to connect
> nodes to the network and if so, if they're blocking communication on
> slurm ports.

Managed switches should permit IP layer 2 traffic just like unmanaged
switches!  We only have managed Ethernet switches, and they work without
problems.

Perhaps you meant that Ethernet switches may perform some firewall
functions by themselves?

Firewalls must be off between Slurm compute nodes as well as the
controller host.  See
https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#configure-firewall-for-slurm-daemons
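
One way to confirm that nothing is filtering traffic is to test whether the daemon ports accept TCP connections. The following is a minimal sketch, not an official Slurm tool. It assumes the default ports (SlurmctldPort 6817, SlurmdPort 6818); adjust them if slurm.conf sets different values. The function names and the host names in the usage comment are hypothetical.

```python
import socket

# Slurm defaults; assumption: slurm.conf does not override
# SlurmctldPort / SlurmdPort.
SLURMCTLD_PORT = 6817
SLURMD_PORT = 6818


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_cluster(controller, nodes, ctld_port=SLURMCTLD_PORT,
                  slurmd_port=SLURMD_PORT, timeout=3.0):
    """Probe slurmctld on the controller and slurmd on each compute node."""
    results = {
        ("slurmctld", controller, ctld_port):
            port_reachable(controller, ctld_port, timeout)
    }
    for node in nodes:
        results[("slurmd", node, slurmd_port)] = port_reachable(
            node, slurmd_port, timeout)
    return results


# Example usage (hypothetical host names):
#   for (daemon, host, port), ok in check_cluster("head", ["node1"]).items():
#       print(daemon, host, port, "open" if ok else "BLOCKED or not running")
```

A failed probe does not distinguish a firewall from a daemon that is not running, and the check only covers the direction from the machine where it runs, so it is worth running on the controller and on each compute node.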

/Ole
