Does your SMS have a dedicated interface for node traffic?
On 05/16/2018 04:00 PM, Sean Caron wrote:
I see some chatter on 6818/TCP from the compute node to the SLURM
controller, and from the SLURM controller to the compute node.
The policy is to permit all packets inbound from SLURM controller
regardless of port and protocol, and perform no filtering whatsoever on
any output packets to anywhere. I wouldn't expect this to interfere.
Anyway, it's not that it NEVER works once the firewall is switched on.
It's that it flaps. The firewall is clearly passing enough traffic to
have the node marked as up some of the time. But why the periodic "not
responding" ... "responding" cycles? Once it says "not responding" I can
still scontrol ping from the compute node in question, and standard ICMP
ping from one to the other works as well.
On Wed, May 16, 2018 at 2:13 PM, Alex Chekholko <a...@calicolabs.com> wrote:
Add a logging rule to your iptables and look at what traffic is
actually being blocked?
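One way to follow this suggestion is sketched below. It is only a sketch: the log prefix "SLURM-IN-DROP: ", the rate limit, and the helper names print_log_rule and summarize_drops are assumptions chosen for illustration, not anything from the thread. The first helper prints a LOG rule in the same iptables-save syntax as the ACCEPT rule quoted in this thread. The second condenses the resulting kernel log lines into a count per source address, protocol, and destination port.

```shell
#!/bin/sh
# Hypothetical log prefix; any short label works (iptables allows up to 29 characters).
PREFIX="SLURM-IN-DROP: "

# Print a LOG rule in iptables-save syntax. Place it directly before the rule
# that drops or rejects traffic (or last in INPUT if the chain policy is DROP),
# so it only sees packets that no earlier ACCEPT rule matched.
print_log_rule() {
    printf '%s\n' "-A INPUT -m limit --limit 30/min -j LOG --log-prefix \"$PREFIX\""
}

# Read kernel log lines on stdin and print "count SRC PROTO DPT", busiest first.
# Fields missing from a log line (e.g. DPT for ICMP) are shown as "-".
summarize_drops() {
    grep -F "$PREFIX" | awk '{
        src = "-"; proto = "-"; dpt = "-"
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^SRC=/)   src = substr($i, 5)
            if ($i ~ /^PROTO=/) proto = substr($i, 7)
            if ($i ~ /^DPT=/)   dpt = substr($i, 5)
        }
        count[src " " proto " " dpt]++
    } END {
        for (k in count) print count[k], k
    }' | sort -rn
}
```

With the rule loaded on a compute node, something like `dmesg | summarize_drops` (or `journalctl -k | summarize_drops`) gives the summary. Any source address other than the SLURM controller would point at traffic that the single ACCEPT rule does not cover.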
On Wed, May 16, 2018 at 11:11 AM Sean Caron <sca...@umich.edu> wrote:
Does anyone use SLURM in a scenario where there is an iptables
firewall on the compute nodes on the same network it uses to
communicate with the SLURM controller and DBD machine?
I have the very basic situation where ...
1. There is no iptables firewall enabled at all on the SLURM controller.
2. Compute nodes are set to permit all ports and protocols from
the SLURM controller with a rule like:
-A INPUT -s IP.of.SLURM.controller/32 -j ACCEPT
If I enable this on the compute nodes, they flap up and down in a
"Not responding" state. If I switch off the firewall on the
compute nodes, they work fine.
When the firewall is up on the compute nodes, the SLURM controller can
ping compute nodes, no problem. I have no reason to believe all
ports and protocols are not being passed. Time is synched. No
trouble accessing slurm.conf on any of the clients.
Has anyone seen this before? There seems to be very little
information about SLURM's interactions with iptables. I know
this is kind of a funky scenario but regulatory requirements
have me needing to tighten down our cluster network a little
bit. Is this like a latency issue, or ...?