Does your SMS have a dedicated interface for node traffic?

On 05/16/2018 04:00 PM, Sean Caron wrote:
I see some chatter on 6818/TCP from the compute node to the SLURM controller, and from the SLURM controller to the compute node.
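
For reference, that chatter is visible with just something along these lines run on the compute node (interface name will vary; 6818 is the default slurmd port):

    tcpdump -nn -i eth0 tcp port 6818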

The policy is to permit all packets inbound from the SLURM controller regardless of port and protocol, and to perform no filtering whatsoever on outbound packets to anywhere. I wouldn't expect this to interfere.
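
Spelled out, the node-side ruleset is basically the following (loopback and other housekeeping rules elided, and the controller address is a placeholder); INPUT drops anything not explicitly accepted, OUTPUT is wide open:

    *filter
    :INPUT DROP [0:0]
    :OUTPUT ACCEPT [0:0]
    -A INPUT -s IP.of.SLURM.controller/32 -j ACCEPT
    COMMIT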

Anyway, it's not that it NEVER works once the firewall is switched on. It's that it flaps. The firewall is clearly passing enough traffic to have the node marked as up some of the time. But why the periodic "not responding" ... "responding" cycles? Once it says "not responding" I can still run scontrol ping from the compute node in question, and standard ICMP ping between it and the controller works as well.
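
Concretely, the checks at that point are just these (hostnames here are made up):

    compute-node$ scontrol ping                  # still reports slurmctld as UP
    compute-node$ ping -c 3 slurm-controller
    slurm-controller$ ping -c 3 compute-node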

Best,

Sean


On Wed, May 16, 2018 at 2:13 PM, Alex Chekholko <a...@calicolabs.com> wrote:

    Add a logging rule to your iptables and look at what traffic is
    actually being blocked?
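
    Something like this appended to the INPUT chain (so it sits just
    ahead of the default drop) should show exactly what's being
    refused; the prefix is arbitrary and the limit match keeps it from
    flooding the log:

    iptables -A INPUT -m limit --limit 5/min -j LOG \
        --log-prefix "slurm-drop: " --log-level 4

    Then watch the kernel log (dmesg, journalctl -k, or
    /var/log/messages, depending on the distro) while a node flaps.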

    On Wed, May 16, 2018 at 11:11 AM Sean Caron <sca...@umich.edu> wrote:

        Hi all,

        Does anyone use SLURM in a scenario where there is an iptables
        firewall on the compute nodes, on the same network they use to
        communicate with the SLURM controller and DBD machine?

        I have the very basic situation where ...

        1. There is no iptables firewall enabled at all on the SLURM
        controller/DBD machine.

        2. Compute nodes are set to permit all ports and protocols from
        the SLURM controller with a rule like:

        -A INPUT -s IP.of.SLURM.controller/32 -j ACCEPT

        If I enable this on the compute nodes, they flap up and down in
        the "not responding" state. If I switch off the firewall on the
        compute nodes, they work fine.

        When the firewall is up on the compute nodes, the SLURM
        controller can ping the compute nodes, no problem. I have no
        reason to believe any port or protocol is being blocked. Time is
        synced. No trouble accessing slurm.conf on any of the clients.
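
        The spot checks behind that were roughly the following (node
        name is just a stand-in):

        # from the controller: is slurmd's port on the node reachable?
        nc -zv name.of.compute.node 6818
        # on the node: is the clock actually in sync?
        timedatectl | grep -i synchronized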

        Has anyone seen this before? There seems to be very little
        information about SLURM's interactions with iptables. I know this
        is kind of a funky scenario, but regulatory requirements have me
        needing to tighten down our cluster network a little bit. Is this
        like a latency issue, or ...?

        Thanks,

        Sean


