Communication in Slurm is not limited to controller-to-slurmd and
slurmd-to-controller traffic. You need to ensure that your login nodes can
reach the controller and the slurmd nodes, and also that the slurmd daemons
on the various nodes can contact each other. This last requirement comes
from the tree logic used in Slurm communication:

- to ensure scalability, slurmctld uses a communication tree (see TreeWidth
in "man slurm.conf"), for example to periodically check that all the
nodes are working properly
- the exact same logic is used by srun when it contacts the various slurmd
daemons involved in its step
- reverse tree communications are performed among the slurmds of a step
when it ends, to send accounting data and other information to the
controller
- only some communications are point-to-point between slurmd and
controller, notably the registration call performed at slurmd startup.
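
To make the tree logic above concrete, here is a small toy model in Python.
It is only a sketch: it is not Slurm's actual forwarding code, and the
function names, the node names and the "ctld" label are all made up for
illustration. It assumes the sender splits the destination list into at most
TreeWidth chunks, contacts the first node of each chunk, and lets that node
forward to the rest of its chunk. It does not try to model the TreeWidth = 1
case.

```python
from typing import Callable, List, Set


def split_into_subtrees(nodes: List[str], tree_width: int) -> List[List[str]]:
    """Split a node list into at most tree_width contiguous chunks."""
    if not nodes:
        return []
    count = min(tree_width, len(nodes))
    base, extra = divmod(len(nodes), count)
    chunks, start = [], 0
    for i in range(count):
        size = base + (1 if i < extra else 0)
        chunks.append(nodes[start:start + size])
        start += size
    return chunks


def forward(sender: str, nodes: List[str], tree_width: int,
            allowed: Callable[[str, str], bool], reached: Set[str]) -> None:
    """Contact the head of each chunk; the head forwards to the rest."""
    for chunk in split_into_subtrees(nodes, tree_width):
        head, rest = chunk[0], chunk[1:]
        if not allowed(sender, head):
            # Toy assumption: a blocked head hides its whole chunk.
            continue
        reached.add(head)
        forward(head, rest, tree_width, allowed, reached)


def reachable(nodes: List[str], tree_width: int,
              allowed: Callable[[str, str], bool]) -> Set[str]:
    """Return the nodes that a message from the controller ends up reaching."""
    reached: Set[str] = set()
    forward("ctld", nodes, tree_width, allowed, reached)
    return reached


if __name__ == "__main__":
    nodes = [f"n{i}" for i in range(10)]

    def controller_only(src: str, dst: str) -> bool:
        # Firewall from the thread: nodes accept packets from the controller only.
        return src == "ctld"

    print(sorted(reachable(nodes, 3, controller_only)))      # heads of chunks only
    print(sorted(reachable(nodes[:2], 3, controller_only)))  # fewer nodes than width
```

In this model, a "controller only" filter lets the message reach just the
first node of each chunk, while a destination list shorter than the width is
contacted directly and fully reached.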

When the slurmds cannot contact each other because of network failures
(partitioning) or overly restrictive filtering, you see the kind of
flapping that you have. The point-to-point communication at slurmd
registration makes the nodes appear to the controller, then the tree checks
make some of them disappear. Retries can lead to point-to-point
communications to some nodes when the number of destination nodes contacted
by the controller at the same time is lower than the configured TreeWidth,
so those nodes suddenly reappear... until the next check... and so on.

Two options for you:

- be less restrictive in your filtering rules
- set TreeWidth to 1 in slurm.conf, but you will lose the
performance/scalability of Slurm's internal communication

If your cluster is large, I would recommend the first one.
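
As an illustration of the first option, here is a hedged Python sketch that
prints one ACCEPT rule per peer, in the same style as the rule quoted in the
thread below. The helper name and all the addresses are placeholders, not
taken from your cluster, and it only handles IPv4. It accepts all ports and
protocols from each listed peer (controller, login nodes, other compute
nodes); narrowing that down to specific ports is left out on purpose.

```python
import ipaddress
from typing import Iterable, List


def accept_rules(local_ip: str, peers: Iterable[str]) -> List[str]:
    """Return iptables-restore style INPUT rules accepting each peer."""
    local = ipaddress.ip_address(local_ip)
    rules = []
    # De-duplicate and sort so the generated rule set is stable.
    for peer in sorted({ipaddress.ip_address(p) for p in peers}):
        if peer == local:
            continue  # no rule needed for the node itself
        rules.append(f"-A INPUT -s {peer}/32 -j ACCEPT")
    return rules


if __name__ == "__main__":
    # Placeholder addresses: controller, one login node, two compute nodes.
    cluster = ["10.0.0.1", "10.0.0.2", "10.0.0.11", "10.0.0.12"]
    for rule in accept_rules("10.0.0.11", cluster):
        print(rule)
```

Run on each compute node with its own address as the first argument, this
gives a rule set that lets the slurmds talk to each other as well as to the
controller.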


PS: you can look at that presentation for a few details on the
communication logic:

2018-05-17 22:21 GMT+02:00 Sean Caron <sca...@umich.edu>:

> Sorry, how do you mean? The environment is very basic. Compute nodes and
> SLURM controller are on an RFC1918 subnet. Gateways are dual homed with one
> leg on a public IP and one leg on the RFC1918 cluster network. It used to
> be that nodes that only had a leg on the RFC1918 network (compute nodes and
> the SLURM controller) had no firewall at all and nodes that were dual homed
> basically were set to just permit all traffic from the cluster side NIC
> (i.e. iptables rule like -A INPUT -i ethX -j ACCEPT).
> Now we're trying to go back to the gateways and compute nodes and actually
> codify, instead of just passing all traffic from the cluster side NIC, what
> ports and protocols are actually in use, or at least, what server-to-server
> communication is expected and normative, and then define a rule set to
> permit those while dropping other traffic not explicitly whitelisted.
> The compute and gateway nodes work fine with SLURM even when iptables is
> enabled and the policy is "permit all traffic from that NIC" but once we
> tighten it down just a little bit to "permit all traffic to and from the
> SLURM controller" we see these weird instances of node state flapping. It's
> not clear to me why this is the case since from the standpoint of node to
> controller communications, these policies are logically very similar, but
> there it is. The nodes shouldn't have to talk to anything else besides the
> SLURM controller for SLURM to work, so long as time is synched up between
> them and there are no issues with the nodes getting to slurm.conf.
> Best,
> Sean
> On Thu, May 17, 2018 at 1:21 PM, Patrick Goetz <pgo...@math.utexas.edu>
> wrote:
>> Does your SMS have a dedicated interface for node traffic?
>> On 05/16/2018 04:00 PM, Sean Caron wrote:
>>> I see some chatter on 6818/TCP from the compute node to the SLURM
>>> controller, and from the SLURM controller to the compute node.
>>> The policy is to permit all packets inbound from SLURM controller
>>> regardless of port and protocol, and perform no filtering whatsoever on any
>>> output packets to anywhere. I wouldn't expect this to interfere.
>>> Anyway, it's not that it NEVER works once the firewall is switched on.
>>> It's that it flaps. The firewall is clearly passing enough traffic to have
>>> the node marked as up some of the time. But why the periodic "not
>>> responding" ... "responding" cycles? Once it says "not responding" I can
>>> still scontrol ping from the compute node in question, and standard ICMP
>>> ping from one to the other works as well.
>>> Best,
>>> Sean
>>> On Wed, May 16, 2018 at 2:13 PM, Alex Chekholko <a...@calicolabs.com
>>> <mailto:a...@calicolabs.com>> wrote:
>>>     Add a logging rule to your iptables and look at what traffic is
>>>     actually being blocked?
>>>     On Wed, May 16, 2018 at 11:11 AM Sean Caron <sca...@umich.edu
>>>     <mailto:sca...@umich.edu>> wrote:
>>>         Hi all,
>>>         Does anyone use SLURM in a scenario where there is an iptables
>>>         firewall on the compute nodes on the same network it uses to
>>>         communicate with the SLURM controller and DBD machine?
>>>         I have the very basic situation where ...
>>>         1. There is no iptables firewall enabled at all on the SLURM
>>>         controller/DBD machine.
>>>         2. Compute nodes are set to permit all ports and protocols from
>>>         the SLURM controller with a rule like:
>>>         -A INPUT -s IP.of.SLURM.controller/32 -j ACCEPT
>>>         If I enable this on the compute nodes, they flap up and down in
>>>         "Not responding" state. If I switch off the firewall on the
>>>         compute nodes, they work fine.
>>>         When firewall is up on the compute nodes, SLURM controller can
>>>         ping compute nodes, no problem. I have no reason to believe all
>>>         ports and protocols are not being passed. Time is synched. No
>>>         trouble accessing slurm.conf on any of the clients.
>>>         Has anyone seen this before? There seems to be very little
>>>         information about SLURM's interactions with iptables. I know
>>>         this is kind of a funky scenario but regulatory requirements
>>>         have me needing to tighten down our cluster network a little
>>>         bit. Is this like a latency issue, or ...?
>>>         Thanks,
>>>         Sean
