Awesome tip. Thanks so much, Matthieu. I hadn't considered that. I will give that a shot and see what happens.
Best,

Sean

On Thu, May 17, 2018 at 4:49 PM, Matthieu Hautreux
<matthieu.hautr...@gmail.com> wrote:

> Hi,
>
> Communications in Slurm are not performed only from the controller to
> slurmd and from slurmd to the controller. You need to ensure that your
> login nodes can reach the controller and the slurmd nodes, and that the
> slurmd daemons on the various nodes can contact each other. This last
> requirement comes from the tree logic used in Slurm communication:
>
> - To ensure scalability, slurmctld uses a communication tree (see
>   TreeWidth in "man slurm.conf"), used for example to periodically check
>   that all the nodes are working properly.
> - The same logic is used by srun when it contacts the various slurmd
>   daemons involved in its step.
> - Reverse-tree communications are performed among the slurmds of a step
>   at its end, to send accounting data and other information back to the
>   controller.
> - Only some communications are point-to-point between slurmd and the
>   controller, notably the "registering call" performed at slurmd startup.
>
> When the slurmd daemons cannot contact each other because of network
> failures (partitioning) or overly restrictive filtering, you see exactly
> the kind of flapping you describe: the point-to-point communication at
> slurmd registration makes nodes appear to the controller, the tree checks
> make some of them disappear, and retries can fall back to point-to-point
> communication when the number of destination nodes contacted by the
> controller at one time is lower than the configured TreeWidth, so nodes
> suddenly reappear... until the next check... and so on.
>
> Two options for you:
>
> - Be less restrictive in your filtering rules.
> - Set TreeWidth to 1 in slurm.conf, but you will lose the performance
>   and scalability of Slurm's internal communication.
>
> If your cluster is large, I would recommend the first one.
>
> HTH
> Matthieu
>
> PS: you can look at this presentation for a few details on the
> communication logic:
> https://slurm.schedmd.com/SUG14/message_aggregation.pdf
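To make the first option concrete, below is a minimal sketch of the kind of
node-side iptables rules it implies. It assumes the Slurm default slurmd port
(SlurmdPort=6818, which matches the 6818/TCP traffic mentioned later in this
thread), a placeholder 10.0.0.0/16 cluster subnet, and an example
SrunPortRange of 60001-63000; check your own slurm.conf for the values
actually in use before copying anything.

    # Controller traffic, as in the existing rule set
    -A INPUT -s IP.of.SLURM.controller/32 -j ACCEPT
    # slurmd-to-slurmd traffic used by the communication tree
    # (SlurmdPort, 6818 by default); 10.0.0.0/16 is a placeholder subnet
    -A INPUT -s 10.0.0.0/16 -p tcp --dport 6818 -j ACCEPT
    # If srun runs on hosts in this network, slurmd also connects back to
    # srun's listening ports; pinning them with SrunPortRange=60001-63000
    # in slurm.conf keeps this rule narrow (example range only)
    -A INPUT -s 10.0.0.0/16 -p tcp --dport 60001:63000 -j ACCEPT

The second option is a one-line slurm.conf change, at the cost Matthieu
mentions:

    # slurm.conf: the alternative Matthieu describes above
    TreeWidth=1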
> 2018-05-17 22:21 GMT+02:00 Sean Caron <sca...@umich.edu>:
>
>> Sorry, how do you mean? The environment is very basic. Compute nodes and
>> the SLURM controller are on an RFC1918 subnet. Gateways are dual-homed,
>> with one leg on a public IP and one leg on the RFC1918 cluster network.
>> It used to be that nodes with only a leg on the RFC1918 network (compute
>> nodes and the SLURM controller) had no firewall at all, and the
>> dual-homed nodes were basically set to permit all traffic from the
>> cluster-side NIC (i.e., an iptables rule like -A INPUT -i ethX -j ACCEPT).
>>
>> Now we're going back to the gateways and compute nodes and actually
>> codifying, instead of just passing all traffic from the cluster-side NIC,
>> what ports and protocols are actually in use, or at least what
>> server-to-server communication is expected and normative, and then
>> defining a rule set that permits those while dropping other traffic not
>> explicitly whitelisted.
>>
>> The compute and gateway nodes work fine with SLURM even when iptables is
>> enabled and the policy is "permit all traffic from that NIC", but once we
>> tighten it down just a little to "permit all traffic to and from the
>> SLURM controller" we see these weird instances of node state flapping.
>> It's not clear to me why this is the case, since from the standpoint of
>> node-to-controller communications these policies are logically very
>> similar, but there it is.
>>
>> The nodes shouldn't have to talk to anything else besides the SLURM
>> controller for SLURM to work, so long as time is synced between them and
>> there are no issues with the nodes getting to slurm.conf.
>>
>> Best,
>>
>> Sean
>>
>> On Thu, May 17, 2018 at 1:21 PM, Patrick Goetz <pgo...@math.utexas.edu>
>> wrote:
>>
>>> Does your SMS have a dedicated interface for node traffic?
>>>
>>> On 05/16/2018 04:00 PM, Sean Caron wrote:
>>>
>>>> I see some chatter on 6818/TCP from the compute node to the SLURM
>>>> controller, and from the SLURM controller to the compute node.
>>>>
>>>> The policy is to permit all packets inbound from the SLURM controller
>>>> regardless of port and protocol, and to perform no filtering
>>>> whatsoever on any outbound packets to anywhere. I wouldn't expect this
>>>> to interfere.
>>>>
>>>> Anyway, it's not that it NEVER works once the firewall is switched on.
>>>> It's that it flaps. The firewall is clearly passing enough traffic to
>>>> have the node marked as up some of the time. But why the periodic "not
>>>> responding" ... "responding" cycles? Once it says "not responding" I
>>>> can still scontrol ping from the compute node in question, and a
>>>> standard ICMP ping from one to the other works as well.
>>>>
>>>> Best,
>>>>
>>>> Sean
>>>>
>>>> On Wed, May 16, 2018 at 2:13 PM, Alex Chekholko <a...@calicolabs.com>
>>>> wrote:
>>>>
>>>>     Add a logging rule to your iptables and look at what traffic is
>>>>     actually being blocked?
>>>>
>>>>     On Wed, May 16, 2018 at 11:11 AM Sean Caron <sca...@umich.edu>
>>>>     wrote:
>>>>
>>>>         Hi all,
>>>>
>>>>         Does anyone use SLURM in a scenario where there is an iptables
>>>>         firewall on the compute nodes, on the same network it uses to
>>>>         communicate with the SLURM controller and DBD machine?
>>>>
>>>>         I have the very basic situation where ...
>>>>
>>>>         1. There is no iptables firewall enabled at all on the SLURM
>>>>         controller/DBD machine.
>>>>
>>>>         2. Compute nodes are set to permit all ports and protocols
>>>>         from the SLURM controller with a rule like:
>>>>
>>>>         -A INPUT -s IP.of.SLURM.controller/32 -j ACCEPT
>>>>
>>>>         If I enable this on the compute nodes, they flap up and down
>>>>         in "Not responding" state. If I switch off the firewall on the
>>>>         compute nodes, they work fine.
>>>>
>>>>         When the firewall is up on the compute nodes, the SLURM
>>>>         controller can ping the compute nodes, no problem. I have no
>>>>         reason to believe all ports and protocols are not being
>>>>         passed. Time is synced. No trouble accessing slurm.conf on any
>>>>         of the clients.
>>>>
>>>>         Has anyone seen this before? There seems to be very little
>>>>         information about SLURM's interactions with iptables. I know
>>>>         this is kind of a funky scenario, but regulatory requirements
>>>>         have me needing to tighten down our cluster network a little
>>>>         bit. Is this like a latency issue, or ...?
>>>>
>>>>         Thanks,
>>>>
>>>>         Sean
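Alex's earlier suggestion of a logging rule can be implemented with a rule
pair like the sketch below (the log prefix and rate limit are arbitrary
examples), placed at the end of the INPUT chain so that anything not matched
by an ACCEPT rule is recorded in the kernel log before being dropped; the
ports that show up there should indicate exactly what still needs to be
whitelisted.

    # Log whatever is about to be dropped (rate-limited), then drop it;
    # inspect the results with "dmesg" or "journalctl -k"
    -A INPUT -m limit --limit 5/min -j LOG --log-prefix "slurm-fw-drop: " --log-level 4
    -A INPUT -j DROP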