While developing our new Slurm cluster, I've been bitten by firewall settings on the compute nodes that prevent MPI jobs from spawning tasks on remote nodes.
I now believe that Slurm effectively requires the Linux firewall to be disabled on compute nodes. I haven't been able to find any hint of this requirement in the official Slurm documentation. I did find an old slurm-devel posting by Moe Jette (pretty authoritative!) from 2010,
https://groups.google.com/forum/#!topic/slurm-devel/wOHcXopbaXw, which says:
Other communications (say between srun and the spawned tasks) are intended to operate within a cluster and have no port restrictions. If there is a firewall between nodes in your cluster (at least as a "cluster" is configured in SLURM), then logic would need to be added to SLURM to provide the functionality you describe.
Can anyone confirm that Moe's statement is still valid with the current Slurm version?
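For anyone who wants to check whether a node-to-node firewall is the culprit without involving MPI at all, here is a minimal sketch of a TCP reachability probe. The function name port_reachable, the node name "node002" and the port number in the comment are placeholders I made up for illustration; nothing here is part of Slurm itself. It only tells you whether a TCP connection to a given host and port can be opened, so something must actually be listening on that port on the remote node for a True result.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    A False result means the connection was refused, timed out, or
    failed for some other network reason. A firewall that silently
    drops packets typically shows up as a timeout.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts, and unreachable hosts.
        return False


# Hypothetical usage from one compute node towards another, with a
# listener already started on the remote side on an arbitrary high port:
#   port_reachable("node002", 45678)
```

If the probe succeeds with the firewall stopped on the remote node and fails with it running, that points at the firewall rather than at the MPI installation.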
Conclusion: Compute nodes must have their Linux firewall disabled.
Thanks,
Ole