Hi Chris,

Unfortunately, opening Slurm port 6818 alone isn't sufficient (I had already done that). When srun starts tasks from the job's master node on the slave nodes, it uses ports that are not known in advance, so you must open all ports in your Linux firewall to all other compute nodes in the cluster.

It would be nice if someone could document which TCP port ranges are actually required to be opened in the firewall. Might it just be the ephemeral ports 49152 to 65535, for example?
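
As a side note on the ephemeral range: on Linux, the range the kernel actually hands out is configurable and is exposed in /proc/sys/net/ipv4/ip_local_port_range, so it need not match 49152 to 65535. Below is a minimal Python sketch for checking it on a node. The helper names are made up for illustration, and whether srun confines itself to this range is an assumption I have not verified.

```python
from pathlib import Path

# Linux exposes the kernel's ephemeral (local) port range in this file.
PORT_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")


def parse_port_range(text):
    """Parse 'LOW HIGH' (whitespace separated) into a tuple of ints."""
    low, high = (int(field) for field in text.split())
    if not (0 < low <= high <= 65535):
        raise ValueError(f"implausible port range: {low}-{high}")
    return low, high


def local_port_range(path=PORT_RANGE_FILE):
    """Read the ephemeral port range configured on this node."""
    return parse_port_range(path.read_text())


def in_range(port, port_range):
    """Return True if port lies inside the inclusive (low, high) range."""
    low, high = port_range
    return low <= port <= high


if __name__ == "__main__":
    low, high = local_port_range()
    print(f"kernel ephemeral port range: {low}-{high}")
    # Compare against the 49152-65535 range mentioned above.
    print("covers 49152-65535:", low <= 49152 and high >= 65535)
```

Running this on a compute node shows which range a firewall rule would have to cover if the guess about ephemeral ports turns out to be right.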

Thanks,
Ole

On 10/27/2016 05:13 PM, Christopher Benjamin Coffey wrote:
Hi Ole,

I don’t see a reason for a firewall to exist on a compute node. Is it a
requirement on your new cluster?  If not, disable it.  I don’t read Moe’s
statement as saying that you can’t have a firewall, just that if there is
one, you should open it up to allow all Slurm communication.

Best,
Chris

--
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167

On 10/27/16, 5:58 AM, "Ole Holm Nielsen" <ole.h.niel...@fysik.dtu.dk> wrote:


    In the process of developing our new cluster using Slurm, I've been
    bitten by the firewall settings on the compute nodes preventing MPI jobs
    from spawning tasks on remote nodes.

    I now believe that Slurm actually has a requirement that compute nodes
    must have their Linux firewall disabled.  I haven't been able to find
    any hint of this requirement in the official Slurm documentation.  I did
    find an old slurm-devel posting by Moe Jette (pretty authoritative!) in 2010
       https://groups.google.com/forum/#!topic/slurm-devel/wOHcXopbaXw
    saying:

    > Other communications (say between srun and the spawned tasks) are
    > intended to operate within a cluster and have no port restrictions.
    > If there is a firewall between nodes in your cluster (at least as a
    > "cluster" is configured in SLURM), then logic would need to be added
    > to SLURM to provide the functionality you describe.

    Can anyone confirm that Moe's statement is still valid with the current
    Slurm version?

    Conclusion: Compute nodes must have their Linux firewall disabled.
