I'd like to have a single Slurm instance schedule jobs onto two
physically disjoint clusters.  The compute nodes of one cluster cannot
reach the compute nodes of the other cluster, but they can all reach the
scheduler nodes.  With Slurm's hierarchical communication, when some
nodes can't reach others, Slurm concludes that those nodes are not
responding and eventually marks them offline.  Is there any way to
logically group nodes into separate communication groups to avoid this
problem?

-JE

