[slurm-dev] Re: Confusion regarding single partition with separate groups of nodes

jette Tue, 03 Feb 2015 12:28:53 -0800

I believe that is OK, but I don't know off the top of my head. Either test it for yourself or:

1. Add the new partition.
2. Move pending jobs to the new partition.
3. Delete the other partitions once their jobs end.
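The three steps above can be sketched as concrete commands. This is a minimal dry-run sketch, not SchedMD's procedure. It only builds argument lists for `squeue` and `scontrol` and prints them, so nothing is executed. The partition name `global`, the node list, and the job IDs are hypothetical. The old partition names are the keys of `pending_by_partition`, so include every old partition, even one with no pending jobs. Scontrol changes are made at runtime; slurm.conf should also be edited to match.

```python
import shlex
from typing import Dict, List, Tuple

Command = List[str]


def pending_jobs_cmd(partition: str) -> Command:
    """squeue call that prints only the IDs of PENDING jobs in one partition."""
    # -h: no header, -t: job state filter, -p: partition, -o %i: job ID only
    return ["squeue", "-h", "-t", "PENDING", "-p", partition, "-o", "%i"]


def parse_job_ids(squeue_output: str) -> List[str]:
    """Turn squeue's one-ID-per-line output into a list, skipping blank lines."""
    return [line.strip() for line in squeue_output.splitlines() if line.strip()]


def migration_plan(
    new_partition: str,
    nodes: str,
    pending_by_partition: Dict[str, List[str]],
) -> Tuple[List[Command], List[Command]]:
    """Return (commands to run now, commands to run after running jobs drain)."""
    # Step 1: add the new partition over every node. A node may belong to
    # several partitions at once, so the old partitions keep working meanwhile.
    now: List[Command] = [
        ["scontrol", "create", f"PartitionName={new_partition}",
         f"Nodes={nodes}", "State=UP"]
    ]
    # Step 2: move each pending job from its old partition to the new one.
    for job_ids in pending_by_partition.values():
        for job_id in job_ids:
            now.append(["scontrol", "update", f"JobId={job_id}",
                        f"Partition={new_partition}"])
    # Step 3 is deferred: delete old partitions only after their running jobs end.
    later: List[Command] = [
        ["scontrol", "delete", f"PartitionName={old}"]
        for old in pending_by_partition
    ]
    return now, later


if __name__ == "__main__":
    # Hypothetical partition names, node list and job IDs, for illustration only.
    now, later = migration_plan(
        "global", "ib-a[001-032],ib-b[001-016]",
        {"fabric_a": ["101", "102"], "fabric_b": ["201"]},
    )
    print("# run now")
    for cmd in now:
        print(shlex.join(cmd))
    print("# run after running jobs in the old partitions have finished")
    for cmd in later:
        print(shlex.join(cmd))
```

Splitting the plan into "now" and "later" mirrors step 3: the old partitions stay in place until their running jobs finish. Running jobs are never moved.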


Quoting John Desantis <[email protected]>:

Moe,

One last question when you have a chance.

If there are jobs running in the current partitions and we switch to a
single partition with a topology configuration, would those running jobs be lost?

Thank you!
John DeSantis

2015-02-03 12:42 GMT-05:00  <[email protected]>:


You can configure a single queue and use the topology/tree plugin to
identify the nodes on separate fabrics.
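The single-queue setup described above can be made concrete. This is a sketch under assumptions, not a verified configuration. It renders a topology.conf with one leaf switch per InfiniBand fabric and no parent switch joining them. It also renders the matching slurm.conf lines: `TopologyPlugin=topology/tree` plus one default partition covering every node. As I read the topology.conf documentation, nodes on leaf switches without a common parent can still be used, but a single job should not be placed across them. The switch names, node names and the partition name `global` are all hypothetical.

```python
from typing import Dict


def topology_conf(fabrics: Dict[str, str]) -> str:
    """Render topology.conf with one leaf switch per InfiniBand fabric.

    No parent switch joins the leaves, so the topology/tree plugin sees no
    common path between fabrics (assumption: it then keeps each job on one).
    """
    lines = [f"SwitchName={switch} Nodes={nodes}" for switch, nodes in fabrics.items()]
    return "\n".join(lines) + "\n"


def slurm_conf_fragment(partition: str, fabrics: Dict[str, str]) -> str:
    """Render slurm.conf lines for one default partition spanning all fabrics."""
    all_nodes = ",".join(fabrics.values())
    return (
        "TopologyPlugin=topology/tree\n"
        f"PartitionName={partition} Nodes={all_nodes} Default=YES State=UP\n"
    )


if __name__ == "__main__":
    # Hypothetical switch labels and node ranges; substitute the real host groups.
    fabrics = {"fabric_a": "ib-a[001-032]", "fabric_b": "ib-b[001-016]"}
    print(topology_conf(fabrics), end="")
    print(slurm_conf_fragment("global", fabrics), end="")
```

The design choice is the missing top-level switch. Describing the real switch hierarchy inside each fabric also works, as long as no `SwitchName=... Switches=...` line joins the fabrics together.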


Quoting John Desantis <[email protected]>:


Hello all,

Unfortunately, even after reading the man pages and various documentation,
I'm unclear on how to set up a single, global partition for our users
that spans several separate host groups.

When I say host groups, I mean separate sets of hardware that use
different InfiniBand fabrics and/or are located in different data
centers, have different CPU architectures, etc.

During initial testing, I was able to use a default partition with
all of the nodes listed in its "Nodes=" value. All was well until a
later set of nodes with a separate InfiniBand fabric was added. Testing
showed that applications were attempting to use nodes across the
separate fabrics, which failed miserably. As a result, we're using
separate partitions, which most users don't mind.

Now that we're converting more users to Slurm, we're realizing
that some users don't know how to check for free partitions and
available hardware (boo!) and have grown used to our previous
scheduler configuration of one global queue.

I'm looking into how to emulate this, and I'm not clear whether it
can be done using multiple partition definitions with a DEFAULT clause.
I've also looked at the topology/tree plugin. Since you can specify
either switches or nodes there, I'm wondering whether it would be the
preferred method to achieve one "global" partition that uses all of
the separate hardware pools while respecting the separate host groups.

Thank you,
John DeSantis




--
Morris "Moe" Jette
CTO, SchedMD LLC
Commercial Slurm Development and Support



