Moe,

One last question when you have a chance.

If jobs are currently running in the existing partitions and we switch
to a single partition with a topology, would those running jobs be lost?

Thank you!
John DeSantis

2015-02-03 12:42 GMT-05:00  <[email protected]>:
>
> You can configure a single queue and use the topology/tree plugin to
> identify the nodes on separate fabrics.
>
>
> Quoting John Desantis <[email protected]>:
>>
>> Hello all,
>>
>> Unfortunately, I have some confusion regarding how to achieve a global
>> and single partition for our users with several separate host groups
>> after reading the man pages and various documentation.
>>
>> When I say host groups, I mean separate sets of hardware which utilize
>> different infiniband fabrics and/or are accessible in different data
>> centers, different CPU architectures, etc.
>>
>> During initial testing, I was able to use a default partition with
>> all of the nodes allocated via the "Nodes=" value.  All was well
>> until a later set of nodes was added which had a separate infiniband
>> fabric.  Testing proved that applications were attempting to utilize
>> nodes across the separate fabrics, which failed miserably, and as a
>> result we're using separate partitions - which most users don't mind.
>>
>> Now that we're getting more users converted to Slurm, we're realizing
>> that some users don't know how to check for free partitions and
>> available hardware (boo!) and have grown used to our previous
>> scheduler configuration of 1 global queue.
>>
>> I'm looking into how to emulate this and I'm not quite clear whether
>> it can be done using multiple partition definitions with a DEFAULT
>> clause.  I've also looked at the topology/tree plugin and seen that
>> you can specify either switches or nodes; would this be the
>> preferred method to achieve 1 "global" partition which utilizes all of
>> the separate hardware pools and respects the separate host groups?
>>
>> Thank you,
>> John DeSantis
>
>
>
> --
> Morris "Moe" Jette
> CTO, SchedMD LLC
> Commercial Slurm Development and Support
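
For reference, the single-queue setup suggested above might look roughly
like the sketch below.  The node names (ibA[001-032], ibB[001-016]), the
switch names (fabricA, fabricB) and the partition name (global) are
hypothetical placeholders, not taken from our actual configuration, so
treat this as an illustration rather than a tested config.

```
# slurm.conf (excerpt) -- all names below are hypothetical
TopologyPlugin=topology/tree
PartitionName=global Nodes=ibA[001-032],ibB[001-016] Default=YES State=UP

# topology.conf -- one leaf switch per fabric, with no parent switch
# defined to join them
SwitchName=fabricA Nodes=ibA[001-032]
SwitchName=fabricB Nodes=ibB[001-016]
```

As I understand it, leaving the two switches without a common parent is
what should keep a single job from spanning both fabrics, but that is
worth confirming against the topology guide for the Slurm version in use.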
